# CNN based DQN agent against fixed opponent

As discussed in `5-improving-dqn-architecture.ipynb`, we identified three likely reasons why the agent fails to learn to play the game well:
- Training two DQN agents simultaneously is known to be tough, especially when starting from a random initialisation
- The network used was a simple MLP
- The training did not run for enough iterations

In the notebooks `5-improving-dqn-architecture.ipynb` and `6-dqn-using-a-cnn.ipynb`, two alternative networks besides the MLP were used.
Whilst these give somewhat satisfactory results when trained for long enough and when moves are incentivised with a reward for making a move, the result is still far from perfect.
The training time was also increased to a couple of hours on a CUDA GPU, which didn't improve things much.

Thus, the most likely issue is that we are training two agents simultaneously.
This makes it hard to obtain a well-performing agent, because the target is non-stationary: both agents evolve over time.
An alternative is to train one agent for a couple of epochs whilst freezing the other, and to alternate which agent is frozen.
This makes the learning problem more stationary and is known to make learning easier.
Another common approach, especially for very complex games, is to start from a somewhat smart agent instead of a random one.

Whilst some libraries such as Ray RLlib offer implementations of such a training strategy, the experimental notebook `4-rllib-for-more-learning-control.ipynb` found that even the example provided by Ray results in errors.
Their GitHub page has many open issues, the one we encountered being one of them. Since Tianshou has many algorithms implemented and we have found a way to make things work, we refrain from switching to a different library.

<hr><hr>

## Table of Contents

- Contact information
- Checking requirements
  - Correct Anaconda environment
  - Correct module access
  - Correct CUDA access
- Training two DQN agents on connect four Gym
  - Building the environment
  - Implementing the DQN policy
  - Building agents
  - Function for letting agents learn
  - Function for watching learned agent
  - Doing the experiment
- Discussion

<hr><hr>

## Contact information

| Name             | Student ID | VUB mail                                                  | Personal mail                                               |
| ---------------- | ---------- | --------------------------------------------------------- | ----------------------------------------------------------- |
| Lennert Bontinck | 0568702    | [lennert.bontinck@vub.be](mailto:lennert.bontinck@vub.be) | [info@lennertbontinck.com](mailto:info@lennertbontinck.com) |



<hr><hr>

## Checking requirements

### Correct Anaconda environment

The `rl-project` Anaconda environment should be active to ensure proper support. Installation instructions are available on [the GitHub repository of the RL course project and homeworks](https://github.com/pikawika/vub-rl).

In [1]:
####################################################
# CHECKING FOR RIGHT ANACONDA ENVIRONMENT
####################################################

import os
from platform import python_version

print(f"Active environment: {os.environ['CONDA_DEFAULT_ENV']}")
print(f"Correct environment: {os.environ['CONDA_DEFAULT_ENV'] == 'rl-project'}")
print(f"\nPython version: {python_version()}")
print(f"Correct Python version: {python_version() == '3.8.10'}")

Active environment: rl-project
Correct environment: True

Python version: 3.8.10
Correct Python version: True


<hr>

### Correct module access

The following code block will load in all required modules and show if the versions match those that are recommended.

In [2]:
####################################################
# LOADING MODULES
####################################################

# Allow reloading of libraries
import importlib

# Plotting
import matplotlib; print(f"Matplotlib version (3.5.1 recommended): {matplotlib.__version__}")
import matplotlib.pyplot as plt

# Argparser
import argparse

# More data types
import typing
import numpy as np

# Pygame
import pygame; print(f"Pygame version (2.1.2 recommended): {pygame.__version__}")

# Gym environment
import gym; print(f"Gym version (0.21.0 recommended): {gym.__version__}")

# Tianshou for RL algorithms
import tianshou as ts; print(f"Tianshou version (0.4.8 recommended): {ts.__version__}")

# Torch is a popular DL framework
import torch; print(f"Torch version (1.12.0 recommended): {torch.__version__}")

# PPrint is a pretty print for variables
from pprint import pprint

# Our custom connect four gym environment
import sys
sys.path.append('../')
import gym_connect4_pygame.envs.ConnectFourPygameEnvV2 as cfgym
importlib.invalidate_caches()
importlib.reload(cfgym)

# Time for allowing "freezes" in execution
import time

# Allow for copying objects in a non reference manner
import copy

# Used for updating notebook display
from IPython.display import clear_output

Matplotlib version (3.5.1 recommended): 3.5.1
pygame 2.1.2 (SDL 2.0.18, Python 3.8.10)
Hello from the pygame community. https://www.pygame.org/contribute.html
Pygame version (2.1.2 recommended): 2.1.2
Gym version (0.21.0 recommended): 0.21.0


  from .autonotebook import tqdm as notebook_tqdm


Tianshou version (0.4.8 recommended): 0.4.8
Torch version (1.12.0 recommended): 1.12.0.dev20220520+cu116


<hr>

### Correct CUDA access

The installation instructions specify how to install PyTorch with CUDA 11.6.
The following code block tests if this was done successfully.

In [3]:
####################################################
# CUDA VALIDATION
####################################################

# Check cuda available
print(f"CUDA is available: {torch.cuda.is_available()}")

# Show cuda devices
print(f"\nAmount of connected devices supporting CUDA: {torch.cuda.device_count()}")

# Show current cuda device
print(f"\nCurrent CUDA device: {torch.cuda.current_device()}")

# Show cuda device name
print(f"Cuda device 0 name: {torch.cuda.get_device_name(0)}")

CUDA is available: True

Amount of connected devices supporting CUDA: 1

Current CUDA device: 0
Cuda device 0 name: NVIDIA GeForce GTX 970


<hr><hr>

## Training two DQN agents on connect four Gym

Our connect four gym setup requires two agents, one for each player.
To reduce complexity, agents will always play as the same player, e.g. always as player 1.
It is important to note that connect four is a *solved game*.
According to [The Washington Post](https://www.washingtonpost.com/news/wonk/wp/2015/05/08/how-to-win-any-popular-game-according-to-data-scientists/):

> Connect Four is what mathematicians call a "solved game," meaning you can play it perfectly every time, no matter what your opponent does. You will need to get the first move, but as long as you do so, you can always win within 41 moves.

<hr>

### Building the environment

This code is taken from previous notebooks.
To make the problem easier for now, we don't allow invalid moves.

In [4]:
####################################################
# CONNECT FOUR V2 ENVIRONMENT
####################################################

def get_env():
    """
    Returns the connect four gym environment V2 altered for Tianshou and Petting Zoo compatibility.
    Already wrapped with a ts.env.PettingZooEnv wrapper.
    """
    return ts.env.PettingZooEnv(cfgym.env(reward_move= 0, # Set to 1 for reward to make moves (incentivise longer games)
                                          reward_invalid= -3,
                                          reward_draw= 100,
                                          reward_win= 25,
                                          reward_loss= -25,
                                          allow_invalid_move= False))
    
    
# Test the environment
env = get_env()
print(f"Observation space: {env.observation_space}")
print(f"\nAction space: {env.action_space}")

# Reset the environment to start from a clean state, returns the initial observation
observation = env.reset()

print("\n Initial player id:")
print(observation["agent_id"])

print("\n Initial observation:")
print(observation["obs"])

print("\n Initial mask:")
print(observation["mask"])

# Clean unused variables
del observation
del env

Observation space: Dict(action_mask:Box([0 0 0 0 0 0 0], [1 1 1 1 1 1 1], (7,), int8), observation:Box([[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]], [[2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]], (6, 7), int8))

Action space: Discrete(7)

 Initial player id:
player_1

 Initial observation:
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]

 Initial mask:
[True, True, True, True, True, True, True]
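
The mask printed above marks which of the seven columns can still accept a piece. A minimal sketch of how such a mask could be derived from the 6x7 board is given below. It assumes that row 0 is the top row and that `0` denotes an empty cell, which matches the printed observation but was not checked against the internals of `ConnectFourPygameEnvV2`. The helper name `valid_move_mask` is hypothetical.

```python
from typing import List


def valid_move_mask(board: List[List[int]]) -> List[bool]:
    """Return one flag per column: True if the column can still take a piece.

    Assumption: board[0] is the top row and 0 marks an empty cell.
    """
    top_row = board[0]
    return [cell == 0 for cell in top_row]


# Empty 6x7 board: every column is playable
empty_board = [[0] * 7 for _ in range(6)]
print(valid_move_mask(empty_board))

# Fill column 3 completely, alternating between player 1 and player 2
full_column_board = [row[:] for row in empty_board]
for row_idx in range(6):
    full_column_board[row_idx][3] = 1 + row_idx % 2
print(valid_move_mask(full_column_board))
```

With `allow_invalid_move= False`, a mask like this is what keeps the agents from picking a full column.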


<hr>

### Implementing the DQN policy

The DQN policy for the agent is configured and set up below.
This is identical to the previous notebook with the added option of "freezing" an agent which corresponds to giving it an optimizer with learning rate 0.
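
To see why a learning rate of 0 amounts to freezing, consider a plain gradient descent step `w - lr * g`: with `lr = 0` the weights come out unchanged whatever the gradient is. The sketch below shows this in plain Python rather than PyTorch; the function `sgd_step` and the example numbers are hypothetical and only serve as illustration.

```python
from typing import List


def sgd_step(weights: List[float], gradients: List[float], learning_rate: float) -> List[float]:
    """Apply one vanilla gradient descent step to a flat list of weights."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


weights = [0.5, -1.2, 3.0]
gradients = [0.1, -0.4, 2.0]

# A "frozen" agent: the update is computed but changes nothing
frozen_weights = sgd_step(weights, gradients, learning_rate=0.0)

# A learning agent: the weights move against the gradient
trained_weights = sgd_step(weights, gradients, learning_rate=0.0001)

print(frozen_weights == weights, trained_weights == weights)
```

The practical appeal of this approach is that the frozen policy can still go through the regular update call of the trainer, so no special casing is needed in the training loop.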

In [5]:
####################################################
# DQN ARCHITECTURE
####################################################

class CNNBasedDQN(torch.nn.Module):
    """
    Custom DQN using a model based on CNN
    """
    def __init__(self,
                 state_shape: typing.Sequence[int],
                 action_shape: typing.Sequence[int],
                 device: typing.Union[str, int, torch.device] = 'cuda' if torch.cuda.is_available() else 'cpu',):
        # Parent call
        super().__init__()
        
        # Save device (e.g. cuda)
        self.device = device
        
        # Number of input channels
        input_channels_cnn = 1
        output_channels_cnn = 32
        flatten_size = (state_shape[0] - 3) * (state_shape[1] - 3) * output_channels_cnn
        output_size= np.prod(action_shape)
        
        self.model = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels= input_channels_cnn, out_channels= output_channels_cnn, kernel_size= 4, stride= 1), torch.nn.ReLU(inplace=True),
            torch.nn.Flatten(0,-1),
            torch.nn.Unflatten(0, (1, flatten_size)),
            torch.nn.Linear(flatten_size, 128), torch.nn.ReLU(inplace=True),
            torch.nn.Linear(128, 128), torch.nn.ReLU(inplace=True),
            torch.nn.Linear(128, output_size),
        )

    def forward(self, obs, state=None, info={}):
        if not isinstance(obs, torch.Tensor):
            obs = torch.tensor(obs, dtype=torch.float, device=self.device)
        
        logits = self.model(obs)
        return logits, state
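
The `flatten_size` expression in `CNNBasedDQN` follows from the usual output-size formula of a convolution without padding or dilation: `(input - kernel) // stride + 1`. With a kernel of 4 and a stride of 1 this reduces to `input - 3`, hence the `state_shape[i] - 3` terms. The sketch below redoes that arithmetic for the 6x7 board; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis of a convolution (dilation of 1 assumed)."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


board_rows, board_cols = 6, 7
kernel_size = 4
output_channels = 32

out_rows = conv_output_size(board_rows, kernel_size)
out_cols = conv_output_size(board_cols, kernel_size)
flatten_size = out_rows * out_cols * output_channels

print(out_rows, out_cols, flatten_size)  # 3 4 384
```

Note that the model flattens from dimension 0 and then unflattens to `(1, flatten_size)`, so it appears to handle a single board per forward pass, which is consistent with the `batch_size` of 1 used further on.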


In [6]:
####################################################
# DQN POLICY
####################################################

def cf_cnn_dqn_policy(state_shape: tuple,
                      action_shape: tuple,
                      optim: typing.Optional[torch.optim.Optimizer] = None,
                      learning_rate: float =  0.0001,
                      gamma: float = 0.9, # Smaller gamma favours "faster" win
                      n_step: int = 4, # Number of steps to look ahead
                      frozen: bool = False,
                      target_update_freq: int = 320):
    # Use cuda device if possible
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    # Network to be used for DQN
    net = CNNBasedDQN(state_shape, action_shape, device= device).to(device)
    
    # Default optimizer is an adam optimizer with the argparser learning rate
    if optim is None:
        optim = torch.optim.Adam(net.parameters(), lr= learning_rate)
        
    # If we are frozen, we use an optimizer that has learning rate 0
    if frozen:
        optim = torch.optim.SGD(net.parameters(), lr= 0)
        
        
    # Our agent DQN policy
    return ts.policy.DQNPolicy(model= net,
                               optim= optim,
                               discount_factor= gamma,
                               estimation_step= n_step,
                               target_update_freq= target_update_freq)
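
The `gamma` and `n_step` arguments together determine the bootstrapped target the DQN learns from. The sketch below computes a textbook n-step return in plain Python to illustrate the comment that a smaller gamma favours a faster win. It is not Tianshou's internal implementation, and the function name `n_step_target` and the reward sequences are hypothetical.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Discounted sum of the next n rewards plus the discounted bootstrap value."""
    n = len(rewards)
    discounted_rewards = sum(gamma ** k * r for k, r in enumerate(rewards))
    return discounted_rewards + gamma ** n * bootstrap_value


# Win reward of 25 arriving on the fourth step, terminal state afterwards
late_win = n_step_target([0, 0, 0, 25], bootstrap_value=0.0, gamma=0.9)

# The same win arriving immediately
early_win = n_step_target([25, 0, 0, 0], bootstrap_value=0.0, gamma=0.9)

print(late_win, early_win)
```

The later the win arrives, the more it is discounted, and lowering gamma widens that gap.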

<hr>

### Building agents

This is identical to the previous notebook, with the added option of freezing either agent. The freeze flag is passed on to the policy constructor defined above.

In [7]:
####################################################
# AGENT CREATION
####################################################

def get_agents(agent_player1: typing.Optional[ts.policy.BasePolicy] = None,
               agent_player2: typing.Optional[ts.policy.BasePolicy] = None,
               optim: typing.Optional[torch.optim.Optimizer] = None,
               resume_path_player_1: str = '', # Path to file to resume agent training from
               resume_path_player_2: str = '', 
               agent_player1_frozen: bool = False, # Freeze a player -> don't let it learn further
               agent_player2_frozen: bool = False,
               ) -> typing.Tuple[ts.policy.BasePolicy, torch.optim.Optimizer, list]:
    """
    Gets a multi agent policy manager, optimizer and player ids for the connect four V2 gym environment.
    Per default this returns 
        - Multi agent manager for 2 agents using DQN
        - Adam optimizer
        - ['player_1', 'player_2'] from the connect four environment
    """
    
    # Get the environment to play in (Connect four gym V2)
    env = get_env()
    
    # Get the observation space from the environment, depending on the type of space (ternary operator)
    observation_space = env.observation_space['observation'] if isinstance(env.observation_space, gym.spaces.Dict) else env.observation_space
    
    # Set the arguments
    state_shape = observation_space.shape or observation_space.n
    action_shape = env.action_space.shape or env.action_space.n
    
    # Configure agent player 1 to be a DQN if no policy is passed.
    if agent_player1 is None:
        # Our agent1 uses a DQN policy
        agent_player1 = cf_cnn_dqn_policy(state_shape= state_shape,
                                          action_shape= action_shape,
                                          optim= optim,
                                          frozen= agent_player1_frozen)
        
        # If we resume our agent we need to load the previous config
        if resume_path_player_1:
            agent_player1.load_state_dict(torch.load(resume_path_player_1))
            
    
    # Configure agent player 2 to be a DQN if no policy is passed.
    if agent_player2 is None:
        # Our agent2 uses a DQN policy
        agent_player2 = cf_cnn_dqn_policy(state_shape= state_shape,
                                          action_shape= action_shape,
                                          optim= optim,
                                          frozen= agent_player2_frozen)
        
        # If we resume our agent we need to load the previous config
        if resume_path_player_2:
            agent_player2.load_state_dict(torch.load(resume_path_player_2))

    # Both our agents are DQN agents by default
    agents = [agent_player1, agent_player2]
        
    # Our policy depends on the order of the agents
    policy = ts.policy.MultiAgentPolicyManager(agents, env)
    
    # Return our policy, optimizer and the available agents in the environment
    # Per default: 
    #   - Multi agent manager for 2 agents using DQN
    #   - Adam optimizer
    #   - ['player_1', 'player_2'] from the connect four environment
    
    return policy, optim, env.agents

<hr>

### Function for letting agents learn

This is identical to the previous notebook, with added options to freeze an agent and to use only the non-frozen agent's score as the reward metric.

In [8]:
####################################################
# AGENT TRAINING
####################################################

def train_agent(filename: str = "dqn_vs_dqn_cnn_based",
                agent_player1: typing.Optional[ts.policy.BasePolicy] = None,
                agent_player2: typing.Optional[ts.policy.BasePolicy] = None,
                agent_player1_frozen: bool = False, # Freeze a player -> don't let it learn further
                agent_player2_frozen: bool = False,
                single_agent_score_as_reward: bool= False, # Uses non frozen agent's score as reward
                optim: typing.Optional[torch.optim.Optimizer] = None,
                training_env_num: int = 1,
                testing_env_num: int = 1,
                buffer_size: int = 2**14, # 16384 transitions; note that `^` is XOR in Python, not exponentiation
                batch_size: int = 1, 
                epochs: int = 50, #50
                step_per_epoch: int = 1024, #1024
                step_per_collect: int = 64, # transition before update
                update_per_step: float = 0.1,
                testing_eps: float = 0.05,
                training_eps: float = 0.1,
                ) -> typing.Tuple[dict, ts.policy.BasePolicy, ts.policy.BasePolicy]:
    """
    Trains two agents in the connect four V2 environment and saves their best model and logs.
    Returns:
        - result from offpolicy_trainer
        - final version of agent 1
        - final version of agent 2
    """

    # ======== notebook specific =========
    notebook_version = '7' # Used for foldering logs and models

    # ======== environment setup =========
    train_envs = ts.env.DummyVectorEnv([get_env for _ in range(training_env_num)])
    test_envs = ts.env.DummyVectorEnv([get_env for _ in range(testing_env_num)])
    
    # set the seed for reproducibility
    np.random.seed(1998)
    torch.manual_seed(1998)
    train_envs.seed(1998)
    test_envs.seed(1998)

    # ======== agent setup =========
    # Gets our agents from the previously made function
    # Per default: 
    #   - Multi agent manager for 2 agents using DQN
    #   - Adam optimizer
    #   - ['player_1', 'player_2'] from the connect four environment
    policy, optim, agents = get_agents(agent_player1=agent_player1,
                                       agent_player2=agent_player2,
                                       agent_player1_frozen= agent_player1_frozen,
                                       agent_player2_frozen= agent_player2_frozen,
                                       optim=optim)

    # ======== collector setup =========
    # Make a collector for the training environments
    train_collector = ts.data.Collector(policy= policy,
                                        env= train_envs,
                                        buffer= ts.data.VectorReplayBuffer(buffer_size, len(train_envs)),
                                        exploration_noise= True)
    
    # Make a collector for the testing environments
    test_collector = ts.data.Collector(policy= policy,
                                       env= test_envs,
                                       buffer= ts.data.VectorReplayBuffer(buffer_size, len(test_envs)),
                                       exploration_noise= True)
    
    # Uncomment below if you want to set epsilon in epsilon policy
    # policy.set_eps(1)
    
    # Collect data for the training environments
    train_collector.collect(n_step= batch_size * training_env_num)
    
    # ======== ensure folders exist =========
    if not os.path.exists(os.path.join('./logs', 'paper_notebooks', notebook_version, filename)):
        os.makedirs(os.path.join('./logs', 'paper_notebooks', notebook_version, filename))
    if not os.path.exists(os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename)):
        os.makedirs(os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename))

    # ======== tensorboard logging setup =========
    # Allows saving the training progress to tensorboard compatible logs
    log_path = os.path.join('./logs', 'paper_notebooks', notebook_version, filename)
    writer = torch.utils.tensorboard.SummaryWriter(log_path)
    logger = ts.utils.TensorboardLogger(writer)

    # ======== callback functions used during training =========
    # We want to save our best policy
    def save_best_fn(policy):
        """
        Callback to save the best model
        """
        # Save best agent 1
        model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'best_policy_agent1.pth')
        torch.save(policy.policies[agents[0]].state_dict(), model_save_path)
        
        # Save best agent 2
        model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'best_policy_agent2.pth')
        torch.save(policy.policies[agents[1]].state_dict(), model_save_path)
        
        # Save agent2

    def stop_fn(mean_rewards):
        """
        Callback to stop training when we've reached the win rate
        """
        return mean_rewards >= 7 # (win = 10, 70% win without invalid moves = mean of 7)

    def train_fn(epoch, env_step):
        """
        Callback before training
        """        
        # Before training we want to configure the epsilon for the agents
        # In general more exploratory than the test case
        policy.policies[agents[0]].set_eps(training_eps)
        policy.policies[agents[1]].set_eps(training_eps)

    def test_fn(epoch, env_step):
        """
        Callback before testing
        """        
        # Before testing we want to configure the epsilon for the agents
        # In general more greedy than the train case, but not fully greedy,
        #   to avoid getting stuck on invalid moves
        policy.policies[agents[0]].set_eps(testing_eps)
        policy.policies[agents[1]].set_eps(testing_eps)

    def reward_metric(rews):
        """
        Callback for reward collection
        """        
        if agent_player2_frozen and single_agent_score_as_reward:
            # agent 2 frozen, optimizing for agent 1
            return rews[:, 0]
        
        if agent_player1_frozen and single_agent_score_as_reward:
            # agent 1 frozen, optimizing for agent 2
            return rews[:, 1]
        
        # Per default we are interested in optimizing both agents
        return rews[:, 0] + rews[:, 1]
    
            

    # trainer
    result = ts.trainer.offpolicy_trainer(policy= policy,
                                          train_collector= train_collector,
                                          test_collector= test_collector,
                                          max_epoch= epochs,
                                          step_per_epoch= step_per_epoch,
                                          step_per_collect= step_per_collect,
                                          episode_per_test= testing_env_num,
                                          batch_size= batch_size,
                                          train_fn= train_fn,
                                          test_fn= test_fn,
                                          # Stop function to stop before specified amount of epochs
                                          #stop_fn= stop_fn
                                          save_best_fn= save_best_fn,
                                          update_per_step= update_per_step,
                                          logger= logger,
                                          test_in_train= False,
                                          reward_metric= reward_metric)
    
    # Save final agent 1
    model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'final_policy_agent1.pth')
    torch.save(policy.policies[agents[0]].state_dict(), model_save_path)

    # Save final agent 2
    model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'final_policy_agent2.pth')
    torch.save(policy.policies[agents[1]].state_dict(), model_save_path)

    return result, policy.policies[agents[0]], policy.policies[agents[1]]
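
The `training_eps` and `testing_eps` arguments set how often an agent picks a random move instead of its best known one. The sketch below shows masked epsilon-greedy selection in plain Python. It is an illustration of the idea and not Tianshou's code; the function name `masked_epsilon_greedy` and the example Q-values are hypothetical.

```python
import random
from typing import List, Optional


def masked_epsilon_greedy(q_values: List[float],
                          mask: List[bool],
                          eps: float,
                          rng: Optional[random.Random] = None) -> int:
    """Pick a random valid action with probability eps, else the best valid one."""
    rng = rng or random.Random()
    valid_actions = [action for action, allowed in enumerate(mask) if allowed]
    if not valid_actions:
        raise ValueError("no valid action available")

    if rng.random() < eps:
        # Explore: uniform choice among the columns that are not full
        return rng.choice(valid_actions)

    # Exploit: highest Q-value among the valid columns only
    return max(valid_actions, key=lambda action: q_values[action])


q_values = [0.1, 0.9, 0.3, 2.0, 0.0, -1.0, 0.5]
mask = [True, True, True, False, True, True, True]  # column 3 is full

greedy_action = masked_epsilon_greedy(q_values, mask, eps=0.0)
random_action = masked_epsilon_greedy(q_values, mask, eps=1.0, rng=random.Random(1998))

print(greedy_action, random_action)
```

Column 3 has the highest Q-value but is masked out, so the greedy choice falls on column 1.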

<hr>

### Function for watching learned agent

Identical to the previous notebook.

In [9]:
####################################################
# WATCHING THE LEARNED POLICY IN ACTION
####################################################

def watch(numer_of_games: int = 3,
          agent_player1: typing.Optional[ts.policy.BasePolicy] = None,
          agent_player2: typing.Optional[ts.policy.BasePolicy] = None,
          test_epsilon: float = 0.05, # Mostly greedy while watching, with a little randomness to avoid getting stuck on an invalid move
          render_speed: float = 0.15, # Amount of seconds to update frame/ do a step
          ) -> None:
    
    # Get the connect four V2 environment (must be a list)
    env= ts.env.DummyVectorEnv([get_env])
    
    # Get the agents from the trained agents
    policy, optim, agents = get_agents(agent_player1= agent_player1,
                                       agent_player2= agent_player2)
    
    # Evaluate the policy
    policy.eval()
    
    # Set the testing policy epsilon for our agents
    policy.policies[agents[0]].set_eps(test_epsilon)
    policy.policies[agents[1]].set_eps(test_epsilon)
    
    # Collect the test data
    collector = ts.data.Collector(policy= policy,
                                  env= env,
                                  exploration_noise= True)
    
    # Render games in human mode to see how it plays
    result = collector.collect(n_episode= numer_of_games, render= render_speed)
    
    # Close the environment after collecting the results
    # This closes the pygame window after completion
    env.close()
    
    # Get the rewards and length from the test trials
    rewards, length = result["rews"], result["lens"]
    
    # Print the final reward for the first agent
    print(f"Average steps of game:  {length.mean()}")
    print(f"Final mean reward agent 1: {rewards[:, 0].mean()}, std: {rewards[:, 0].std()}")
    print(f"Final mean reward agent 2: {rewards[:, 1].mean()}, std: {rewards[:, 1].std()}")

<hr>

### Doing the experiment

We now run the experiment using our previously created functions.
We freeze one agent and initialize both agents from previous versions.

The following iterations were made:

1. Freeze agent 1, train agent 2:
    - Model save name: `1-cnn_dqn_frozen_agent1` 
    - Agent 1 start: `./saved_variables/paper_notebooks/6/dqn_vs_dqn_cnn_based/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/6/dqn_vs_dqn_cnn_based/best_policy_agent2.pth`
    - Learning rate: `0.0001`
    - Training epsilon: `0.2`
    - Look ahead steps: `4`
    - Reward for move/invalid: `+1` / `-3`
    - Allow invalid move: `False`
    - Epochs: `1000`
    - Gamma: `0.9`
    - Best epoch: `51` with test reward `1102`
    - Scoring: sum of `both` agent's score
2. Freeze agent 2, train agent 1:
    - Model save name: `2-cnn_dqn_frozen_agent2` 
    - Agent 1 start: `./saved_variables/paper_notebooks/6/dqn_vs_dqn_cnn_based/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/7/1-cnn_dqn_frozen_agent1/final_policy_agent2.pth`
    - Learning rate: `0.0001`
    - Training epsilon: `0.2`
    - Look ahead steps: `4`
    - Reward for move/invalid: `+1` / `-3`
    - Allow invalid move: `False`
    - Epochs: `1000`
    - Gamma: `0.9`
    - Best epoch: `360` with test reward `1102`
    - Scoring: sum of `both` agent's score

After this, the agent was so focused on prolonging the game that we decided to lower the learning rate and optimize for winning again. We also lowered the number of epochs in each iteration of swapping the frozen agent.
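
The "Scoring" entries in these lists state which rewards are used to rank epochs when the best policy is saved. The sketch below mirrors the branching of `reward_metric` in `train_agent`, using plain lists of `[agent 1, agent 2]` rewards instead of a NumPy array. The function name `score_episodes` is hypothetical, and the example rewards assume that both agents receive the draw reward in a tie.

```python
from typing import List


def score_episodes(rewards: List[List[float]],
                   agent1_frozen: bool,
                   agent2_frozen: bool,
                   single_agent_score: bool) -> List[float]:
    """Reduce per-episode [agent 1, agent 2] rewards to one score per episode."""
    if agent2_frozen and single_agent_score:
        # Agent 2 frozen, so agent 1 is the one being optimized
        return [episode[0] for episode in rewards]
    if agent1_frozen and single_agent_score:
        # Agent 1 frozen, so agent 2 is the one being optimized
        return [episode[1] for episode in rewards]
    # Default: sum of both agents' rewards
    return [episode[0] + episode[1] for episode in rewards]


# Episode 1: agent 1 wins (25 / -25). Episode 2: draw (assumed 100 for both).
episode_rewards = [[25, -25], [100, 100]]

combined = score_episodes(episode_rewards, False, False, False)
agent1_only = score_episodes(episode_rewards, agent1_frozen=False, agent2_frozen=True, single_agent_score=True)
agent2_only = score_episodes(episode_rewards, agent1_frozen=True, agent2_frozen=False, single_agent_score=True)

print(combined, agent1_only, agent2_only)
```

In this example the combined score of a decisive game is zero whoever wins, whereas the single-agent score separates a win from a loss.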

3. Freeze agent 1, train agent 2:
    - Model save name: `3-cnn_dqn_frozen_agent1` 
    - Agent 1 start: `./saved_variables/paper_notebooks/7/2-cnn_dqn_frozen_agent2/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/7/1-cnn_dqn_frozen_agent1/final_policy_agent2.pth`
    - Learning rate: `0.00005` # halved learning rate
    - Training epsilon: `0.1` # halved training epsilon
    - Look ahead steps: `4`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `500`
    - Gamma: `0.8` # Lowered to not make agent want to play too fast again
    - Best epoch: `1` with test reward `100` - tie game
    - Scoring: reward of `agent 2`
4. Freeze agent 2, train agent 1:
    - Model save name: `4-cnn_dqn_frozen_agent2` 
    - Agent 1 start: `./saved_variables/paper_notebooks/7/2-cnn_dqn_frozen_agent2/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/7/3-cnn_dqn_frozen_agent1/best_policy_agent2.pth`
    - Learning rate: `0.00005`
    - Training epsilon: `0.1`
    - Look ahead steps: `4`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `500`
    - Gamma: `0.8` # Lowered to not make agent want to play too fast again
    - Best epoch: `1` with test reward `100` - tie game
    - Scoring: reward of `agent 1`
    
To train further, a loop was created which alternates the frozen agent every 50 epochs. This loop was executed 20 times. The learning rate was also lowered once again.

5. Loop frozen agents:
    - Model save name: `5-looping-iteration-i` 
    - Agent 1 start: `./saved_variables/paper_notebooks/7/4-cnn_dqn_frozen_agent2/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/7/3-cnn_dqn_frozen_agent1/best_policy_agent2.pth`
    - Learning rate: `0.000001`
    - Training epsilon: `0.1`
    - Look ahead steps: `4`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `50` x `20` loops 
    - Gamma: `0.8` # Lowered to not make agent want to play too fast again
    - Best epoch: final epoch always taken to next round
    - Scoring: reward of `non frozen agent`
6. Loop frozen agents:
    - Model save name: `6-looping-iteration-i` 
    - Agent 1 start: `./saved_variables/paper_notebooks/7/5-looping-iteration-19/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/7/5-looping-iteration-19/best_policy_agent2.pth`
    - Learning rate: `0.000003`
    - Training epsilon: `0.1`
    - Look ahead steps: `8`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `20` x `100` loops 
    - Gamma: `0.9`
    - Best epoch: final epoch always taken to next round
    - Scoring: reward of `non frozen agent`
7. Loop frozen agents:
    - Model save name: `7-looping-iteration-i` 
    - Agent 1 start: `./saved_variables/paper_notebooks/7/6-looping-iteration-99/best_policy_agent1.pth`
    - Agent 2 start: `./saved_variables/paper_notebooks/7/6-looping-iteration-99/best_policy_agent2.pth`
    - Learning rate: `0.001`
    - Training epsilon: `0.05`
    - Look ahead steps: `8`
    - Reward for move/invalid: `0` / `-3`
    - Allow invalid move: `False`
    - Epochs: `20` x `500` loops 
    - Gamma: `0.9`
    - Best epoch: final epoch always taken to next round
    - Scoring: reward of `non frozen agent`

For file size reasons, only a portion of the saved agents are kept and stored on GitHub.
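
The looping iterations above alternate the frozen agent on every loop and start each loop from the final checkpoints of the previous one. The sketch below lays out that schedule without doing any training. It follows the parity rule of the experiment cell (agent 2 frozen on even loops, agent 1 on odd loops); the names `LoopPlan` and `alternating_schedule` and the `run_prefix` argument are hypothetical.

```python
from typing import Iterator, NamedTuple, Optional


class LoopPlan(NamedTuple):
    loop_idx: int
    freeze_agent1: bool
    freeze_agent2: bool
    previous_run: Optional[str]  # Run whose final checkpoints are loaded, None for the first loop


def alternating_schedule(loops: int, run_prefix: str) -> Iterator[LoopPlan]:
    """Yield, per loop, which agent is frozen and which run to resume from."""
    for loop_idx in range(loops):
        previous_run = f"{run_prefix}-{loop_idx - 1}" if loop_idx > 0 else None
        yield LoopPlan(loop_idx=loop_idx,
                       freeze_agent1=loop_idx % 2 == 1,
                       freeze_agent2=loop_idx % 2 == 0,
                       previous_run=previous_run)


plans = list(alternating_schedule(loops=4, run_prefix="7-looping-iteration"))
for plan in plans:
    print(plan)
```

Exactly one agent is frozen in every loop, so each agent always trains against a stationary opponent.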


In [39]:
####################################################
# EXPERIMENT: TRAINING AGENTS
####################################################

# Configs for the agents
freeze_agent1 = False
agent1_starting_params = "./saved_variables/paper_notebooks/7/6/6-looping-iteration-99/best_policy_agent1.pth"

freeze_agent2 = True
agent2_starting_params = "./saved_variables/paper_notebooks/7/6/6-looping-iteration-99/best_policy_agent2.pth"

single_agent_score_as_reward = True # To use combined reward or non frozen agent reward as scoring
filename = "7/7-looping-iteration-i"
epochs = 20
loops = 500

learning_rate = 0.001
training_eps = 0.05
gamma = 0.9
n_step = 8

for loop_idx in range(loops):
    # Filename
    filename = f"7-20epoch_500loop/7-looping-iteration-{loop_idx}"
    
    # Use provided starting params in first loop, the one from previous iteration in next
    if loop_idx > 0:
        agent1_starting_params = f"./saved_variables/paper_notebooks/7/7-20epoch_500loop/7-looping-iteration-{loop_idx-1}/final_policy_agent1.pth"
        agent2_starting_params = f"./saved_variables/paper_notebooks/7/7-20epoch_500loop/7-looping-iteration-{loop_idx-1}/final_policy_agent2.pth"
    
    # Determine what agent to freeze: agent 2 on even loops, agent 1 on odd loops
    freeze_agent1 = loop_idx % 2 == 1
    freeze_agent2 = loop_idx % 2 == 0
    
    # Get the environment settings
    env = get_env()
    observation_space = env.observation_space['observation'] if isinstance(env.observation_space, gym.spaces.Dict) else env.observation_space
    state_shape = observation_space.shape or observation_space.n
    action_shape = env.action_space.shape or env.action_space.n
    
    # Configure agent 1
    agent1 = cf_cnn_dqn_policy(state_shape= state_shape,
                               action_shape= action_shape,
                               gamma= gamma,
                               frozen= freeze_agent1,
                               learning_rate = learning_rate,
                               n_step= n_step)
    
    if agent1_starting_params:
        agent1.load_state_dict(torch.load(agent1_starting_params))

    # Configure agent 2
    agent2 = cf_cnn_dqn_policy(state_shape= state_shape,
                               action_shape= action_shape,
                               gamma= gamma,
                               frozen= freeze_agent2,
                               learning_rate = learning_rate,
                               n_step= n_step)

    if agent2_starting_params:
        agent2.load_state_dict(torch.load(agent2_starting_params))

    # Train the agents
    off_policy_traininer_results, final_agent_player1, final_agent_player2 = train_agent(epochs= epochs,
                                                                                         agent_player1= agent1,
                                                                                         agent_player1_frozen = freeze_agent1,
                                                                                         agent_player2= agent2,
                                                                                         agent_player2_frozen = freeze_agent2,
                                                                                         filename= filename,
                                                                                         single_agent_score_as_reward = single_agent_score_as_reward,
                                                                                         training_eps= training_eps)
            
            

Epoch #1: 1025it [00:02, 413.59it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=206.076, player_2/loss=189.878, rew=6.25]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 402.41it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=233.242, player_2/loss=253.772, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 437.08it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=211.324, player_2/loss=293.873, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 438.36it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=120.575, player_2/loss=248.289, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 437.90it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=69.703, player_2/loss=252.948, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 441.29it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=95.868, player_2/loss=260.717, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 438.28it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=94.053, player_2/loss=269.596, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 441.91it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=55.768, player_2/loss=273.681, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 440.46it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=27.881, player_2/loss=269.081, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 437.54it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=50.764, player_2/loss=266.531, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 438.44it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=79.028, player_2/loss=286.661, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 428.40it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=101.460, player_2/loss=283.395, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 409.32it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=155.681, player_2/loss=256.007, rew=17.86]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 397.43it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=92.398, player_2/loss=244.309, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 372.89it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=65.044, player_2/loss=286.054, rew=19.44]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 369.98it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=98.299, player_2/loss=285.658, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 353.76it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=91.756, player_2/loss=228.613, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 402.95it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=96.114, player_2/loss=217.135, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 435.13it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=98.329, player_2/loss=273.891, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 435.98it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=141.864, player_2/loss=222.448, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 417.73it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=156.028, player_2/loss=175.812, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 447.36it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=177.960, player_2/loss=159.571, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 423.52it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=206.813, player_2/loss=200.421, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 446.10it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=226.003, player_2/loss=168.249, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 447.42it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=199.262, player_2/loss=138.718, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 448.69it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=209.351, player_2/loss=158.255, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 446.00it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=200.051, player_2/loss=58.430, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 447.99it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=175.130, player_2/loss=56.776, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 451.10it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=179.944, player_2/loss=61.448, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 445.36it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=191.061, player_2/loss=43.922, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 446.57it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=206.787, player_2/loss=15.017, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 452.59it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=218.878, rew=25.00]       


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 447.19it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=204.907, player_2/loss=36.414, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 417.62it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=185.129, player_2/loss=23.314, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 405.32it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=238.206, player_2/loss=8.075, rew=15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 392.93it/s, env_step=17408, len=13, n/ep=6, n/st=64, player_1/loss=256.856, rew=25.00]       


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 379.99it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=224.990, player_2/loss=34.422, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 379.01it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=210.214, player_2/loss=38.402, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 370.20it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=194.462, player_2/loss=235.287, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 364.49it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=202.730, player_2/loss=317.151, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 365.41it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=205.398, player_2/loss=402.257, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 360.60it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=156.831, rew=25.00]          


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 365.33it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=101.947, player_2/loss=466.341, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 364.59it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=114.737, player_2/loss=451.462, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 364.85it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=104.680, player_2/loss=469.384, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 363.25it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=100.815, player_2/loss=492.428, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 365.07it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=134.095, player_2/loss=462.128, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 364.23it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=169.099, player_2/loss=401.540, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 365.67it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=136.828, player_2/loss=453.726, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 363.13it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=31.237, player_2/loss=443.236, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 362.74it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=131.810, player_2/loss=399.359, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 364.98it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=174.253, player_2/loss=456.101, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 363.89it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=67.479, player_2/loss=515.512, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 365.80it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=73.004, player_2/loss=489.838, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 359.30it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=84.531, player_2/loss=521.626, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 359.62it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=81.088, player_2/loss=534.566, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 372.46it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=114.105, player_2/loss=437.457, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 375.20it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=110.595, player_2/loss=348.301, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 365.79it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=127.230, player_2/loss=327.048, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 366.30it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=167.043, player_2/loss=229.291, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 365.71it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=195.551, player_2/loss=214.342, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 365.06it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=179.090, player_2/loss=241.908, rew=5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 363.45it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=139.091, player_2/loss=192.805, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 366.96it/s, env_step=7168, len=28, n/ep=3, n/st=64, player_1/loss=122.953, player_2/loss=139.101, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 359.41it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=128.426, player_2/loss=130.330, rew=-15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 364.89it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=123.613, player_2/loss=127.834, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 365.64it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=120.718, player_2/loss=125.285, rew=-19.44]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 365.20it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=147.817, player_2/loss=140.205, rew=5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 362.76it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=269.645, player_2/loss=154.994, rew=-13.89]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 367.09it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=271.680, player_2/loss=147.431, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 364.85it/s, env_step=14336, len=25, n/ep=3, n/st=64, player_1/loss=161.364, player_2/loss=96.250, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 364.16it/s, env_step=15360, len=24, n/ep=2, n/st=64, player_1/loss=130.303, player_2/loss=106.031, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 364.04it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=97.636, player_2/loss=116.088, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 362.82it/s, env_step=17408, len=24, n/ep=3, n/st=64, player_1/loss=85.669, player_2/loss=55.370, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 365.68it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=99.863, player_2/loss=97.904, rew=15.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 365.85it/s, env_step=19456, len=13, n/ep=4, n/st=64, player_1/loss=123.306, player_2/loss=173.394, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 365.72it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=117.789, player_2/loss=151.483, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.17it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=132.361, player_2/loss=175.705, rew=18.75]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 363.14it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=133.347, player_2/loss=218.451, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 362.03it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=143.210, player_2/loss=223.022, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 362.56it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=93.332, player_2/loss=238.340, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 364.12it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=81.862, player_2/loss=237.847, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 361.67it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=76.874, player_2/loss=253.272, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 360.78it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=28.382, player_2/loss=256.107, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 363.29it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=40.516, player_2/loss=260.912, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 363.77it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=38.573, player_2/loss=264.977, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 362.40it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=22.525, player_2/loss=265.904, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 363.27it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=26.359, player_2/loss=284.854, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 364.13it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=21.616, player_2/loss=287.245, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 362.75it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=15.834, player_2/loss=278.234, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 361.46it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=54.013, player_2/loss=239.203, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 364.11it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=65.997, player_2/loss=235.132, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 364.49it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=41.922, player_2/loss=235.053, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 362.64it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=16.446, player_2/loss=214.157, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 363.36it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=11.370, player_2/loss=226.669, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 364.95it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=18.597, player_2/loss=244.084, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 365.64it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=148.849, player_2/loss=146.884, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 361.40it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=269.823, player_2/loss=68.731, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 375.58it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=252.481, player_2/loss=60.935, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 377.40it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=292.028, player_2/loss=31.997, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 371.38it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=279.413, player_2/loss=70.261, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 366.13it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=220.413, player_2/loss=133.228, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 364.64it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=292.734, player_2/loss=86.448, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 365.02it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=370.278, player_2/loss=39.513, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 365.29it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_2/loss=31.859, rew=25.00]        


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 363.24it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=335.677, player_2/loss=15.337, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 364.84it/s, env_step=12288, len=15, n/ep=5, n/st=64, player_1/loss=306.812, player_2/loss=88.250, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 362.07it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=252.250, player_2/loss=148.335, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 349.77it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=210.150, player_2/loss=76.153, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 363.37it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=220.294, player_2/loss=14.948, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 361.88it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=288.073, player_2/loss=6.408, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 363.31it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=358.109, player_2/loss=7.495, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 363.39it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=340.930, player_2/loss=7.428, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 362.64it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=294.133, player_2/loss=14.524, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 363.78it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=176.526, player_2/loss=49.874, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.69it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=182.937, player_2/loss=99.511, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 365.45it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=170.692, player_2/loss=77.234, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 364.17it/s, env_step=4096, len=23, n/ep=2, n/st=64, player_1/loss=138.879, player_2/loss=81.317, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 363.64it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=108.492, player_2/loss=149.323, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 362.82it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=131.363, player_2/loss=169.582, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 359.50it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=131.317, player_2/loss=230.737, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 360.39it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=117.959, player_2/loss=251.505, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 356.50it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=152.320, player_2/loss=198.675, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 362.73it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=160.555, player_2/loss=259.619, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 361.94it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=146.971, player_2/loss=333.533, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 361.60it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_2/loss=439.040, rew=25.00]        


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 364.02it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=150.209, player_2/loss=485.776, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 361.84it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=156.003, player_2/loss=424.900, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 361.77it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=117.775, player_2/loss=355.072, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 360.14it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=61.600, player_2/loss=342.840, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 362.19it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=92.751, player_2/loss=357.974, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 359.73it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=89.606, player_2/loss=327.042, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 361.58it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=56.671, player_2/loss=354.818, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 365.37it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=49.988, player_2/loss=369.258, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 363.98it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=75.043, player_2/loss=364.271, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 363.92it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=117.644, player_2/loss=321.963, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 362.29it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=144.007, player_2/loss=209.231, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 365.51it/s, env_step=5120, len=26, n/ep=2, n/st=64, player_1/loss=134.082, player_2/loss=126.862, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 364.14it/s, env_step=6144, len=31, n/ep=2, n/st=64, player_1/loss=121.634, player_2/loss=90.957, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 361.18it/s, env_step=7168, len=24, n/ep=2, n/st=64, player_1/loss=110.896, player_2/loss=78.070, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 369.89it/s, env_step=8192, len=24, n/ep=2, n/st=64, player_1/loss=124.817, player_2/loss=85.517, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 377.64it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=166.576, player_2/loss=83.089, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 376.67it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=212.621, player_2/loss=121.644, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 376.53it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=208.553, player_2/loss=100.290, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 379.95it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=170.738, player_2/loss=46.755, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 378.19it/s, env_step=13312, len=24, n/ep=3, n/st=64, player_1/loss=146.891, player_2/loss=50.457, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 377.93it/s, env_step=14336, len=20, n/ep=4, n/st=64, player_1/loss=192.985, player_2/loss=61.472, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 377.75it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=189.713, player_2/loss=62.551, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 376.43it/s, env_step=16384, len=25, n/ep=2, n/st=64, player_1/loss=186.289, player_2/loss=38.840, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 371.50it/s, env_step=17408, len=19, n/ep=4, n/st=64, player_1/loss=160.166, player_2/loss=42.504, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 363.64it/s, env_step=18432, len=24, n/ep=3, n/st=64, player_1/loss=179.981, player_2/loss=50.940, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 364.40it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=171.966, player_2/loss=32.652, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 365.92it/s, env_step=1024, len=24, n/ep=3, n/st=64, player_1/loss=136.556, player_2/loss=35.243, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.88it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=98.263, player_2/loss=31.814, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 363.57it/s, env_step=3072, len=25, n/ep=2, n/st=64, player_1/loss=54.039, player_2/loss=25.909, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 363.94it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=80.797, player_2/loss=78.175, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 365.25it/s, env_step=5120, len=26, n/ep=3, n/st=64, player_1/loss=91.201, player_2/loss=77.433, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 364.54it/s, env_step=6144, len=26, n/ep=2, n/st=64, player_1/loss=92.637, player_2/loss=25.740, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 364.62it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=70.751, player_2/loss=31.954, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 362.49it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=57.143, player_2/loss=22.616, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 363.86it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=52.468, player_2/loss=32.099, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 366.08it/s, env_step=10240, len=26, n/ep=2, n/st=64, player_1/loss=42.843, player_2/loss=35.739, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 363.31it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=33.130, player_2/loss=46.659, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 361.01it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=37.982, player_2/loss=45.387, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 364.56it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=26.539, player_2/loss=45.375, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 363.86it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=20.669, player_2/loss=38.416, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 361.46it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=43.349, player_2/loss=37.131, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #16: 1025it [00:02, 365.21it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=144.752, player_2/loss=114.751, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #17: 1025it [00:02, 363.71it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=201.387, player_2/loss=207.626, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #18: 1025it [00:02, 356.58it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=138.225, player_2/loss=287.800, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #19: 1025it [00:02, 363.45it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=79.472, player_2/loss=234.211, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #1: 1025it [00:02, 364.32it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=120.708, player_2/loss=268.596, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.25it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=137.375, player_2/loss=226.751, rew=15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 362.62it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=295.509, player_2/loss=121.177, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 362.34it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=487.301, player_2/loss=77.589, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 363.24it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=539.026, player_2/loss=47.202, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 361.73it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=482.638, player_2/loss=21.516, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 361.46it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=413.535, player_2/loss=16.898, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 363.13it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=417.186, player_2/loss=16.831, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 362.52it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=373.700, player_2/loss=57.903, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 363.07it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=365.486, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 354.74it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=270.904, player_2/loss=11.619, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 361.08it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=329.794, player_2/loss=33.477, rew=15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 375.09it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=544.298, player_2/loss=34.767, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 374.08it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=540.085, player_2/loss=27.436, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 361.57it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=464.396, player_2/loss=38.659, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 360.21it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=473.593, player_2/loss=23.438, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 361.06it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=536.894, player_2/loss=23.296, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 362.37it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=445.085, player_2/loss=25.010, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 363.06it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=379.376, player_2/loss=11.279, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 365.15it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=371.728, player_2/loss=63.469, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.17it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=314.808, player_2/loss=57.653, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 359.78it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=228.478, player_2/loss=89.286, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 362.40it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=209.616, player_2/loss=170.574, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 364.73it/s, env_step=5120, len=17, n/ep=3, n/st=64, player_1/loss=193.923, player_2/loss=105.394, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 367.55it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=161.937, player_2/loss=127.470, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 361.67it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=165.391, player_2/loss=213.862, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 365.92it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=139.964, player_2/loss=189.666, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 362.45it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=96.089, player_2/loss=200.707, rew=-5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 362.48it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=163.067, player_2/loss=216.066, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 361.75it/s, env_step=11264, len=17, n/ep=3, n/st=64, player_1/loss=151.148, player_2/loss=247.006, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 362.98it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=84.138, player_2/loss=181.799, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 363.11it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=122.246, player_2/loss=198.870, rew=-15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 364.69it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=147.358, player_2/loss=311.679, rew=-3.57]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 363.37it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=150.428, player_2/loss=240.710, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 363.55it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=179.607, player_2/loss=155.709, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 363.25it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=185.842, player_2/loss=351.484, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 360.00it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=146.496, player_2/loss=513.343, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 362.11it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=101.809, player_2/loss=483.221, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 363.88it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=156.073, player_2/loss=409.104, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.29it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=153.501, player_2/loss=346.813, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 366.25it/s, env_step=3072, len=28, n/ep=2, n/st=64, player_1/loss=141.659, player_2/loss=235.076, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 365.15it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=150.178, player_2/loss=179.510, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 364.57it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=153.054, player_2/loss=175.545, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 363.88it/s, env_step=6144, len=20, n/ep=4, n/st=64, player_1/loss=107.537, player_2/loss=112.792, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 362.47it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=93.152, player_2/loss=89.554, rew=-5.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 366.21it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=112.902, player_2/loss=108.233, rew=-5.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 366.74it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=143.133, player_2/loss=115.934, rew=-12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 364.57it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=167.606, player_2/loss=88.597, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 365.72it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=228.094, player_2/loss=87.755, rew=-18.75]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 362.97it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_2/loss=52.270, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 364.79it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=162.223, player_2/loss=27.214, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 361.93it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=143.790, player_2/loss=33.815, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 359.62it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=169.725, player_2/loss=66.127, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 372.45it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=177.587, player_2/loss=73.628, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 374.87it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=152.476, player_2/loss=27.622, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 373.75it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=197.627, player_2/loss=8.217, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 363.16it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=218.954, player_2/loss=6.958, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 361.97it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=143.802, player_2/loss=8.222, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 366.26it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=115.070, player_2/loss=114.949, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 362.82it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=88.704, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 365.09it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=78.052, player_2/loss=477.409, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 364.62it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=79.558, player_2/loss=412.997, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 362.80it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_2/loss=300.812, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 363.95it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=92.122, player_2/loss=314.644, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 363.29it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=30.838, player_2/loss=378.873, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 363.77it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=14.263, player_2/loss=394.268, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 364.68it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=24.302, player_2/loss=322.120, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 362.73it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=27.977, player_2/loss=356.499, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 365.49it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=19.409, player_2/loss=332.767, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 364.38it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=88.081, player_2/loss=239.238, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 365.07it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=70.979, player_2/loss=253.689, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 363.10it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=14.214, player_2/loss=311.355, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 362.67it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=20.923, player_2/loss=274.868, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 362.50it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=46.775, player_2/loss=226.186, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 361.93it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=72.065, player_2/loss=267.158, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 361.15it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=49.058, player_2/loss=314.080, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 364.78it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=7.034, player_2/loss=247.637, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.53it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=27.984, player_2/loss=166.184, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 365.78it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=102.599, player_2/loss=124.155, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 363.31it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=138.103, player_2/loss=142.993, rew=-5.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 365.06it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=76.657, player_2/loss=152.258, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 365.01it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=94.797, player_2/loss=144.217, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 362.54it/s, env_step=7168, len=16, n/ep=3, n/st=64, player_1/loss=145.690, player_2/loss=131.977, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 364.93it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=162.575, player_2/loss=108.487, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 363.30it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=147.430, player_2/loss=85.428, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 363.79it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=158.897, player_2/loss=44.377, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 362.25it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=218.808, player_2/loss=24.780, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 365.11it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=256.110, player_2/loss=55.791, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 365.78it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=234.855, player_2/loss=70.728, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 364.95it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=248.745, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 364.50it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=241.594, player_2/loss=113.693, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 365.83it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=251.712, player_2/loss=80.512, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 362.79it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=265.945, player_2/loss=24.870, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 361.30it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=237.235, player_2/loss=16.880, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 358.32it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=250.953, player_2/loss=23.290, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 370.27it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=278.252, player_2/loss=333.779, rew=13.89]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 373.34it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=190.567, player_2/loss=526.800, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 372.27it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=92.946, player_2/loss=715.133, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 373.01it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=90.894, player_2/loss=755.973, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 368.28it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=111.026, player_2/loss=685.410, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 362.39it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=77.176, player_2/loss=509.239, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 360.41it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=27.871, player_2/loss=594.280, rew=13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 362.27it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=89.230, player_2/loss=641.015, rew=13.89]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 353.15it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=171.411, player_2/loss=579.429, rew=13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 358.13it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=179.891, player_2/loss=527.271, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 362.52it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=139.280, player_2/loss=520.721, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 361.94it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=134.949, player_2/loss=539.876, rew=19.44]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 362.11it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=107.258, player_2/loss=559.158, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 359.73it/s, env_step=14336, len=7, n/ep=10, n/st=64, player_1/loss=74.114, player_2/loss=563.127, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 359.02it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=69.970, player_2/loss=513.432, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 360.19it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=56.842, player_2/loss=603.343, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 360.65it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=46.842, player_2/loss=671.645, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 362.34it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=73.462, player_2/loss=755.597, rew=2.78]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 360.81it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=97.805, player_2/loss=728.870, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 360.94it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=194.766, player_2/loss=382.328, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.38it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=212.590, player_2/loss=295.926, rew=-5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 362.94it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=189.897, player_2/loss=223.508, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 363.02it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=151.332, player_2/loss=170.297, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 364.86it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=171.688, player_2/loss=111.115, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 361.49it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=183.741, player_2/loss=63.237, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 362.61it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=166.363, player_2/loss=35.145, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 363.46it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=182.473, player_2/loss=56.102, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 362.76it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=228.232, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 364.07it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=214.977, player_2/loss=80.836, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 361.68it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=148.347, player_2/loss=82.248, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 363.78it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=181.690, player_2/loss=13.464, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 364.68it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=230.008, player_2/loss=11.438, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 363.18it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=233.905, player_2/loss=94.578, rew=-19.44]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 361.89it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=217.011, player_2/loss=111.583, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 364.91it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=188.507, player_2/loss=111.852, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 364.87it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=200.595, player_2/loss=58.248, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 361.49it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=244.334, player_2/loss=44.816, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 364.25it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=217.342, player_2/loss=135.518, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 358.22it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=116.861, player_2/loss=306.490, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 358.57it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=77.688, player_2/loss=355.078, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 362.41it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=81.910, player_2/loss=333.863, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 355.47it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=61.526, player_2/loss=300.490, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 368.86it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=87.053, player_2/loss=288.058, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 370.45it/s, env_step=6144, len=9, n/ep=8, n/st=64, player_1/loss=55.262, player_2/loss=315.569, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 372.76it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=62.489, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 374.33it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=78.570, player_2/loss=335.585, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 371.27it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=88.576, player_2/loss=304.275, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 373.28it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=88.570, player_2/loss=300.716, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 364.38it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=107.189, player_2/loss=305.014, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 358.80it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=181.014, player_2/loss=326.383, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 363.07it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=99.190, player_2/loss=341.811, rew=19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 359.57it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=57.803, player_2/loss=368.073, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 360.39it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=85.116, player_2/loss=351.734, rew=10.71]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 359.43it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=59.003, player_2/loss=269.581, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 361.98it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=21.865, player_2/loss=307.382, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 359.69it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=37.471, player_2/loss=304.960, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 361.36it/s, env_step=19456, len=9, n/ep=5, n/st=64, player_1/loss=27.009, player_2/loss=315.024, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 362.85it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=730.617, player_2/loss=228.054, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 365.15it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=750.167, player_2/loss=207.590, rew=17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 364.06it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=1017.687, player_2/loss=189.547, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 360.89it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=786.026, player_2/loss=165.467, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 362.19it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=417.015, player_2/loss=141.467, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 363.30it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_2/loss=101.478, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 364.79it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=535.067, player_2/loss=103.876, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 363.53it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=655.302, player_2/loss=68.681, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 364.93it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=619.179, player_2/loss=59.442, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 363.89it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=454.801, player_2/loss=68.681, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 363.19it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=472.806, player_2/loss=67.209, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 362.03it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=527.459, player_2/loss=36.474, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 363.78it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=479.143, player_2/loss=53.783, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 362.46it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=452.004, player_2/loss=127.296, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 364.32it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=444.479, player_2/loss=155.898, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 364.59it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=412.040, player_2/loss=74.096, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 362.91it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=440.751, player_2/loss=52.241, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 362.31it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=433.493, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 361.05it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=440.131, player_2/loss=81.270, rew=16.67]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 361.12it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=392.249, player_2/loss=410.929, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.06it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=264.291, player_2/loss=393.399, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 362.54it/s, env_step=3072, len=13, n/ep=6, n/st=64, player_1/loss=161.270, player_2/loss=387.721, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 363.62it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=146.857, player_2/loss=339.747, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 364.23it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=153.388, player_2/loss=287.813, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 361.52it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=111.171, player_2/loss=256.072, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 362.28it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=107.675, player_2/loss=326.931, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 361.46it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=169.134, player_2/loss=373.943, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 436.02it/s, env_step=9216, len=10, n/ep=5, n/st=64, player_1/loss=139.764, player_2/loss=367.259, rew=15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 417.04it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=68.476, player_2/loss=369.540, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 404.96it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=37.005, player_2/loss=339.992, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 391.57it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=69.892, player_2/loss=346.717, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 378.43it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=106.157, player_2/loss=284.030, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 376.61it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=96.782, player_2/loss=325.070, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 366.78it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=93.430, player_2/loss=371.058, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 362.57it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=84.929, player_2/loss=379.621, rew=16.67]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 364.82it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=103.580, player_2/loss=385.142, rew=15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 362.30it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=99.920, player_2/loss=422.653, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 361.88it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=80.917, player_2/loss=383.193, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 361.83it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=43.871, player_2/loss=312.421, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 364.60it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=74.863, player_2/loss=238.831, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 363.56it/s, env_step=3072, len=31, n/ep=2, n/st=64, player_1/loss=109.216, player_2/loss=205.519, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 362.37it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=269.104, player_2/loss=240.604, rew=-5.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 365.73it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=320.056, player_2/loss=256.008, rew=-5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 366.45it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=485.108, player_2/loss=233.317, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 363.90it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=712.138, player_2/loss=147.275, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 366.94it/s, env_step=8192, len=20, n/ep=4, n/st=64, player_1/loss=691.204, player_2/loss=96.474, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 365.41it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=862.944, player_2/loss=73.077, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 363.70it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=1115.028, player_2/loss=68.586, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 366.11it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=824.942, player_2/loss=68.774, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 364.26it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=750.253, player_2/loss=39.545, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 365.95it/s, env_step=13312, len=22, n/ep=2, n/st=64, player_1/loss=783.471, player_2/loss=23.658, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 364.90it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=625.934, player_2/loss=39.003, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 363.30it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=652.852, player_2/loss=38.688, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 363.27it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=758.676, player_2/loss=33.089, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 363.94it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=818.800, player_2/loss=25.695, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 365.59it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=714.551, player_2/loss=35.077, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 362.71it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=809.202, player_2/loss=49.919, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 357.09it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=414.555, player_2/loss=25.855, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 364.11it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=401.342, player_2/loss=45.753, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 362.50it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=269.624, player_2/loss=32.422, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 361.84it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=243.198, player_2/loss=29.214, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 363.21it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=280.308, player_2/loss=68.528, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 365.08it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=208.841, player_2/loss=66.346, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 362.96it/s, env_step=7168, len=20, n/ep=4, n/st=64, player_1/loss=165.809, player_2/loss=46.138, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 361.46it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=186.060, player_2/loss=66.936, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 363.43it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=193.880, player_2/loss=211.531, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 361.21it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=131.174, player_2/loss=334.828, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 362.01it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=116.070, player_2/loss=318.165, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 364.06it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=84.223, player_2/loss=288.280, rew=15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 362.40it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=82.152, player_2/loss=312.588, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 367.35it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=92.821, player_2/loss=295.220, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 375.27it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=100.776, player_2/loss=296.722, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 365.94it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=115.104, player_2/loss=306.997, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 361.40it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_2/loss=318.054, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 362.17it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=62.293, player_2/loss=345.304, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 362.25it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=63.582, player_2/loss=289.335, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 361.50it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=99.207, player_2/loss=243.429, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 364.90it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=67.798, player_2/loss=246.721, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 362.43it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=74.314, player_2/loss=216.269, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 367.61it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=59.900, player_2/loss=196.112, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 365.59it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=41.137, player_2/loss=183.935, rew=-16.67]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 364.93it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=32.293, player_2/loss=155.116, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 362.72it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=38.353, player_2/loss=174.116, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 364.18it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=73.707, player_2/loss=195.891, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 365.75it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=65.168, player_2/loss=178.417, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 365.17it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=29.815, player_2/loss=170.204, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 363.06it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=68.142, player_2/loss=139.885, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 368.16it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=121.286, player_2/loss=129.764, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 365.17it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=172.564, player_2/loss=90.347, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #14: 1025it [00:02, 364.41it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=316.292, player_2/loss=96.955, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #15: 1025it [00:02, 363.08it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=422.014, player_2/loss=133.184, rew=10.71]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #16: 1025it [00:02, 362.27it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=419.863, player_2/loss=148.137, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #17: 1025it [00:02, 363.47it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=415.309, player_2/loss=150.228, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #18: 1025it [00:02, 364.52it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=329.265, player_2/loss=161.974, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #19: 1025it [00:02, 363.28it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=333.324, player_2/loss=143.900, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #1: 1025it [00:02, 364.22it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=290.494, player_2/loss=186.093, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.63it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=180.320, player_2/loss=295.955, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 364.52it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=81.042, player_2/loss=462.410, rew=17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 362.32it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=84.336, player_2/loss=500.426, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 362.56it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=55.654, player_2/loss=370.602, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 364.71it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=78.353, player_2/loss=280.237, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 353.76it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=91.023, player_2/loss=307.193, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 362.43it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=56.206, player_2/loss=333.784, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 362.61it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=44.556, player_2/loss=345.831, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 363.46it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=31.288, player_2/loss=313.646, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 362.48it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=11.574, player_2/loss=323.364, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 363.66it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=12.475, player_2/loss=384.657, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 362.34it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=13.376, player_2/loss=446.795, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 362.83it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=10.835, player_2/loss=391.897, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 361.64it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=4.375, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 365.08it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=6.374, player_2/loss=408.450, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 362.46it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=6.629, player_2/loss=383.504, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 382.23it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=3.359, player_2/loss=399.964, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 377.96it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=3.557, player_2/loss=424.058, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 377.01it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=6.487, player_2/loss=307.309, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 373.80it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=102.159, player_2/loss=246.161, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 363.24it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=262.572, player_2/loss=155.585, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 362.74it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=311.039, player_2/loss=109.610, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 365.29it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=260.845, player_2/loss=80.734, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 364.83it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=266.266, player_2/loss=42.146, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 362.62it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=305.662, player_2/loss=19.061, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 363.91it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=229.137, player_2/loss=18.709, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 364.57it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=197.010, player_2/loss=48.017, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 366.25it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=304.810, player_2/loss=56.936, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 364.07it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=299.738, player_2/loss=57.229, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 363.59it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=306.756, player_2/loss=40.186, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 364.63it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=375.386, player_2/loss=66.866, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 364.15it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=297.273, player_2/loss=52.119, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 366.55it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=232.104, player_2/loss=47.390, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 362.66it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=207.179, player_2/loss=125.796, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 364.77it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=207.774, player_2/loss=104.833, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 363.23it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=217.408, player_2/loss=61.227, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 361.52it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=249.865, player_2/loss=52.937, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 362.67it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=118.211, player_2/loss=19.873, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.99it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=133.356, player_2/loss=70.356, rew=-5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 362.93it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=112.991, player_2/loss=186.612, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 361.61it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=75.580, player_2/loss=234.766, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 363.99it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=60.359, player_2/loss=324.007, rew=15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 364.53it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=66.507, player_2/loss=355.936, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 361.14it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=69.620, player_2/loss=363.789, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 360.53it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=72.049, player_2/loss=383.682, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 360.16it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=68.089, player_2/loss=449.549, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 362.93it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=32.034, player_2/loss=461.823, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 361.76it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=16.275, player_2/loss=363.239, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 361.01it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=10.934, player_2/loss=360.215, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 361.12it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=32.792, player_2/loss=368.329, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 360.31it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=32.366, player_2/loss=371.823, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 362.60it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=73.400, player_2/loss=456.119, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 359.86it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=74.331, player_2/loss=474.422, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 364.16it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=6.854, player_2/loss=446.105, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 363.54it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=4.552, player_2/loss=405.792, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 359.10it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=7.113, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 364.53it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=45.417, player_2/loss=282.252, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.46it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=68.111, player_2/loss=289.326, rew=-5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 374.36it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=58.999, player_2/loss=249.536, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 363.87it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=99.571, player_2/loss=198.486, rew=16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 361.63it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=321.697, player_2/loss=203.669, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 362.09it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=487.122, player_2/loss=175.738, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 360.19it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=484.024, player_2/loss=113.712, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 363.96it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=482.869, player_2/loss=60.131, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 361.99it/s, env_step=9216, len=8, n/ep=7, n/st=64, player_1/loss=448.553, player_2/loss=64.780, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 357.93it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=477.868, player_2/loss=33.039, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 361.73it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=467.715, player_2/loss=56.528, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 361.17it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=468.708, player_2/loss=46.166, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 361.69it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=510.099, player_2/loss=9.058, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 362.49it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=505.255, player_2/loss=21.108, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 361.23it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=489.014, player_2/loss=18.458, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 361.14it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=424.964, player_2/loss=12.734, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 361.24it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=483.882, player_2/loss=14.006, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 361.80it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=452.941, player_2/loss=32.099, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 361.55it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=479.368, player_2/loss=35.229, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 361.78it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=313.330, player_2/loss=118.061, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 361.69it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=232.932, player_2/loss=258.588, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 361.98it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=151.418, player_2/loss=511.695, rew=13.89]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 362.82it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=103.529, player_2/loss=724.101, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 361.41it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=75.396, player_2/loss=718.825, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 364.88it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=80.195, player_2/loss=731.051, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 363.39it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=32.844, player_2/loss=731.056, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 360.67it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=21.885, player_2/loss=631.705, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 363.34it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=35.406, player_2/loss=569.039, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 360.77it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=42.985, player_2/loss=672.825, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 361.61it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=45.524, player_2/loss=833.412, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 363.42it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=26.995, player_2/loss=862.106, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 364.16it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=74.687, player_2/loss=740.120, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 362.11it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=128.432, player_2/loss=607.261, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 363.94it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=58.663, player_2/loss=565.495, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 359.70it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=32.914, player_2/loss=634.318, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 361.40it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=24.075, player_2/loss=699.181, rew=3.57]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 362.58it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=51.834, player_2/loss=631.853, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 361.40it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=30.858, player_2/loss=644.097, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 364.01it/s, env_step=1024, len=10, n/ep=7, n/st=64, player_1/loss=55.286, player_2/loss=625.300, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 363.21it/s, env_step=2048, len=11, n/ep=4, n/st=64, player_1/loss=37.065, player_2/loss=445.238, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 362.25it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=48.981, player_2/loss=200.711, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 363.12it/s, env_step=4096, len=16, n/ep=3, n/st=64, player_1/loss=128.768, player_2/loss=97.303, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 364.14it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=184.335, player_2/loss=89.034, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 364.55it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=194.128, player_2/loss=92.761, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 371.30it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=205.333, player_2/loss=78.774, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 373.94it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=281.549, player_2/loss=55.816, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 377.98it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=313.143, player_2/loss=66.739, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 376.91it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=376.367, player_2/loss=103.341, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 364.68it/s, env_step=11264, len=15, n/ep=5, n/st=64, player_1/loss=326.339, player_2/loss=85.486, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 363.16it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=214.438, player_2/loss=40.695, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 364.34it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=227.456, player_2/loss=34.117, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 363.66it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=269.919, player_2/loss=34.217, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 364.67it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=280.066, player_2/loss=27.563, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 363.57it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=301.897, player_2/loss=22.303, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 365.31it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=316.854, player_2/loss=58.389, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 364.11it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=356.699, player_2/loss=56.540, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 360.43it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_2/loss=24.120, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 361.82it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=166.217, player_2/loss=30.959, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 363.33it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=102.455, player_2/loss=31.581, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 363.90it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=103.333, player_2/loss=28.908, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 365.24it/s, env_step=4096, len=24, n/ep=2, n/st=64, player_1/loss=102.748, player_2/loss=19.844, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 365.36it/s, env_step=5120, len=15, n/ep=3, n/st=64, player_1/loss=107.420, player_2/loss=46.758, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 361.90it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=117.560, player_2/loss=65.146, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 362.42it/s, env_step=7168, len=30, n/ep=2, n/st=64, player_1/loss=96.876, player_2/loss=33.678, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 365.57it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=97.334, player_2/loss=62.356, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 364.62it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=77.241, player_2/loss=130.467, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 363.98it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=36.485, player_2/loss=201.254, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 362.24it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=49.190, player_2/loss=228.024, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 361.85it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=57.775, player_2/loss=164.009, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 362.99it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=68.630, player_2/loss=125.462, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 363.18it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=148.822, player_2/loss=191.542, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 363.08it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=149.139, player_2/loss=289.216, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 360.34it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=116.884, player_2/loss=351.514, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 360.02it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=104.649, player_2/loss=363.161, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 360.85it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=124.809, player_2/loss=444.941, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 358.49it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=97.049, player_2/loss=502.613, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 359.72it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=113.277, player_2/loss=405.162, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 364.46it/s, env_step=2048, len=7, n/ep=6, n/st=64, player_1/loss=83.194, player_2/loss=354.770, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 363.29it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=86.695, player_2/loss=316.845, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 360.05it/s, env_step=4096, len=25, n/ep=3, n/st=64, player_1/loss=151.405, player_2/loss=235.761, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 360.57it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=223.439, player_2/loss=146.815, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 363.76it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=232.828, player_2/loss=86.509, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 362.42it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=210.229, player_2/loss=85.941, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 362.03it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=291.106, player_2/loss=61.418, rew=17.86]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 360.19it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=342.043, player_2/loss=23.605, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 398.14it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=359.006, player_2/loss=16.448, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 425.03it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=353.481, player_2/loss=27.435, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 405.57it/s, env_step=12288, len=13, n/ep=6, n/st=64, player_1/loss=310.696, player_2/loss=25.082, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 393.84it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=323.499, player_2/loss=40.455, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 380.64it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=341.031, player_2/loss=57.998, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 373.16it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=367.564, player_2/loss=68.970, rew=-8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 372.52it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=269.198, player_2/loss=113.545, rew=-8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 373.93it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=212.546, player_2/loss=105.212, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 375.96it/s, env_step=18432, len=27, n/ep=3, n/st=64, player_1/loss=198.282, player_2/loss=96.433, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 375.07it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=178.184, player_2/loss=142.402, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 370.90it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=182.166, player_2/loss=65.413, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 376.69it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=127.273, player_2/loss=76.794, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 363.08it/s, env_step=3072, len=16, n/ep=5, n/st=64, player_1/loss=70.973, player_2/loss=76.338, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 363.31it/s, env_step=4096, len=27, n/ep=2, n/st=64, player_1/loss=60.879, player_2/loss=77.755, rew=25.00]


Epoch #4: test_reward: 100.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 361.16it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=116.272, player_2/loss=114.409, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 357.54it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=189.529, player_2/loss=259.888, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 360.66it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=133.977, player_2/loss=365.990, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 360.21it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=91.559, player_2/loss=379.954, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 363.77it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=51.938, player_2/loss=415.187, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 361.36it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=41.836, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 358.66it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=39.483, player_2/loss=416.396, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 363.44it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=52.021, player_2/loss=403.968, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 360.49it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=60.993, player_2/loss=432.762, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 361.77it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=27.640, player_2/loss=385.980, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 358.70it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=60.210, player_2/loss=395.657, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 361.40it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=98.201, player_2/loss=416.647, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 362.40it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=115.172, player_2/loss=409.021, rew=18.75]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 358.71it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=100.791, player_2/loss=390.431, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 360.64it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=149.553, player_2/loss=368.903, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 363.69it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=34.548, player_2/loss=396.488, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.62it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=80.320, player_2/loss=324.746, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 362.03it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=158.670, player_2/loss=202.482, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 361.12it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=213.742, player_2/loss=82.131, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 363.60it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=235.035, player_2/loss=42.558, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 362.94it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=170.110, player_2/loss=32.106, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 362.27it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=155.872, player_2/loss=23.048, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 360.66it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=210.046, player_2/loss=70.874, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 364.60it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=239.357, player_2/loss=156.077, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 365.48it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=205.830, player_2/loss=170.152, rew=-12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 363.44it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=159.530, player_2/loss=84.924, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 352.06it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=119.009, player_2/loss=30.456, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 364.01it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=141.757, player_2/loss=28.716, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 359.81it/s, env_step=14336, len=16, n/ep=3, n/st=64, player_1/loss=137.253, player_2/loss=48.183, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 386.48it/s, env_step=15360, len=17, n/ep=3, n/st=64, player_1/loss=166.151, player_2/loss=41.720, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 384.49it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=164.554, player_2/loss=38.909, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 374.17it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=136.253, player_2/loss=45.718, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 375.12it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=108.868, player_2/loss=28.781, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 376.27it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=153.491, player_2/loss=33.671, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 363.21it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=231.323, player_2/loss=132.081, rew=-5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.77it/s, env_step=2048, len=10, n/ep=7, n/st=64, player_2/loss=240.151, rew=17.86]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 359.11it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=196.856, player_2/loss=389.496, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 359.04it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=163.783, player_2/loss=400.099, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 360.99it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=117.473, player_2/loss=409.609, rew=13.89]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 357.60it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=72.037, player_2/loss=392.981, rew=6.25]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 358.45it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=53.134, player_2/loss=386.019, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 360.56it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=67.312, player_2/loss=358.435, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 358.01it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=179.234, player_2/loss=348.228, rew=19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 357.50it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=180.677, player_2/loss=339.295, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 359.06it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=113.548, player_2/loss=367.458, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 361.55it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=117.776, player_2/loss=334.417, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 361.92it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=95.404, player_2/loss=318.457, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 358.59it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=84.074, player_2/loss=408.497, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 355.95it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=91.069, player_2/loss=428.135, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 359.89it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=70.866, player_2/loss=393.806, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 361.09it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=50.613, player_2/loss=416.580, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 358.89it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=64.275, player_2/loss=422.886, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 359.67it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=69.236, player_2/loss=466.557, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 361.80it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=98.259, player_2/loss=454.367, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.11it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=119.164, player_2/loss=368.352, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 361.47it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=120.076, player_2/loss=301.055, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 363.00it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=110.057, player_2/loss=296.322, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 363.55it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=123.407, player_2/loss=294.691, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 361.99it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=192.499, player_2/loss=272.999, rew=-10.71]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 364.07it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=222.740, player_2/loss=174.891, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 362.99it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=139.470, player_2/loss=118.310, rew=12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 362.23it/s, env_step=9216, len=16, n/ep=3, n/st=64, player_1/loss=144.490, player_2/loss=117.363, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 362.46it/s, env_step=10240, len=19, n/ep=4, n/st=64, player_1/loss=263.504, player_2/loss=112.286, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 359.83it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=233.582, player_2/loss=128.334, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 363.59it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=252.208, player_2/loss=59.241, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 361.66it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_2/loss=61.413, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 363.66it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=341.353, player_2/loss=72.049, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 362.38it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=345.044, player_2/loss=50.416, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 362.38it/s, env_step=16384, len=16, n/ep=3, n/st=64, player_1/loss=303.596, player_2/loss=35.603, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 362.66it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=304.509, player_2/loss=69.035, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 357.01it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=242.519, player_2/loss=92.240, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 365.28it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=192.010, player_2/loss=52.118, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 374.50it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=166.956, player_2/loss=51.914, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.84it/s, env_step=2048, len=23, n/ep=2, n/st=64, player_1/loss=139.467, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 362.97it/s, env_step=3072, len=29, n/ep=2, n/st=64, player_1/loss=111.161, player_2/loss=94.439, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 362.85it/s, env_step=4096, len=30, n/ep=2, n/st=64, player_1/loss=88.249, player_2/loss=177.043, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 363.53it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=70.291, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 364.94it/s, env_step=6144, len=31, n/ep=2, n/st=64, player_1/loss=66.657, player_2/loss=175.769, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 363.27it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=73.842, player_2/loss=172.733, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 364.27it/s, env_step=8192, len=24, n/ep=3, n/st=64, player_1/loss=59.514, player_2/loss=162.080, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 366.22it/s, env_step=9216, len=27, n/ep=2, n/st=64, player_1/loss=52.803, player_2/loss=155.836, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 363.62it/s, env_step=10240, len=22, n/ep=2, n/st=64, player_1/loss=58.806, player_2/loss=207.281, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 364.48it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=70.332, player_2/loss=215.982, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 363.11it/s, env_step=12288, len=29, n/ep=3, n/st=64, player_1/loss=87.020, player_2/loss=197.028, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 361.32it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=73.534, player_2/loss=246.896, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 359.68it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=54.540, player_2/loss=296.504, rew=2.78]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 360.36it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=91.086, player_2/loss=329.289, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 362.75it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=106.252, player_2/loss=273.936, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 360.50it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=59.933, player_2/loss=279.663, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 363.46it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=89.452, player_2/loss=352.204, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 361.04it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=70.290, player_2/loss=418.427, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 362.66it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=122.620, player_2/loss=302.191, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.34it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=22.229, player_2/loss=340.286, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 361.65it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=84.456, player_2/loss=332.696, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 362.07it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=194.518, player_2/loss=276.644, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 364.05it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=239.614, player_2/loss=171.345, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 362.13it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=235.488, player_2/loss=93.289, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 363.64it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=230.288, player_2/loss=44.610, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 360.99it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=232.184, player_2/loss=15.453, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 362.84it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=251.280, player_2/loss=20.311, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 362.34it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=259.077, player_2/loss=17.999, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 361.55it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=244.639, player_2/loss=23.169, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 364.12it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=251.780, player_2/loss=55.131, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 361.47it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=219.434, player_2/loss=55.289, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 363.24it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=198.057, player_2/loss=32.057, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 361.34it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=197.903, player_2/loss=50.205, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 362.51it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=230.759, player_2/loss=40.020, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 365.59it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=207.031, player_2/loss=18.797, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 360.28it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=162.627, player_2/loss=13.320, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 363.96it/s, env_step=19456, len=28, n/ep=2, n/st=64, player_1/loss=162.592, player_2/loss=10.486, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 360.17it/s, env_step=1024, len=25, n/ep=3, n/st=64, player_1/loss=49.769, player_2/loss=134.686, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 363.14it/s, env_step=2048, len=13, n/ep=6, n/st=64, player_1/loss=91.874, player_2/loss=127.372, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 360.24it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=96.142, player_2/loss=247.315, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 373.32it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=54.353, player_2/loss=305.910, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 374.99it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=70.721, player_2/loss=360.367, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 372.67it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=83.489, player_2/loss=348.174, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 371.86it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=83.840, player_2/loss=340.545, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 372.79it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=63.959, player_2/loss=310.313, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 359.54it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=20.133, player_2/loss=232.272, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 360.88it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=41.160, player_2/loss=265.187, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 357.52it/s, env_step=11264, len=14, n/ep=6, n/st=64, player_1/loss=50.418, player_2/loss=373.453, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 354.06it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=19.530, player_2/loss=381.471, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 360.75it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=41.509, player_2/loss=348.759, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 357.80it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=52.667, player_2/loss=384.992, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 360.12it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=28.192, player_2/loss=440.213, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 360.38it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=21.334, player_2/loss=407.184, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 361.10it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=10.197, player_2/loss=402.953, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 361.08it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=28.594, player_2/loss=443.562, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 357.35it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=36.220, player_2/loss=441.580, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 358.84it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=26.455, player_2/loss=326.208, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.34it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=22.313, player_2/loss=281.429, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 360.52it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=28.280, player_2/loss=224.229, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 360.64it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=28.409, player_2/loss=181.657, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 362.45it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=11.071, player_2/loss=178.832, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 359.25it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=79.209, player_2/loss=131.824, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 360.87it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=163.960, player_2/loss=103.407, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 360.19it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=158.964, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 360.81it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=155.156, player_2/loss=60.982, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 361.77it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=207.908, player_2/loss=47.750, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 362.61it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=275.038, player_2/loss=13.991, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 361.57it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=340.838, player_2/loss=59.340, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 363.01it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=303.982, player_2/loss=55.283, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 359.05it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=247.394, player_2/loss=44.147, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 361.41it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=206.744, player_2/loss=61.931, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 359.28it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=239.848, player_2/loss=69.869, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 362.10it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=249.550, player_2/loss=74.991, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 358.52it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=252.235, player_2/loss=52.886, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 359.81it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=272.415, player_2/loss=79.035, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 357.84it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=83.709, player_2/loss=234.701, rew=13.89]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 358.30it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=67.902, player_2/loss=297.548, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 357.87it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=108.489, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 358.91it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=145.155, player_2/loss=317.697, rew=6.25]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 356.92it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=95.382, player_2/loss=305.663, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 358.63it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=49.030, player_2/loss=301.576, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 361.26it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=42.336, player_2/loss=311.528, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 367.65it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=32.090, player_2/loss=315.936, rew=6.25]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 359.10it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=23.018, player_2/loss=299.850, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 359.60it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=82.557, player_2/loss=276.462, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 358.60it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_2/loss=248.913, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 359.14it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=77.485, player_2/loss=236.719, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 358.01it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=145.614, player_2/loss=242.729, rew=13.89]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 356.52it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=131.481, player_2/loss=257.118, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 358.44it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=67.760, player_2/loss=311.426, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 357.21it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=76.149, player_2/loss=327.833, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 358.15it/s, env_step=17408, len=7, n/ep=10, n/st=64, player_1/loss=76.159, player_2/loss=345.238, rew=20.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 354.81it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=97.009, player_2/loss=311.477, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 347.32it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=105.909, player_2/loss=328.529, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 359.94it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=107.715, player_2/loss=323.785, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.08it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=86.045, player_2/loss=323.512, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 361.27it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=43.390, player_2/loss=285.119, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 360.23it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=42.128, player_2/loss=244.064, rew=-19.44]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 359.77it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=53.019, player_2/loss=229.648, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 361.92it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=126.501, player_2/loss=132.306, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 359.69it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=194.225, player_2/loss=30.102, rew=-15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 359.58it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=188.363, player_2/loss=72.899, rew=15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 360.32it/s, env_step=9216, len=16, n/ep=3, n/st=64, player_2/loss=82.139, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 355.21it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_2/loss=23.185, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 357.56it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=228.784, player_2/loss=18.266, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 357.89it/s, env_step=12288, len=22, n/ep=3, n/st=64, player_1/loss=199.339, player_2/loss=40.151, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 357.63it/s, env_step=13312, len=16, n/ep=3, n/st=64, player_2/loss=33.352, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 361.43it/s, env_step=14336, len=16, n/ep=3, n/st=64, player_1/loss=218.908, player_2/loss=7.982, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 361.94it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=227.711, player_2/loss=11.703, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 361.01it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=231.976, player_2/loss=10.911, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 359.68it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=203.392, player_2/loss=5.855, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 360.16it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=206.101, player_2/loss=65.895, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 357.91it/s, env_step=19456, len=19, n/ep=3, n/st=64, player_1/loss=183.521, player_2/loss=115.215, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 358.28it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=110.677, player_2/loss=151.432, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.07it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=73.304, player_2/loss=250.930, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.38it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=60.113, player_2/loss=314.319, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 359.62it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=44.217, player_2/loss=297.149, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 359.41it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=38.638, player_2/loss=263.081, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 359.10it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=49.136, player_2/loss=317.124, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 359.47it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=28.385, player_2/loss=342.077, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 358.62it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=28.417, player_2/loss=332.825, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 357.49it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=78.640, player_2/loss=265.072, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 352.43it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=62.731, player_2/loss=283.763, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 358.61it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=22.196, player_2/loss=256.781, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 374.64it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=38.124, player_2/loss=297.975, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 372.04it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=55.716, player_2/loss=271.379, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 360.26it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=37.175, player_2/loss=246.620, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 359.48it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=29.491, player_2/loss=310.390, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 358.05it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=35.243, player_2/loss=341.454, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 359.79it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=53.933, player_2/loss=327.807, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 360.40it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=81.100, player_2/loss=328.258, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 360.21it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=60.527, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 360.80it/s, env_step=1024, len=20, n/ep=4, n/st=64, player_1/loss=168.490, player_2/loss=219.857, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 356.93it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=173.243, player_2/loss=147.065, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 361.68it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=195.044, player_2/loss=68.403, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 358.86it/s, env_step=4096, len=21, n/ep=4, n/st=64, player_1/loss=243.167, player_2/loss=32.345, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 358.60it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=209.857, player_2/loss=60.272, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 361.16it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=130.941, player_2/loss=60.891, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 360.61it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=212.507, player_2/loss=76.004, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 360.28it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=268.258, player_2/loss=58.286, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 362.79it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=259.730, player_2/loss=30.885, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 362.93it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=221.640, player_2/loss=80.341, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 360.32it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=203.208, player_2/loss=68.423, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 360.39it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=186.161, player_2/loss=21.032, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 361.57it/s, env_step=13312, len=16, n/ep=5, n/st=64, player_1/loss=231.616, player_2/loss=25.429, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 359.81it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=201.986, player_2/loss=18.581, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 359.72it/s, env_step=15360, len=10, n/ep=7, n/st=64, player_1/loss=170.115, player_2/loss=64.273, rew=-17.86]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 361.23it/s, env_step=16384, len=16, n/ep=3, n/st=64, player_1/loss=194.833, player_2/loss=65.610, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 361.56it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=172.980, player_2/loss=117.671, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 360.63it/s, env_step=18432, len=22, n/ep=4, n/st=64, player_1/loss=175.365, player_2/loss=148.457, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 356.78it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=316.698, player_2/loss=121.430, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 357.52it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=148.761, player_2/loss=16.059, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.01it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=90.500, player_2/loss=28.280, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 363.76it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=94.779, player_2/loss=81.605, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 363.26it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=93.767, player_2/loss=121.147, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 359.20it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=100.381, player_2/loss=93.387, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 363.51it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=78.419, player_2/loss=60.533, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 363.44it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=39.214, player_2/loss=39.353, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 362.33it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=42.779, player_2/loss=53.241, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 361.62it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=36.391, player_2/loss=54.562, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 360.68it/s, env_step=10240, len=20, n/ep=4, n/st=64, player_1/loss=23.734, player_2/loss=51.153, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 362.29it/s, env_step=11264, len=19, n/ep=4, n/st=64, player_1/loss=25.541, player_2/loss=52.834, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 361.84it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=25.029, player_2/loss=70.196, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 361.90it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=110.985, player_2/loss=128.757, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 374.36it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=159.853, player_2/loss=193.204, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 384.70it/s, env_step=15360, len=27, n/ep=3, n/st=64, player_1/loss=89.789, player_2/loss=167.584, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 376.58it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=65.882, player_2/loss=197.685, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 374.83it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=83.278, player_2/loss=179.260, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 371.97it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=79.591, player_2/loss=189.206, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 371.27it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=59.796, player_2/loss=227.308, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 362.23it/s, env_step=1024, len=29, n/ep=3, n/st=64, player_1/loss=77.364, player_2/loss=220.319, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.24it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=78.597, player_2/loss=194.359, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 360.59it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=76.144, player_2/loss=148.916, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 359.53it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=228.170, player_2/loss=378.107, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 361.92it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=246.134, player_2/loss=391.528, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 362.16it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=101.481, player_2/loss=349.329, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 359.41it/s, env_step=7168, len=28, n/ep=2, n/st=64, player_1/loss=83.525, player_2/loss=117.340, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 358.84it/s, env_step=8192, len=30, n/ep=2, n/st=64, player_1/loss=79.937, player_2/loss=98.875, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 361.07it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=131.768, player_2/loss=101.405, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 361.97it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=163.636, player_2/loss=105.511, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 361.71it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=127.551, player_2/loss=83.788, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 359.51it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=128.222, player_2/loss=89.749, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 362.29it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=127.260, player_2/loss=111.841, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 364.26it/s, env_step=14336, len=20, n/ep=4, n/st=64, player_1/loss=125.283, player_2/loss=111.816, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 361.88it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=124.613, player_2/loss=117.966, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 362.78it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=111.508, player_2/loss=118.983, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 362.68it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=101.300, player_2/loss=98.987, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 361.99it/s, env_step=18432, len=22, n/ep=3, n/st=64, player_1/loss=124.990, player_2/loss=68.006, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 362.97it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=159.195, player_2/loss=70.925, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 359.66it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=103.169, player_2/loss=92.566, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.19it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=116.400, player_2/loss=117.255, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 358.06it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=104.627, player_2/loss=126.121, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 359.42it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=83.644, player_2/loss=111.764, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 362.91it/s, env_step=5120, len=20, n/ep=4, n/st=64, player_1/loss=110.096, player_2/loss=110.525, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 360.31it/s, env_step=6144, len=14, n/ep=6, n/st=64, player_1/loss=134.202, player_2/loss=144.553, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 359.87it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=132.229, player_2/loss=163.633, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 360.08it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=101.499, player_2/loss=212.735, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 360.72it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=56.976, player_2/loss=227.500, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 359.35it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=37.992, player_2/loss=207.494, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 358.00it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=33.749, player_2/loss=170.004, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 359.66it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=22.316, player_2/loss=166.282, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 360.66it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=54.529, player_2/loss=183.232, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 361.26it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=49.914, player_2/loss=193.613, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 359.42it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=7.997, player_2/loss=175.095, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 358.27it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=10.306, player_2/loss=174.053, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 360.10it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=11.962, player_2/loss=170.325, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 360.59it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=12.219, player_2/loss=173.253, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 369.35it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=24.295, player_2/loss=174.490, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 372.37it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=11.140, player_2/loss=183.399, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 366.00it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=16.457, player_2/loss=152.676, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 358.74it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=22.052, player_2/loss=163.941, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 358.43it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=16.707, player_2/loss=166.698, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 362.77it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=12.353, player_2/loss=144.327, rew=-16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 359.17it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=198.340, player_2/loss=173.760, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 358.94it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=409.554, player_2/loss=140.575, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 362.06it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=394.560, player_2/loss=77.914, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 361.29it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=407.371, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 360.62it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=380.305, player_2/loss=52.155, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 361.35it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=336.921, player_2/loss=23.390, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 362.40it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=341.561, player_2/loss=17.813, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 357.76it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=347.316, player_2/loss=17.510, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 361.14it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_2/loss=24.657, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 362.83it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=368.569, player_2/loss=5.754, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 363.71it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=409.372, player_2/loss=4.802, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 362.46it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=402.587, player_2/loss=8.379, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 360.65it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=374.375, player_2/loss=35.121, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 360.68it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=327.117, player_2/loss=55.540, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 352.74it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=255.535, player_2/loss=399.671, rew=13.89]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 357.01it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=123.697, player_2/loss=519.081, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 363.61it/s, env_step=3072, len=7, n/ep=10, n/st=64, player_1/loss=90.061, player_2/loss=582.850, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 358.98it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=89.830, player_2/loss=579.512, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 359.83it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=102.775, player_2/loss=432.145, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 357.04it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=95.562, player_2/loss=468.366, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 359.46it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=92.085, player_2/loss=653.862, rew=13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 359.21it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=101.524, player_2/loss=612.771, rew=13.89]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 358.97it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=114.492, player_2/loss=515.879, rew=13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 360.55it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=165.965, player_2/loss=569.681, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 357.83it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=90.247, player_2/loss=577.428, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 361.65it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=74.122, player_2/loss=533.264, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 357.74it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=100.838, player_2/loss=520.696, rew=13.89]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 358.45it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=110.557, player_2/loss=553.502, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 357.78it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=65.226, player_2/loss=614.873, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 359.71it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=35.413, player_2/loss=558.105, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 360.84it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=29.914, player_2/loss=698.053, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 358.06it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=39.230, player_2/loss=673.195, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 359.91it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=24.338, player_2/loss=650.174, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 358.86it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=19.992, player_2/loss=516.099, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.69it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=20.470, player_2/loss=458.252, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 369.25it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=14.544, player_2/loss=341.295, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 372.10it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=7.897, player_2/loss=270.361, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 360.86it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=31.811, player_2/loss=227.952, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 362.06it/s, env_step=6144, len=20, n/ep=4, n/st=64, player_1/loss=67.191, player_2/loss=221.430, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 359.76it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=134.705, player_2/loss=209.781, rew=-19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 360.97it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=140.434, player_2/loss=224.659, rew=-6.25]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 360.21it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=172.107, player_2/loss=197.384, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 360.46it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=214.515, player_2/loss=129.040, rew=-19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 354.48it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=217.669, player_2/loss=93.836, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 359.67it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=161.642, player_2/loss=149.142, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 360.69it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=109.107, player_2/loss=159.082, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 361.99it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=121.524, player_2/loss=148.264, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 361.81it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=134.239, player_2/loss=130.714, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 360.36it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=159.691, player_2/loss=178.908, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 361.52it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=76.912, player_2/loss=213.406, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 359.96it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=76.271, player_2/loss=223.310, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 361.86it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=80.581, player_2/loss=183.322, rew=-15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 361.17it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=42.919, player_2/loss=155.169, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 360.17it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=42.038, player_2/loss=166.481, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 361.64it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=27.769, player_2/loss=194.851, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 360.19it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=19.206, player_2/loss=187.398, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 358.88it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=14.696, player_2/loss=137.873, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 360.84it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=23.983, player_2/loss=160.765, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 359.47it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=39.400, player_2/loss=166.291, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 361.45it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=51.687, player_2/loss=187.764, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 358.52it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=57.181, player_2/loss=172.947, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 358.98it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=19.596, player_2/loss=166.251, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 359.95it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=16.804, player_2/loss=149.543, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 360.91it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=36.065, player_2/loss=161.283, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 357.10it/s, env_step=13312, len=12, n/ep=6, n/st=64, player_1/loss=26.786, player_2/loss=141.825, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 360.03it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=5.138, player_2/loss=155.378, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 362.40it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=10.426, player_2/loss=146.072, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 359.34it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=12.855, player_2/loss=173.514, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 358.05it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=5.189, player_2/loss=171.213, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 361.07it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=3.210, player_2/loss=201.943, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 358.49it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=103.962, player_2/loss=212.560, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 360.07it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=51.012, player_2/loss=151.600, rew=15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 360.67it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=149.232, player_2/loss=198.127, rew=-13.89]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 359.11it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=161.760, player_2/loss=180.690, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 359.89it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=77.632, player_2/loss=134.664, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 361.70it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=110.478, player_2/loss=177.457, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 361.61it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=209.425, player_2/loss=220.452, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 377.40it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=218.361, player_2/loss=183.710, rew=-18.75]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 376.00it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=137.523, player_2/loss=140.995, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 363.37it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=61.664, player_2/loss=83.460, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 358.85it/s, env_step=10240, len=29, n/ep=2, n/st=64, player_1/loss=55.673, player_2/loss=44.435, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 359.10it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_2/loss=66.625, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 361.46it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=170.441, player_2/loss=105.153, rew=-6.25]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 358.84it/s, env_step=13312, len=24, n/ep=3, n/st=64, player_1/loss=203.537, player_2/loss=158.553, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 359.22it/s, env_step=14336, len=15, n/ep=3, n/st=64, player_1/loss=171.803, player_2/loss=147.200, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 362.35it/s, env_step=15360, len=24, n/ep=3, n/st=64, player_1/loss=125.991, player_2/loss=97.116, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 359.72it/s, env_step=16384, len=24, n/ep=3, n/st=64, player_1/loss=109.983, player_2/loss=93.551, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 359.34it/s, env_step=17408, len=24, n/ep=2, n/st=64, player_1/loss=120.038, player_2/loss=75.178, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 359.68it/s, env_step=18432, len=23, n/ep=2, n/st=64, player_1/loss=112.069, player_2/loss=58.272, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 360.54it/s, env_step=19456, len=23, n/ep=2, n/st=64, player_1/loss=108.383, player_2/loss=71.240, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 357.33it/s, env_step=1024, len=23, n/ep=2, n/st=64, player_1/loss=115.422, player_2/loss=84.455, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 358.66it/s, env_step=2048, len=23, n/ep=2, n/st=64, player_1/loss=88.268, player_2/loss=68.429, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 359.00it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=98.483, player_2/loss=72.634, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 358.22it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=102.394, player_2/loss=71.358, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 358.45it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=118.309, player_2/loss=105.067, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 361.10it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=111.011, player_2/loss=113.605, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 357.23it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=62.434, player_2/loss=131.622, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 356.93it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=29.794, player_2/loss=178.686, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 358.87it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=10.410, player_2/loss=182.636, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 355.48it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=6.394, player_2/loss=164.357, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 356.34it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=33.174, player_2/loss=184.509, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 357.89it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=42.734, player_2/loss=169.243, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 359.55it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=26.549, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 358.80it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=17.333, player_2/loss=176.099, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 357.57it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=23.557, player_2/loss=225.982, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 358.23it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=21.239, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 359.65it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=4.129, player_2/loss=208.598, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 357.66it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=3.121, player_2/loss=193.628, rew=16.67]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 358.61it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=1.254, player_2/loss=178.440, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 359.23it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=72.163, player_2/loss=134.127, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.42it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=39.420, player_2/loss=110.702, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 359.51it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=4.466, player_2/loss=84.988, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 358.03it/s, env_step=4096, len=16, n/ep=3, n/st=64, player_1/loss=84.248, player_2/loss=116.375, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 361.58it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=182.260, player_2/loss=141.862, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 359.28it/s, env_step=6144, len=27, n/ep=2, n/st=64, player_1/loss=188.342, player_2/loss=117.708, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 361.90it/s, env_step=7168, len=28, n/ep=2, n/st=64, player_1/loss=170.165, player_2/loss=97.184, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 358.90it/s, env_step=8192, len=16, n/ep=3, n/st=64, player_1/loss=158.098, player_2/loss=81.543, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 355.14it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=210.469, player_2/loss=115.655, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 366.45it/s, env_step=10240, len=25, n/ep=3, n/st=64, player_1/loss=240.723, player_2/loss=138.329, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 373.65it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=214.373, player_2/loss=122.472, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 373.97it/s, env_step=12288, len=25, n/ep=2, n/st=64, player_1/loss=217.907, player_2/loss=95.991, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 375.91it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=438.861, player_2/loss=105.491, rew=-15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 370.87it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=478.294, player_2/loss=138.611, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 371.14it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=263.901, player_2/loss=151.957, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 359.85it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=411.615, player_2/loss=111.298, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 356.72it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=530.571, player_2/loss=17.773, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 361.05it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=471.788, player_2/loss=55.568, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 359.96it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=412.857, player_2/loss=53.700, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 360.10it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=116.253, player_2/loss=258.914, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 356.98it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=96.975, player_2/loss=376.467, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 357.60it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=82.550, player_2/loss=434.171, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 357.25it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=16.381, player_2/loss=406.432, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 356.40it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=20.638, player_2/loss=333.235, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 356.00it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=23.714, player_2/loss=389.432, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 356.07it/s, env_step=7168, len=8, n/ep=9, n/st=64, player_1/loss=68.398, player_2/loss=367.601, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 354.87it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=89.216, player_2/loss=367.213, rew=6.25]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 357.75it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=57.285, player_2/loss=331.942, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 358.69it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=54.376, player_2/loss=388.681, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 357.08it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=37.322, player_2/loss=370.915, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 356.59it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=50.919, player_2/loss=329.549, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 357.34it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=23.654, player_2/loss=314.003, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 353.09it/s, env_step=14336, len=8, n/ep=9, n/st=64, player_1/loss=27.848, player_2/loss=367.977, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 354.81it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=14.260, player_2/loss=392.596, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 357.55it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=6.753, player_2/loss=361.856, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 354.75it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=27.523, player_2/loss=372.631, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 358.03it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=81.648, player_2/loss=358.120, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 356.24it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=98.289, player_2/loss=368.687, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 358.80it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=117.765, player_2/loss=299.491, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 360.40it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=159.209, player_2/loss=202.638, rew=-16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 360.71it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=139.338, player_2/loss=162.452, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 360.00it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=65.059, player_2/loss=170.748, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 357.46it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=143.818, player_2/loss=192.310, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 358.17it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=221.795, player_2/loss=207.050, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 358.62it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=244.737, player_2/loss=205.397, rew=-13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 357.83it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=270.054, player_2/loss=165.301, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 358.70it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=232.084, player_2/loss=177.745, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 361.58it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=131.733, player_2/loss=158.504, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 362.03it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=29.572, player_2/loss=133.827, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 358.22it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=49.378, player_2/loss=108.213, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 361.76it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=89.811, player_2/loss=121.290, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 375.91it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=136.195, player_2/loss=120.309, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 377.47it/s, env_step=15360, len=24, n/ep=3, n/st=64, player_1/loss=168.535, player_2/loss=123.072, rew=-8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 363.38it/s, env_step=16384, len=23, n/ep=3, n/st=64, player_1/loss=141.050, player_2/loss=84.307, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 359.59it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=133.706, player_2/loss=40.522, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 360.66it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=179.615, player_2/loss=54.796, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 358.39it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=140.997, player_2/loss=73.242, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 360.62it/s, env_step=1024, len=25, n/ep=2, n/st=64, player_1/loss=131.413, player_2/loss=37.745, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.57it/s, env_step=2048, len=25, n/ep=2, n/st=64, player_1/loss=93.844, player_2/loss=22.535, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 359.10it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=64.589, player_2/loss=34.006, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 356.74it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=122.841, player_2/loss=89.888, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 361.93it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=141.836, player_2/loss=128.136, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 359.02it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=86.727, player_2/loss=116.113, rew=12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 360.07it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=68.994, player_2/loss=131.866, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 358.07it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=56.349, player_2/loss=149.901, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 358.46it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=27.371, player_2/loss=159.395, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 362.97it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=78.902, player_2/loss=167.286, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 361.58it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=104.618, player_2/loss=207.650, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 360.93it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=42.933, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 360.65it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=123.632, player_2/loss=156.689, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 359.52it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=130.794, player_2/loss=148.668, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 360.66it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=88.740, player_2/loss=191.424, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 359.54it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=62.357, player_2/loss=181.765, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 360.33it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=23.002, player_2/loss=138.388, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 362.23it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=17.816, player_2/loss=110.843, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 359.60it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=26.706, player_2/loss=108.897, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 358.17it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=35.980, player_2/loss=191.027, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 360.95it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=49.959, player_2/loss=166.215, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 362.64it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=44.808, player_2/loss=135.395, rew=-12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 361.22it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=53.697, player_2/loss=119.940, rew=-5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 362.65it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=63.217, player_2/loss=102.707, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 358.80it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=73.944, rew=-12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 361.28it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=134.450, player_2/loss=147.998, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 348.33it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=144.691, player_2/loss=110.207, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 359.44it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=221.280, player_2/loss=127.775, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 361.42it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=258.231, player_2/loss=161.966, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 363.42it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=174.018, player_2/loss=136.564, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 360.27it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=213.676, player_2/loss=165.535, rew=-15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 358.09it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=234.165, player_2/loss=206.839, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 360.60it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=303.873, player_2/loss=171.626, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 361.02it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=337.197, player_2/loss=111.803, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 357.47it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=321.633, player_2/loss=71.342, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 359.81it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=309.196, player_2/loss=79.944, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 371.96it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=335.251, player_2/loss=76.544, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 369.90it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=379.641, player_2/loss=51.868, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 359.38it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=328.480, player_2/loss=212.783, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 356.55it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=234.970, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 356.87it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=142.951, player_2/loss=643.981, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 358.15it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=100.365, player_2/loss=711.875, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 359.45it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=114.726, player_2/loss=632.791, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 359.10it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=178.979, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 356.27it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=109.763, player_2/loss=617.031, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 357.14it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=52.028, player_2/loss=645.157, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 358.99it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=86.285, player_2/loss=600.015, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 356.81it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_2/loss=545.505, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 356.23it/s, env_step=11264, len=7, n/ep=7, n/st=64, player_1/loss=63.531, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 357.54it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=103.682, player_2/loss=566.682, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 359.65it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=135.302, player_2/loss=594.386, rew=13.89]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 357.44it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=116.512, player_2/loss=560.197, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 357.65it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=119.354, player_2/loss=560.766, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 356.81it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=104.474, player_2/loss=605.595, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 358.47it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=92.724, player_2/loss=628.983, rew=13.89]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 356.73it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=84.962, player_2/loss=608.861, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 357.36it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=49.199, player_2/loss=656.727, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 360.08it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=31.996, player_2/loss=458.630, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.16it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=172.412, player_2/loss=373.794, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 358.01it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=239.395, player_2/loss=245.203, rew=-19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 362.06it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=360.604, player_2/loss=105.516, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 358.14it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=490.094, player_2/loss=31.374, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 360.09it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=439.230, player_2/loss=55.153, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 360.41it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=392.026, player_2/loss=66.431, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 361.77it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=437.709, player_2/loss=32.954, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 360.83it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_2/loss=17.114, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 359.87it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=543.362, player_2/loss=10.743, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 362.24it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=584.381, player_2/loss=6.026, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 359.99it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=498.980, player_2/loss=9.915, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 361.88it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=472.136, player_2/loss=25.821, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 359.10it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=429.883, player_2/loss=21.377, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 358.97it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=488.070, player_2/loss=4.753, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 361.00it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=443.011, player_2/loss=28.012, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 356.65it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=345.676, player_2/loss=47.257, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 359.61it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=353.043, player_2/loss=26.438, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 361.17it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=462.783, player_2/loss=7.479, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 355.46it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=272.542, player_2/loss=93.891, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 358.05it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=227.025, player_2/loss=272.776, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.88it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=226.350, player_2/loss=394.509, rew=-15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 358.84it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=168.796, player_2/loss=330.151, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 359.55it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=79.657, player_2/loss=456.796, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 360.41it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=88.397, player_2/loss=446.111, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 361.42it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=82.993, player_2/loss=422.369, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 358.97it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=118.642, player_2/loss=451.199, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 355.61it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=182.431, player_2/loss=386.325, rew=16.67]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 353.46it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=175.229, player_2/loss=468.387, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 358.16it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=131.164, player_2/loss=420.807, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 356.58it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=108.077, player_2/loss=371.157, rew=5.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 359.72it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=101.394, player_2/loss=354.758, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 361.24it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=103.636, player_2/loss=338.501, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 359.41it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=126.745, player_2/loss=305.709, rew=15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 361.25it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=105.315, player_2/loss=476.210, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 359.16it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=96.136, player_2/loss=661.933, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 360.53it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=114.802, player_2/loss=453.126, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 358.14it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=103.923, player_2/loss=349.155, rew=16.67]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 358.96it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=50.235, player_2/loss=408.368, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 360.59it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=53.501, player_2/loss=315.778, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 360.52it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=43.484, player_2/loss=217.866, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 362.91it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=87.844, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 359.71it/s, env_step=5120, len=17, n/ep=3, n/st=64, player_1/loss=118.479, player_2/loss=175.306, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 359.82it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=93.280, player_2/loss=160.771, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 361.74it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=115.883, player_2/loss=155.205, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 360.42it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=97.648, player_2/loss=170.481, rew=-15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 357.58it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=91.102, player_2/loss=145.919, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 361.62it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=182.005, player_2/loss=107.170, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 358.13it/s, env_step=11264, len=10, n/ep=7, n/st=64, player_1/loss=222.641, player_2/loss=116.750, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 360.47it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=194.611, player_2/loss=110.977, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 362.66it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=266.471, player_2/loss=90.975, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 358.86it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=255.713, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 358.80it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=240.645, player_2/loss=104.440, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 357.34it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=242.928, player_2/loss=80.647, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 359.96it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=216.892, player_2/loss=50.044, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 360.21it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=233.612, player_2/loss=31.334, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 358.65it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=253.422, player_2/loss=71.740, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 357.27it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=82.244, player_2/loss=372.032, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 356.74it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=37.199, player_2/loss=360.335, rew=17.86]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 355.73it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=33.516, player_2/loss=327.230, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 356.37it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=24.363, player_2/loss=322.219, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 355.99it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=23.744, player_2/loss=367.057, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 379.79it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=13.696, player_2/loss=364.398, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 368.18it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=8.156, player_2/loss=377.844, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 357.87it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=5.087, player_2/loss=328.449, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 355.37it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=3.151, player_2/loss=339.525, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 356.18it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=10.695, player_2/loss=359.876, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 357.37it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=12.546, player_2/loss=313.164, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 357.78it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=6.613, player_2/loss=308.135, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 356.78it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=4.630, player_2/loss=385.093, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 358.65it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=0.952, player_2/loss=400.036, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 358.08it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=20.968, player_2/loss=378.364, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 355.33it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=15.617, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 355.27it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=5.319, player_2/loss=409.876, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 357.69it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=5.244, player_2/loss=346.083, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 354.22it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=3.876, player_2/loss=369.064, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 359.34it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=1.935, player_2/loss=275.058, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.67it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=49.351, player_2/loss=212.327, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 361.01it/s, env_step=3072, len=11, n/ep=4, n/st=64, player_1/loss=139.424, player_2/loss=190.104, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 357.76it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=149.752, player_2/loss=179.568, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 359.57it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=157.633, player_2/loss=323.350, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 361.79it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=115.895, player_2/loss=354.458, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 361.45it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=73.356, player_2/loss=176.217, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 357.00it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=94.139, player_2/loss=138.634, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 361.80it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=118.213, player_2/loss=104.843, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 360.58it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=99.501, player_2/loss=76.216, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 357.27it/s, env_step=11264, len=20, n/ep=4, n/st=64, player_1/loss=79.949, player_2/loss=52.774, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 362.40it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=114.282, player_2/loss=71.531, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 359.55it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=123.448, player_2/loss=79.834, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 358.50it/s, env_step=14336, len=24, n/ep=2, n/st=64, player_1/loss=86.084, player_2/loss=80.314, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 355.42it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=108.536, player_2/loss=132.850, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 360.53it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=243.800, player_2/loss=119.601, rew=18.75]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 357.60it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=488.797, player_2/loss=111.824, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 358.24it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=586.344, player_2/loss=125.403, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 358.62it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=533.039, player_2/loss=110.694, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 358.63it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=140.118, player_2/loss=140.243, rew=10.71]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.52it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=89.525, player_2/loss=302.443, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.23it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=34.995, player_2/loss=464.434, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 358.00it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=16.994, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 358.63it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=34.170, player_2/loss=535.167, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 358.88it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=8.093, player_2/loss=607.758, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 359.26it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=22.667, player_2/loss=627.915, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 356.58it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=27.829, player_2/loss=567.809, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 358.24it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=13.361, player_2/loss=614.604, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 371.92it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=32.845, player_2/loss=591.525, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 360.43it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=58.889, player_2/loss=567.585, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 359.77it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=40.270, player_2/loss=483.328, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 357.66it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=38.344, player_2/loss=438.140, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 358.25it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=23.455, player_2/loss=450.653, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 358.92it/s, env_step=15360, len=7, n/ep=10, n/st=64, player_1/loss=8.597, player_2/loss=470.915, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 357.77it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=45.466, player_2/loss=524.279, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 358.63it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=70.560, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 361.65it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=57.063, player_2/loss=503.879, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 357.41it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=35.258, player_2/loss=493.946, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 358.60it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=40.802, player_2/loss=423.031, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.66it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=48.583, player_2/loss=334.771, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 361.21it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=64.472, player_2/loss=253.795, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 359.64it/s, env_step=4096, len=24, n/ep=3, n/st=64, player_1/loss=57.306, player_2/loss=184.181, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 363.66it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=54.335, player_2/loss=133.558, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 361.81it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=75.885, player_2/loss=89.452, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 361.58it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=107.278, player_2/loss=121.056, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 358.62it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=110.266, player_2/loss=138.468, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 360.18it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=109.401, player_2/loss=96.801, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 360.26it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=47.807, player_2/loss=40.522, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 361.90it/s, env_step=11264, len=20, n/ep=4, n/st=64, player_1/loss=48.873, player_2/loss=26.761, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 362.11it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=58.333, player_2/loss=57.067, rew=-8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #13: 1025it [00:02, 360.88it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_2/loss=67.301, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #14: 1025it [00:02, 358.86it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=68.858, player_2/loss=64.792, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #15: 1025it [00:02, 359.87it/s, env_step=15360, len=26, n/ep=3, n/st=64, player_1/loss=68.131, player_2/loss=20.552, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #16: 1025it [00:02, 359.69it/s, env_step=16384, len=30, n/ep=2, n/st=64, player_1/loss=66.440, player_2/loss=11.918, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #17: 1025it [00:02, 357.93it/s, env_step=17408, len=29, n/ep=2, n/st=64, player_1/loss=60.941, player_2/loss=47.750, rew=0.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #18: 1025it [00:02, 358.81it/s, env_step=18432, len=31, n/ep=2, n/st=64, player_1/loss=50.924, player_2/loss=50.915, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #19: 1025it [00:02, 358.54it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=53.305, player_2/loss=25.097, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #1: 1025it [00:02, 359.20it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=67.052, player_2/loss=56.391, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 361.07it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=52.676, player_2/loss=53.728, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.51it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=70.068, player_2/loss=108.097, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 357.91it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=146.041, player_2/loss=140.610, rew=5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 358.36it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=193.705, player_2/loss=198.027, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 357.20it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=149.203, player_2/loss=196.573, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 358.90it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=79.228, player_2/loss=176.727, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 358.91it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=94.297, player_2/loss=50.036, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 359.42it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=120.066, player_2/loss=34.765, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 356.44it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=106.185, player_2/loss=114.656, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 357.75it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=105.688, player_2/loss=201.940, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 358.29it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=62.644, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 368.13it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=44.340, player_2/loss=240.849, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 357.36it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=45.156, player_2/loss=245.513, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 358.03it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=92.050, player_2/loss=223.242, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 357.23it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=120.377, player_2/loss=184.835, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 356.67it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=116.627, player_2/loss=211.510, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 358.74it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=127.522, player_2/loss=244.187, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 359.14it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=91.762, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 357.38it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=6.586, player_2/loss=167.446, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.04it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=11.773, player_2/loss=125.017, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 360.14it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=58.420, player_2/loss=107.133, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 359.85it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=74.228, player_2/loss=103.671, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 359.29it/s, env_step=5120, len=10, n/ep=5, n/st=64, player_1/loss=81.972, player_2/loss=109.019, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 361.16it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=89.642, player_2/loss=107.576, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 360.23it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=101.943, player_2/loss=121.691, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 358.55it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=53.249, player_2/loss=112.441, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 360.66it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=60.331, player_2/loss=90.164, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 359.53it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=109.844, player_2/loss=81.847, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 359.35it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=99.573, player_2/loss=23.089, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 357.89it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=72.573, player_2/loss=49.506, rew=-17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 357.57it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=80.183, player_2/loss=63.899, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 360.24it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=80.562, player_2/loss=65.010, rew=-17.86]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 359.26it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=97.318, player_2/loss=23.192, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 359.52it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=78.406, player_2/loss=39.702, rew=-10.71]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 359.36it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=76.501, player_2/loss=40.426, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 359.21it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=62.998, player_2/loss=46.911, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 360.21it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=64.583, player_2/loss=17.747, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 345.88it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=60.417, player_2/loss=10.482, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 356.67it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=59.301, player_2/loss=27.425, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 358.74it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=44.144, player_2/loss=35.237, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 360.45it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=31.759, player_2/loss=64.457, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 358.81it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=27.563, player_2/loss=59.134, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 359.48it/s, env_step=6144, len=11, n/ep=7, n/st=64, player_1/loss=49.482, player_2/loss=42.444, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 358.64it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=50.040, player_2/loss=34.771, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 357.52it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=11.327, player_2/loss=36.753, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 357.92it/s, env_step=9216, len=9, n/ep=6, n/st=64, player_1/loss=7.700, player_2/loss=25.043, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 358.43it/s, env_step=10240, len=10, n/ep=7, n/st=64, player_1/loss=26.921, player_2/loss=41.120, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 360.00it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=32.246, player_2/loss=36.127, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 356.76it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=38.742, player_2/loss=27.020, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 359.84it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=47.561, player_2/loss=30.605, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 359.76it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=38.687, player_2/loss=32.263, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 353.52it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=38.281, player_2/loss=123.406, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 371.39it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=32.371, player_2/loss=133.843, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 375.60it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=24.404, player_2/loss=62.843, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 365.13it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=12.685, player_2/loss=39.919, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 359.64it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=8.029, player_2/loss=14.679, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 359.65it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=47.198, player_2/loss=10.939, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.14it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=43.605, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 359.28it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=53.570, player_2/loss=43.405, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 358.54it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=43.539, player_2/loss=28.520, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 360.74it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=26.610, player_2/loss=37.271, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 359.84it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=26.031, player_2/loss=45.782, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 361.00it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=44.296, player_2/loss=51.429, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 357.63it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=49.142, player_2/loss=55.223, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 357.11it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=65.243, player_2/loss=63.550, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 360.10it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=69.177, player_2/loss=64.735, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 360.14it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=50.052, player_2/loss=67.418, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 358.90it/s, env_step=12288, len=35, n/ep=2, n/st=64, player_1/loss=41.894, player_2/loss=11.036, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 359.62it/s, env_step=13312, len=33, n/ep=2, n/st=64, player_1/loss=35.815, player_2/loss=12.427, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 357.22it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=46.267, player_2/loss=91.121, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 358.46it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=120.405, player_2/loss=169.999, rew=-16.67]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 357.64it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=155.609, player_2/loss=213.140, rew=-19.44]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 358.02it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=212.935, player_2/loss=212.140, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 358.42it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=173.311, player_2/loss=228.976, rew=-25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 355.53it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=154.259, player_2/loss=227.503, rew=-16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 358.13it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=83.070, player_2/loss=267.920, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 356.70it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=59.596, player_2/loss=235.100, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.54it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=30.148, player_2/loss=199.638, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 358.78it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=16.268, player_2/loss=175.665, rew=13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 357.07it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=32.749, player_2/loss=180.216, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 357.14it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=30.910, player_2/loss=203.767, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 356.46it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=12.924, player_2/loss=181.773, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 357.43it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=34.755, player_2/loss=183.979, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 355.74it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=37.275, player_2/loss=168.497, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 356.67it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=5.620, player_2/loss=193.988, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 357.62it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=2.351, player_2/loss=189.497, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 359.75it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=51.343, player_2/loss=173.314, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 354.56it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=95.013, player_2/loss=172.217, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 358.44it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=69.284, player_2/loss=194.126, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 355.99it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=53.526, player_2/loss=174.962, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 354.10it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=57.801, player_2/loss=177.904, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 359.01it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=30.840, player_2/loss=180.855, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 359.07it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=71.538, player_2/loss=203.669, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 358.86it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=91.374, player_2/loss=225.002, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 379.85it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=120.112, player_2/loss=180.484, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 372.69it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=79.762, player_2/loss=161.171, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 369.45it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=89.676, player_2/loss=151.185, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 357.89it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=103.779, player_2/loss=145.357, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 358.62it/s, env_step=5120, len=20, n/ep=4, n/st=64, player_1/loss=64.267, player_2/loss=132.930, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 359.15it/s, env_step=6144, len=21, n/ep=2, n/st=64, player_1/loss=93.725, player_2/loss=107.574, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 359.13it/s, env_step=7168, len=28, n/ep=2, n/st=64, player_1/loss=118.017, player_2/loss=103.068, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 355.32it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=100.013, player_2/loss=107.494, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 356.16it/s, env_step=9216, len=25, n/ep=4, n/st=64, player_1/loss=108.915, player_2/loss=87.449, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 359.71it/s, env_step=10240, len=36, n/ep=2, n/st=64, player_1/loss=104.058, player_2/loss=86.684, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 359.31it/s, env_step=11264, len=24, n/ep=2, n/st=64, player_1/loss=92.993, player_2/loss=99.269, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 358.27it/s, env_step=12288, len=26, n/ep=3, n/st=64, player_1/loss=141.109, player_2/loss=113.111, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 358.36it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=149.462, player_2/loss=135.885, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 360.38it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=173.298, player_2/loss=142.404, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 359.90it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=147.819, player_2/loss=105.551, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 357.86it/s, env_step=16384, len=31, n/ep=2, n/st=64, player_1/loss=97.920, player_2/loss=44.866, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 359.64it/s, env_step=17408, len=27, n/ep=2, n/st=64, player_1/loss=148.931, player_2/loss=86.159, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 360.98it/s, env_step=18432, len=29, n/ep=2, n/st=64, player_1/loss=145.752, player_2/loss=91.552, rew=-25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 358.88it/s, env_step=19456, len=18, n/ep=2, n/st=64, player_1/loss=104.434, player_2/loss=56.220, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 358.10it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=138.258, player_2/loss=138.663, rew=13.89]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 360.84it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=127.054, player_2/loss=231.883, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 356.81it/s, env_step=3072, len=7, n/ep=7, n/st=64, player_1/loss=130.058, player_2/loss=264.890, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 355.88it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=65.492, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 356.75it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=101.339, player_2/loss=183.803, rew=13.89]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 357.86it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=160.658, player_2/loss=251.135, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 357.15it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=82.361, player_2/loss=232.596, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 356.85it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=118.565, player_2/loss=237.237, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 356.95it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=139.893, player_2/loss=266.743, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 357.33it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=128.649, player_2/loss=265.865, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 356.32it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=127.788, player_2/loss=249.687, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 358.99it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=80.754, player_2/loss=193.958, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 354.03it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=67.027, player_2/loss=169.352, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 358.56it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=50.044, player_2/loss=161.562, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 358.17it/s, env_step=15360, len=8, n/ep=9, n/st=64, player_1/loss=11.228, player_2/loss=179.845, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 356.33it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=28.837, player_2/loss=201.819, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 357.08it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=21.590, player_2/loss=194.463, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 357.88it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=18.494, player_2/loss=193.102, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 356.00it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=38.255, player_2/loss=202.869, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 357.12it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=49.024, player_2/loss=156.826, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.33it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=56.577, player_2/loss=197.537, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 355.01it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=137.166, player_2/loss=236.575, rew=-13.89]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 360.67it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=159.208, player_2/loss=205.495, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 371.77it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=129.423, player_2/loss=155.998, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 370.10it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=183.998, player_2/loss=114.925, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 372.04it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=182.584, player_2/loss=123.580, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 361.69it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=98.500, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 359.08it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=91.455, player_2/loss=64.597, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 356.60it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=129.953, player_2/loss=132.160, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 362.89it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=167.756, player_2/loss=136.231, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 360.07it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=177.640, player_2/loss=125.915, rew=-17.86]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 356.87it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=147.793, player_2/loss=113.184, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 360.01it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=106.374, player_2/loss=127.912, rew=-15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 360.79it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=141.734, player_2/loss=135.630, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 355.44it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=191.454, player_2/loss=155.161, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 347.81it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=200.104, player_2/loss=133.314, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 359.79it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=162.178, player_2/loss=94.808, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:02, 358.75it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_1/loss=148.004, player_2/loss=113.978, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 359.00it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=62.611, player_2/loss=20.964, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 360.91it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=67.384, player_2/loss=269.223, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 357.61it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=50.577, player_2/loss=317.742, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 357.83it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=78.536, player_2/loss=248.094, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 359.09it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=65.352, player_2/loss=292.852, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 352.94it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=36.102, player_2/loss=285.921, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 354.28it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=28.366, player_2/loss=256.957, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 355.21it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=14.205, player_2/loss=214.467, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 355.53it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=10.804, player_2/loss=246.670, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 357.29it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=38.306, player_2/loss=215.759, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 360.02it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=19.613, player_2/loss=165.875, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 356.80it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=14.200, player_2/loss=220.062, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 356.41it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=13.858, player_2/loss=267.188, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 361.06it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=11.920, player_2/loss=285.865, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 358.38it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=21.474, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 358.19it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=21.083, player_2/loss=267.568, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 360.17it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=3.258, player_2/loss=264.858, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 358.18it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=2.276, player_2/loss=249.320, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 356.01it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=18.358, player_2/loss=263.374, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 358.78it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=74.771, player_2/loss=188.115, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.18it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=45.397, player_2/loss=168.579, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.40it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=5.823, player_2/loss=128.206, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 358.14it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=15.109, player_2/loss=88.126, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 358.84it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=15.330, player_2/loss=77.950, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 357.25it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=7.971, player_2/loss=68.576, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 362.07it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=11.934, player_2/loss=76.418, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 378.70it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=7.399, player_2/loss=59.657, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 371.34it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=11.151, player_2/loss=47.267, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 369.26it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=63.171, player_2/loss=56.526, rew=-5.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 356.99it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=124.953, player_2/loss=114.740, rew=-5.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 360.17it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=112.343, player_2/loss=140.172, rew=-16.67]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 358.52it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=166.258, player_2/loss=175.273, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 359.28it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=138.858, player_2/loss=151.739, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 359.28it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=117.499, player_2/loss=111.718, rew=-15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 359.36it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=105.446, player_2/loss=86.720, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 360.58it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=59.891, player_2/loss=119.308, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 356.81it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=76.702, player_2/loss=126.121, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 359.19it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=104.755, player_2/loss=166.317, rew=-16.67]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 357.67it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=65.320, player_2/loss=106.843, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.86it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=90.742, player_2/loss=130.613, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 358.40it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=92.477, player_2/loss=115.815, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 357.86it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=89.063, player_2/loss=95.342, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 356.94it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=138.568, player_2/loss=134.140, rew=-5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 360.91it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=134.531, player_2/loss=168.119, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 358.79it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=85.569, player_2/loss=152.972, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 355.62it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=55.372, player_2/loss=93.847, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 359.03it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=113.516, player_2/loss=105.017, rew=16.67]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 358.12it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=168.290, player_2/loss=125.715, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 355.98it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=99.920, player_2/loss=96.138, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 359.81it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=43.545, player_2/loss=93.638, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 361.59it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=46.657, player_2/loss=80.790, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 355.25it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=78.957, player_2/loss=91.966, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 361.31it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=103.042, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 358.48it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=96.969, player_2/loss=89.458, rew=-12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 351.73it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=35.611, player_2/loss=102.396, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 359.27it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=47.263, player_2/loss=127.043, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 357.41it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=80.418, player_2/loss=141.657, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 356.20it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=86.196, player_2/loss=65.498, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 358.65it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=66.832, player_2/loss=66.179, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 360.11it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=58.949, player_2/loss=90.210, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 356.22it/s, env_step=4096, len=14, n/ep=3, n/st=64, player_1/loss=90.803, player_2/loss=66.178, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 359.59it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=179.985, player_2/loss=89.382, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 358.51it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=198.733, player_2/loss=192.455, rew=-13.89]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 359.64it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=225.059, player_2/loss=196.227, rew=-19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 360.28it/s, env_step=8192, len=24, n/ep=3, n/st=64, player_1/loss=188.440, player_2/loss=163.513, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 359.46it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=84.909, player_2/loss=113.282, rew=-15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 355.52it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=116.944, player_2/loss=106.002, rew=-8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 355.77it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=128.462, player_2/loss=89.646, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 355.56it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=82.262, player_2/loss=44.445, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 358.83it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=87.157, player_2/loss=110.074, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 361.25it/s, env_step=14336, len=26, n/ep=2, n/st=64, player_1/loss=134.595, player_2/loss=134.458, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 359.75it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=134.921, player_2/loss=83.157, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 358.41it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=108.472, player_2/loss=64.594, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 358.86it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=112.905, player_2/loss=78.469, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 357.47it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=130.750, player_2/loss=95.576, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 359.33it/s, env_step=19456, len=23, n/ep=3, n/st=64, player_1/loss=96.236, player_2/loss=49.094, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 358.11it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=93.535, player_2/loss=108.705, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.38it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=91.507, player_2/loss=114.077, rew=17.86]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 356.39it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=71.565, player_2/loss=91.970, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 358.19it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=79.668, player_2/loss=100.413, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 357.05it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=83.231, player_2/loss=104.851, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 358.38it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=77.181, player_2/loss=95.406, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 355.43it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=54.257, player_2/loss=88.476, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 356.18it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=28.178, player_2/loss=87.367, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 355.65it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=33.635, player_2/loss=100.640, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 357.17it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=39.093, player_2/loss=103.629, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 357.01it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=35.322, player_2/loss=111.006, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 357.81it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=14.932, player_2/loss=103.099, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 357.98it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=13.165, player_2/loss=121.548, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 355.69it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=20.835, player_2/loss=103.923, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 356.47it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=16.079, player_2/loss=76.783, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 356.72it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=12.040, player_2/loss=77.617, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 355.09it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=12.239, player_2/loss=80.058, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 357.09it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=23.515, player_2/loss=81.983, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 358.01it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=34.840, player_2/loss=124.840, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 357.73it/s, env_step=1024, len=9, n/ep=8, n/st=64, player_1/loss=12.791, player_2/loss=39.143, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.24it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_2/loss=107.934, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 356.23it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=20.058, player_2/loss=127.851, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 358.20it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=27.150, player_2/loss=55.456, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 359.09it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=29.631, player_2/loss=56.278, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 356.33it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=22.290, player_2/loss=66.674, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 358.07it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=24.528, player_2/loss=90.644, rew=-16.67]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 360.11it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=39.527, player_2/loss=85.601, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 355.31it/s, env_step=9216, len=7, n/ep=10, n/st=64, player_1/loss=45.282, player_2/loss=81.603, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 357.61it/s, env_step=10240, len=21, n/ep=4, n/st=64, player_1/loss=49.252, player_2/loss=86.700, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 356.27it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=89.904, player_2/loss=138.063, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 357.49it/s, env_step=12288, len=10, n/ep=7, n/st=64, player_1/loss=179.848, player_2/loss=178.372, rew=-10.71]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 361.13it/s, env_step=13312, len=12, n/ep=4, n/st=64, player_1/loss=241.825, player_2/loss=119.236, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 349.04it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=212.403, player_2/loss=128.262, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 370.65it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=278.691, player_2/loss=104.105, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 359.92it/s, env_step=16384, len=16, n/ep=3, n/st=64, player_1/loss=299.224, player_2/loss=83.255, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 359.26it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=264.444, player_2/loss=54.802, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 355.18it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=251.446, player_2/loss=64.919, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:02, 357.14it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=227.050, player_2/loss=52.273, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 354.85it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=145.798, player_2/loss=30.926, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 356.79it/s, env_step=2048, len=35, n/ep=2, n/st=64, player_1/loss=170.258, player_2/loss=81.370, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.39it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=155.772, player_2/loss=147.877, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 356.91it/s, env_step=4096, len=25, n/ep=2, n/st=64, player_1/loss=89.217, player_2/loss=132.782, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 357.90it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=58.401, player_2/loss=135.266, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 359.00it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=92.050, player_2/loss=148.481, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 357.38it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=83.386, player_2/loss=164.610, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 358.21it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=68.024, player_2/loss=146.787, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 353.36it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=52.513, player_2/loss=201.046, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 355.72it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=71.440, player_2/loss=300.322, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 356.97it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=81.239, player_2/loss=335.044, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 355.14it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=45.731, player_2/loss=330.021, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 353.87it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=42.738, player_2/loss=287.688, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 355.66it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=33.807, player_2/loss=291.560, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 353.16it/s, env_step=15360, len=10, n/ep=7, n/st=64, player_1/loss=14.216, player_2/loss=286.417, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 353.45it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=12.383, player_2/loss=289.686, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 355.30it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=50.501, player_2/loss=324.485, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 355.46it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=55.948, player_2/loss=350.993, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 354.07it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=21.934, player_2/loss=368.641, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 357.23it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=49.606, player_2/loss=253.180, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 356.81it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=71.272, player_2/loss=197.582, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 356.49it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=66.605, player_2/loss=168.669, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 359.02it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=44.619, player_2/loss=139.905, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 358.85it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=160.483, player_2/loss=157.040, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 355.82it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=246.345, player_2/loss=132.617, rew=16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 357.68it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=194.374, player_2/loss=94.740, rew=5.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 357.21it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=204.498, player_2/loss=118.605, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 356.24it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=128.090, player_2/loss=185.863, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 359.00it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=79.587, player_2/loss=170.587, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 356.02it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=71.778, player_2/loss=148.645, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 358.31it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=84.483, player_2/loss=138.003, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 358.24it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=90.065, player_2/loss=142.766, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 355.81it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=53.178, player_2/loss=107.596, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 357.61it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=90.187, player_2/loss=93.792, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 358.83it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=90.559, player_2/loss=102.591, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 358.21it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=97.741, player_2/loss=151.200, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 361.79it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=130.250, player_2/loss=181.729, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #18


Epoch #19: 1025it [00:02, 361.43it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=156.859, player_2/loss=146.521, rew=15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #18


Epoch #1: 1025it [00:02, 355.20it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=146.571, player_2/loss=135.834, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 358.02it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=135.982, player_2/loss=139.222, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 357.97it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=106.443, player_2/loss=104.135, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 355.25it/s, env_step=4096, len=21, n/ep=4, n/st=64, player_1/loss=46.719, player_2/loss=66.193, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 357.35it/s, env_step=5120, len=24, n/ep=2, n/st=64, player_1/loss=28.845, player_2/loss=83.412, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 345.81it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=36.199, player_2/loss=99.744, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 355.64it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=81.524, player_2/loss=87.992, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 359.58it/s, env_step=8192, len=23, n/ep=2, n/st=64, player_1/loss=69.716, player_2/loss=74.972, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 357.59it/s, env_step=9216, len=19, n/ep=2, n/st=64, player_1/loss=63.251, player_2/loss=68.702, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 358.72it/s, env_step=10240, len=18, n/ep=5, n/st=64, player_1/loss=65.024, player_2/loss=88.755, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 354.84it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=54.064, player_2/loss=143.960, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 356.43it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=64.438, player_2/loss=224.426, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 353.84it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=57.391, player_2/loss=230.668, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 354.67it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=46.411, player_2/loss=204.505, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 351.17it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=84.239, player_2/loss=186.422, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 353.08it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=27.598, player_2/loss=194.259, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 356.96it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=26.250, player_2/loss=183.563, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 353.31it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=46.262, player_2/loss=211.521, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 356.26it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=61.244, player_2/loss=202.383, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 357.34it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=40.236, player_2/loss=136.831, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.70it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=44.176, player_2/loss=118.465, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 355.53it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=56.821, player_2/loss=95.880, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 357.69it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=213.232, player_2/loss=224.149, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 354.95it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=431.029, player_2/loss=302.650, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 355.68it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=447.333, player_2/loss=261.468, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 354.47it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=451.368, player_2/loss=158.007, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 357.05it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=387.298, player_2/loss=128.147, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 355.08it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=345.511, player_2/loss=114.306, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 353.42it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=458.459, player_2/loss=69.456, rew=10.71]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 355.27it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=551.152, player_2/loss=63.720, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 358.76it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=497.314, player_2/loss=83.494, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 353.32it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=422.422, player_2/loss=114.093, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 356.83it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=370.537, player_2/loss=120.599, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 357.93it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=357.438, player_2/loss=113.111, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 354.25it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=433.093, player_2/loss=77.612, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 355.13it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=452.625, player_2/loss=85.564, rew=18.75]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 357.98it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=449.826, player_2/loss=60.390, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 353.36it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=461.893, player_2/loss=89.032, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 353.14it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=328.542, player_2/loss=76.772, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.64it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=303.202, player_2/loss=50.716, rew=-18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.56it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=230.382, player_2/loss=42.918, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 358.09it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=144.556, player_2/loss=56.979, rew=-10.71]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 355.38it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=117.067, player_2/loss=40.936, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 358.10it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=105.641, player_2/loss=69.327, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 356.49it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=174.905, player_2/loss=93.580, rew=-18.75]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 355.13it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=275.138, player_2/loss=242.818, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 355.04it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=208.313, player_2/loss=398.924, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 355.07it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=96.503, player_2/loss=589.524, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 357.26it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=117.289, player_2/loss=636.503, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 356.67it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=154.114, player_2/loss=554.318, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 354.98it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=135.843, player_2/loss=539.149, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 357.64it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=153.171, player_2/loss=613.781, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 356.22it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=123.847, player_2/loss=511.299, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 353.53it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=96.935, player_2/loss=568.316, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 356.99it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=137.047, player_2/loss=571.754, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 355.85it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=98.482, player_2/loss=580.225, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 354.99it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=70.266, player_2/loss=588.879, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 358.50it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=76.203, player_2/loss=440.132, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.93it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=46.970, player_2/loss=348.837, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 358.79it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=75.938, player_2/loss=273.449, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 357.46it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=79.603, player_2/loss=233.671, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 357.77it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=39.511, player_2/loss=228.578, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 357.12it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=52.191, player_2/loss=193.796, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 355.65it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=94.613, player_2/loss=198.220, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 354.34it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=212.133, player_2/loss=191.256, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 357.19it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=278.418, player_2/loss=124.166, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 357.74it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=266.054, player_2/loss=51.138, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 355.84it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=251.370, player_2/loss=40.437, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 359.27it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=235.801, player_2/loss=63.365, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 357.79it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=202.372, player_2/loss=69.650, rew=-5.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 355.86it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=89.508, player_2/loss=85.708, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 358.98it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=99.040, player_2/loss=103.133, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 359.86it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=129.006, player_2/loss=133.507, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 357.35it/s, env_step=17408, len=24, n/ep=3, n/st=64, player_1/loss=162.491, player_2/loss=169.581, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 358.80it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_2/loss=150.878, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 356.88it/s, env_step=19456, len=19, n/ep=3, n/st=64, player_1/loss=128.999, player_2/loss=120.019, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 354.59it/s, env_step=1024, len=18, n/ep=3, n/st=64, player_1/loss=34.810, player_2/loss=33.039, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 358.53it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=37.541, player_2/loss=97.157, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.55it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=29.050, player_2/loss=126.127, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 355.71it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=18.634, player_2/loss=164.409, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 362.52it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=12.443, player_2/loss=192.812, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 367.54it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=8.340, player_2/loss=182.069, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 368.85it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=12.078, player_2/loss=152.944, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 357.90it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=11.461, player_2/loss=147.907, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 355.44it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=25.271, player_2/loss=137.372, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 355.28it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=32.026, player_2/loss=136.848, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 354.57it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=23.113, player_2/loss=120.386, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 357.57it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=15.880, player_2/loss=137.393, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 355.86it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=4.451, player_2/loss=127.165, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 353.07it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=3.416, player_2/loss=148.130, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 357.38it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=4.485, player_2/loss=172.849, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 352.57it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=5.388, player_2/loss=165.806, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 356.56it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=7.076, player_2/loss=126.135, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 356.58it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=5.967, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 355.08it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=38.000, player_2/loss=162.937, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 357.29it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=21.011, player_2/loss=126.930, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 357.95it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=41.983, player_2/loss=103.930, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 356.66it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=57.716, player_2/loss=101.359, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 358.50it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=64.437, player_2/loss=97.733, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 356.61it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=44.208, player_2/loss=95.182, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 358.50it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=15.808, player_2/loss=68.179, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 357.28it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=15.342, player_2/loss=64.163, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 358.02it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=17.133, player_2/loss=66.523, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 356.30it/s, env_step=9216, len=24, n/ep=2, n/st=64, player_1/loss=56.161, player_2/loss=43.665, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 356.72it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=83.678, player_2/loss=47.629, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 356.66it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=114.566, player_2/loss=131.852, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 356.74it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=105.398, player_2/loss=198.439, rew=-13.89]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 355.50it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=98.845, player_2/loss=209.417, rew=-19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 354.54it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=140.530, player_2/loss=214.624, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 358.57it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=217.996, player_2/loss=172.012, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 357.15it/s, env_step=16384, len=15, n/ep=3, n/st=64, player_1/loss=169.009, player_2/loss=114.423, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 357.98it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=102.648, player_2/loss=68.334, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 355.93it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=120.039, player_2/loss=82.664, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 356.87it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=127.297, player_2/loss=94.462, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 356.41it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=120.469, player_2/loss=108.888, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 355.24it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=106.675, player_2/loss=143.056, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 353.85it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=78.928, player_2/loss=159.281, rew=15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 356.26it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=34.495, player_2/loss=175.899, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 354.19it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=48.293, player_2/loss=157.571, rew=15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 355.79it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=55.926, player_2/loss=119.866, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 357.37it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=40.854, player_2/loss=110.660, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 351.62it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=91.458, player_2/loss=146.943, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 363.00it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=95.466, player_2/loss=191.043, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 352.81it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=59.071, player_2/loss=125.056, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 357.67it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=77.151, player_2/loss=124.323, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 356.72it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=138.843, player_2/loss=115.620, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 354.84it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=112.221, player_2/loss=73.998, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 355.93it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=88.833, player_2/loss=71.815, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 357.47it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=79.316, player_2/loss=66.035, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 353.19it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=54.521, player_2/loss=110.040, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 357.47it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=59.445, player_2/loss=105.453, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 354.45it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_2/loss=76.396, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 358.46it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=43.222, player_2/loss=80.244, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 355.55it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=16.635, player_2/loss=68.757, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 356.21it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=49.809, player_2/loss=63.983, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 358.40it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_2/loss=111.045, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 359.19it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=150.848, player_2/loss=148.410, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 354.89it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=141.715, player_2/loss=123.406, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 357.72it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=104.336, player_2/loss=99.670, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 356.14it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=92.082, player_2/loss=102.070, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 354.39it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=102.369, player_2/loss=104.933, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 354.54it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=104.697, player_2/loss=77.824, rew=12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 354.28it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_2/loss=87.216, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 359.65it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=184.671, player_2/loss=124.918, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 354.71it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=149.547, player_2/loss=110.476, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 354.26it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=122.120, player_2/loss=91.613, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 359.05it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=122.110, player_2/loss=94.594, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 358.38it/s, env_step=15360, len=24, n/ep=2, n/st=64, player_1/loss=127.145, player_2/loss=122.043, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 355.57it/s, env_step=16384, len=27, n/ep=2, n/st=64, player_1/loss=113.533, player_2/loss=141.459, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 357.78it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=103.256, player_2/loss=79.780, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 355.83it/s, env_step=18432, len=22, n/ep=3, n/st=64, player_1/loss=90.386, player_2/loss=67.894, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 359.00it/s, env_step=19456, len=24, n/ep=2, n/st=64, player_1/loss=93.777, player_2/loss=40.147, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 356.99it/s, env_step=1024, len=25, n/ep=2, n/st=64, player_1/loss=146.457, player_2/loss=121.835, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.11it/s, env_step=2048, len=25, n/ep=2, n/st=64, player_1/loss=108.314, player_2/loss=107.510, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 355.17it/s, env_step=3072, len=13, n/ep=6, n/st=64, player_1/loss=97.504, player_2/loss=92.480, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 342.95it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=105.220, player_2/loss=85.429, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 354.22it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=56.701, player_2/loss=93.313, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 358.12it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=45.460, player_2/loss=79.828, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 355.86it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=33.398, player_2/loss=82.063, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 357.09it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=20.142, player_2/loss=71.742, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 356.99it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=8.355, player_2/loss=71.409, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 355.73it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=12.710, player_2/loss=70.257, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 354.27it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=12.838, player_2/loss=72.840, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 363.77it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=15.726, player_2/loss=97.496, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 356.90it/s, env_step=13312, len=15, n/ep=3, n/st=64, player_1/loss=28.524, player_2/loss=92.480, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 354.48it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=53.021, player_2/loss=102.726, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 354.05it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=49.400, player_2/loss=95.218, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 355.36it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=47.269, player_2/loss=133.331, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 355.31it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=108.224, player_2/loss=140.960, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 356.31it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=85.815, player_2/loss=114.247, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 355.34it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=39.827, player_2/loss=117.622, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 355.32it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=100.335, player_2/loss=155.561, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.96it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=256.411, player_2/loss=227.769, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 356.99it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=392.719, player_2/loss=262.006, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.03it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=456.017, player_2/loss=217.034, rew=10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 355.89it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=449.338, player_2/loss=189.759, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 357.73it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=424.814, player_2/loss=156.420, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 354.53it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=538.006, player_2/loss=123.033, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 355.73it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=514.815, player_2/loss=83.327, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 352.76it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=438.206, player_2/loss=63.234, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 354.96it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=388.800, player_2/loss=31.602, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 355.36it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=501.644, player_2/loss=20.012, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 354.17it/s, env_step=12288, len=10, n/ep=7, n/st=64, player_1/loss=497.158, player_2/loss=44.233, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 354.00it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=442.635, player_2/loss=78.221, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 354.99it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=556.907, player_2/loss=60.339, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 353.86it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=570.079, player_2/loss=45.533, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 356.98it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=531.751, player_2/loss=56.305, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 354.97it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=506.810, player_2/loss=35.146, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 356.10it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=509.039, player_2/loss=20.337, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 354.99it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=472.128, player_2/loss=14.656, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 355.27it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=419.544, player_2/loss=50.185, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 357.25it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=247.124, player_2/loss=274.565, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 353.81it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=53.659, player_2/loss=460.736, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 355.05it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=22.661, player_2/loss=556.552, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 354.69it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=48.377, player_2/loss=584.592, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 356.25it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=85.038, player_2/loss=470.856, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 355.71it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=57.062, player_2/loss=408.440, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 355.97it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=47.118, player_2/loss=397.736, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 355.31it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=77.915, player_2/loss=351.137, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 355.29it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=61.207, player_2/loss=390.344, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 356.39it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=9.980, player_2/loss=473.932, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 357.33it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=64.549, player_2/loss=480.862, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 355.45it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=70.081, player_2/loss=546.368, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 349.60it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=8.193, player_2/loss=444.552, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 367.52it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=12.893, player_2/loss=443.819, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 353.95it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=16.342, player_2/loss=506.339, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 358.03it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=8.568, player_2/loss=597.526, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 355.90it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=5.513, player_2/loss=597.204, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 355.22it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=4.467, player_2/loss=547.907, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 357.70it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=168.743, player_2/loss=106.525, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 358.15it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=158.260, player_2/loss=123.235, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 349.03it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=146.993, player_2/loss=89.021, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 353.90it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=180.912, player_2/loss=31.174, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 356.25it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=234.260, player_2/loss=11.258, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 358.79it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=255.918, player_2/loss=17.121, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 358.89it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=320.408, player_2/loss=17.583, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 360.02it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=238.045, player_2/loss=29.935, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 358.14it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=156.251, player_2/loss=53.789, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 355.01it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=188.950, player_2/loss=61.138, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 358.01it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=254.366, player_2/loss=36.755, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 357.71it/s, env_step=12288, len=15, n/ep=5, n/st=64, player_1/loss=321.696, player_2/loss=54.187, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 354.86it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=333.094, player_2/loss=61.842, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 358.02it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=260.338, player_2/loss=28.854, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 357.11it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=193.457, player_2/loss=6.839, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 358.56it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=151.350, player_2/loss=73.214, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 357.64it/s, env_step=17408, len=16, n/ep=3, n/st=64, player_1/loss=132.072, player_2/loss=132.064, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 353.33it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=177.657, player_2/loss=112.136, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 356.55it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=292.516, player_2/loss=67.597, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 357.19it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=203.708, player_2/loss=109.194, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.01it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=152.589, player_2/loss=68.150, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 356.86it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=124.921, player_2/loss=55.572, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 357.03it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=120.032, player_2/loss=227.122, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 358.00it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=71.094, player_2/loss=302.179, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 357.06it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=41.030, player_2/loss=182.535, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 355.92it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=67.916, player_2/loss=198.679, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 357.94it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=66.629, player_2/loss=182.870, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 358.30it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=67.961, player_2/loss=180.562, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 355.56it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=137.716, player_2/loss=121.053, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 355.81it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=132.164, player_2/loss=140.105, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 357.12it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=84.266, player_2/loss=213.153, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 357.06it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=45.814, player_2/loss=210.872, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 354.62it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=41.201, player_2/loss=186.797, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 354.76it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=26.292, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 353.75it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=23.815, player_2/loss=295.570, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 353.53it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=29.716, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 350.09it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=35.683, player_2/loss=295.774, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 354.97it/s, env_step=19456, len=12, n/ep=4, n/st=64, player_1/loss=19.215, player_2/loss=211.972, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 355.98it/s, env_step=1024, len=25, n/ep=3, n/st=64, player_1/loss=60.213, player_2/loss=157.736, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.13it/s, env_step=2048, len=21, n/ep=4, n/st=64, player_1/loss=117.710, player_2/loss=152.229, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.02it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=144.099, player_2/loss=108.546, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 353.92it/s, env_step=4096, len=25, n/ep=3, n/st=64, player_1/loss=150.522, player_2/loss=60.875, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 356.05it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=194.118, player_2/loss=76.865, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 357.10it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=97.334, player_2/loss=149.423, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 354.56it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=185.929, player_2/loss=156.821, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 356.84it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=329.427, player_2/loss=90.976, rew=10.71]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 354.93it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=309.338, player_2/loss=96.571, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 356.87it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=301.534, player_2/loss=121.537, rew=17.86]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 355.16it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=318.606, player_2/loss=154.987, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 352.63it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=366.013, player_2/loss=143.572, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 356.45it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=360.446, player_2/loss=62.604, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 353.92it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=284.508, player_2/loss=66.905, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 354.31it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=274.252, player_2/loss=83.789, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 353.90it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=286.586, player_2/loss=73.112, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 352.59it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=262.826, player_2/loss=50.676, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 355.81it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=259.436, player_2/loss=36.613, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 352.18it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=294.874, player_2/loss=50.501, rew=17.86]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 354.11it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=131.042, player_2/loss=549.110, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.65it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=115.398, player_2/loss=579.547, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 354.27it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=91.023, player_2/loss=510.666, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 354.54it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=110.155, player_2/loss=469.912, rew=13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 353.99it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=119.496, player_2/loss=511.670, rew=13.89]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 355.32it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=93.620, player_2/loss=533.439, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 353.46it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=81.587, player_2/loss=547.031, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 354.54it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=84.715, player_2/loss=558.401, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 354.73it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=110.451, player_2/loss=536.947, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 353.48it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=106.345, player_2/loss=569.047, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 353.82it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=83.545, player_2/loss=595.583, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 354.66it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=83.606, player_2/loss=571.469, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 354.51it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=49.551, player_2/loss=552.585, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 354.66it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=39.077, player_2/loss=499.590, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 353.24it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=26.813, player_2/loss=515.467, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 354.35it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=32.449, player_2/loss=594.550, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 352.28it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=37.107, player_2/loss=571.365, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 354.89it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=15.437, player_2/loss=559.836, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 354.82it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=12.702, player_2/loss=587.970, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 350.49it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=110.750, player_2/loss=296.736, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 358.00it/s, env_step=2048, len=12, n/ep=6, n/st=64, player_1/loss=263.931, player_2/loss=186.814, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 370.21it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=392.581, player_2/loss=92.762, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 366.78it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=315.356, player_2/loss=65.413, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 356.19it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=403.758, player_2/loss=46.780, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 355.60it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=404.852, player_2/loss=43.711, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 356.80it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=383.188, player_2/loss=44.951, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 354.95it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=340.767, player_2/loss=96.915, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 356.15it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=315.095, player_2/loss=131.212, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 359.02it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=371.797, player_2/loss=78.674, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 351.06it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=330.234, player_2/loss=47.422, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 355.46it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=293.711, player_2/loss=16.154, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 356.36it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=317.849, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 352.67it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=384.778, player_2/loss=39.953, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 356.93it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=373.983, player_2/loss=61.778, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 354.66it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=264.095, player_2/loss=98.141, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 357.06it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=221.153, player_2/loss=114.081, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 356.60it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=283.470, player_2/loss=131.046, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 355.39it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=275.692, player_2/loss=101.776, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 352.65it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=222.442, player_2/loss=157.388, rew=-5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 353.51it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=112.211, player_2/loss=228.678, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 353.67it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=47.956, player_2/loss=379.061, rew=15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 356.96it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=75.052, player_2/loss=493.449, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 352.81it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=71.582, player_2/loss=523.377, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 354.17it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=70.692, player_2/loss=426.008, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 353.95it/s, env_step=7168, len=17, n/ep=5, n/st=64, player_1/loss=43.563, player_2/loss=296.259, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 356.58it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_2/loss=204.817, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 355.57it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=91.715, player_2/loss=176.189, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 352.63it/s, env_step=10240, len=25, n/ep=2, n/st=64, player_1/loss=72.389, player_2/loss=137.136, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 353.16it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=73.692, player_2/loss=202.994, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 352.37it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=75.670, player_2/loss=191.495, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 356.19it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=44.758, player_2/loss=154.143, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 355.38it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=110.133, player_2/loss=185.392, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 353.52it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=93.194, player_2/loss=186.045, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 354.88it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=60.190, player_2/loss=179.977, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 351.95it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=62.923, player_2/loss=150.097, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 355.41it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=72.809, player_2/loss=130.902, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 355.73it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=49.200, player_2/loss=142.029, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 355.14it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=31.013, player_2/loss=195.414, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 357.12it/s, env_step=2048, len=30, n/ep=2, n/st=64, player_1/loss=31.562, player_2/loss=150.343, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 356.59it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=35.794, player_2/loss=136.327, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 353.67it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=38.490, player_2/loss=97.383, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 360.89it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=137.628, player_2/loss=98.553, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 367.33it/s, env_step=6144, len=16, n/ep=3, n/st=64, player_1/loss=242.416, player_2/loss=101.493, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 361.37it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=246.323, player_2/loss=100.791, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 355.90it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=129.324, player_2/loss=90.580, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 357.27it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=34.404, player_2/loss=86.227, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 354.45it/s, env_step=10240, len=19, n/ep=4, n/st=64, player_1/loss=46.035, player_2/loss=92.691, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 356.64it/s, env_step=11264, len=21, n/ep=4, n/st=64, player_1/loss=58.582, player_2/loss=102.542, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 357.42it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=55.962, player_2/loss=108.685, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 359.05it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=62.858, player_2/loss=118.857, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 357.12it/s, env_step=14336, len=15, n/ep=3, n/st=64, player_1/loss=36.007, player_2/loss=77.416, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 356.98it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=26.177, player_2/loss=74.391, rew=-12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 357.56it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=61.755, player_2/loss=79.367, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 351.58it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=178.087, player_2/loss=139.076, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 355.50it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=247.585, player_2/loss=107.047, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 354.04it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=265.524, player_2/loss=98.295, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 353.49it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=259.309, player_2/loss=97.302, rew=-6.25]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.00it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=211.112, player_2/loss=90.179, rew=-17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 355.76it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=261.386, player_2/loss=159.742, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 355.38it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=261.951, player_2/loss=367.234, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 354.76it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=168.621, player_2/loss=455.364, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 354.35it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=111.846, player_2/loss=398.543, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 352.10it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=95.765, player_2/loss=465.950, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 354.98it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=72.071, player_2/loss=464.799, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 356.15it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=21.229, player_2/loss=495.809, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 353.36it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=37.995, player_2/loss=489.681, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 356.72it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=55.628, player_2/loss=498.140, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 352.91it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=88.432, player_2/loss=428.527, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 353.50it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=94.354, player_2/loss=399.698, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 353.48it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=43.014, player_2/loss=453.446, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 355.15it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=73.421, player_2/loss=470.043, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 355.18it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=81.251, player_2/loss=392.509, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 354.63it/s, env_step=17408, len=7, n/ep=10, n/st=64, player_1/loss=46.348, player_2/loss=360.993, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 355.14it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=59.987, player_2/loss=412.464, rew=19.44]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 354.70it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=53.694, player_2/loss=401.139, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 354.10it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=78.296, player_2/loss=414.875, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.37it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=111.138, player_2/loss=367.223, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 354.60it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=84.067, player_2/loss=298.388, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 357.86it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=81.037, player_2/loss=253.335, rew=-17.86]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 355.83it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=102.261, player_2/loss=209.548, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 356.32it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=96.314, player_2/loss=140.029, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 352.37it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=182.457, player_2/loss=117.367, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 361.57it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=228.739, player_2/loss=139.645, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 370.61it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=197.289, player_2/loss=146.695, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 356.75it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=195.474, player_2/loss=179.032, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 355.04it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=210.020, player_2/loss=206.510, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 355.34it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=194.684, player_2/loss=205.129, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 357.12it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=197.813, player_2/loss=192.078, rew=-13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 354.35it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=278.994, player_2/loss=132.182, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 354.61it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=308.369, player_2/loss=87.781, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 353.46it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=273.413, player_2/loss=44.677, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 354.70it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=208.449, player_2/loss=25.109, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 355.05it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=197.970, player_2/loss=46.538, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 352.94it/s, env_step=19456, len=13, n/ep=4, n/st=64, player_1/loss=227.862, player_2/loss=47.318, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 356.47it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=208.410, player_2/loss=82.036, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 360.22it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=170.356, player_2/loss=98.440, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.78it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=111.701, player_2/loss=74.211, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 348.88it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=93.138, player_2/loss=44.635, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 355.71it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=157.609, player_2/loss=228.311, rew=5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 353.96it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=159.558, player_2/loss=468.332, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 356.68it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=127.556, player_2/loss=583.637, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 353.73it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=122.870, player_2/loss=471.581, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 355.56it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=98.279, player_2/loss=421.238, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 351.53it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=93.673, player_2/loss=486.105, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 356.31it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=98.145, player_2/loss=418.746, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 355.68it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=136.343, player_2/loss=289.408, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 354.29it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=108.272, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 357.15it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=69.922, player_2/loss=699.280, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 353.52it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=82.460, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 358.43it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=89.088, player_2/loss=464.795, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 354.55it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=111.711, player_2/loss=366.277, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 354.26it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=115.536, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 355.48it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=101.284, player_2/loss=471.520, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 356.73it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=21.008, player_2/loss=431.705, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.00it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=51.851, player_2/loss=350.208, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 355.06it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=56.112, player_2/loss=267.776, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 356.12it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=35.971, player_2/loss=229.011, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 356.26it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=36.827, player_2/loss=225.999, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 354.28it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=40.086, player_2/loss=213.480, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 354.88it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=60.717, player_2/loss=183.991, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 359.08it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=115.890, player_2/loss=180.442, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 356.08it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=142.445, player_2/loss=151.392, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 351.68it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=206.433, player_2/loss=133.371, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 357.77it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=219.589, player_2/loss=107.543, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 367.32it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=137.466, player_2/loss=136.337, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 364.91it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_2/loss=156.300, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #14: 1025it [00:02, 356.39it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=142.159, player_2/loss=131.662, rew=-5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #15: 1025it [00:02, 352.44it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=151.547, player_2/loss=92.372, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #16: 1025it [00:02, 354.45it/s, env_step=16384, len=29, n/ep=2, n/st=64, player_1/loss=153.296, player_2/loss=82.505, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #17: 1025it [00:02, 354.85it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=161.118, player_2/loss=94.768, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #18: 1025it [00:02, 348.02it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=180.497, player_2/loss=114.228, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #19: 1025it [00:02, 351.68it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=220.673, player_2/loss=78.026, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #1: 1025it [00:02, 354.83it/s, env_step=1024, len=21, n/ep=2, n/st=64, player_1/loss=165.439, player_2/loss=61.444, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.52it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=122.664, player_2/loss=79.410, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 354.68it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=110.377, player_2/loss=107.897, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 357.23it/s, env_step=4096, len=26, n/ep=3, n/st=64, player_1/loss=118.916, player_2/loss=127.041, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 355.49it/s, env_step=5120, len=27, n/ep=3, n/st=64, player_1/loss=122.753, player_2/loss=143.741, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 356.66it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=101.882, player_2/loss=137.520, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 357.02it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=66.393, player_2/loss=129.567, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 353.70it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=44.375, player_2/loss=169.235, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 357.48it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=40.374, player_2/loss=177.499, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 355.79it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=45.353, player_2/loss=166.697, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 353.91it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=30.870, player_2/loss=153.821, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 356.24it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=20.017, player_2/loss=148.793, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 354.87it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=14.723, player_2/loss=140.727, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 355.54it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=20.477, player_2/loss=176.132, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 354.84it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=33.116, player_2/loss=200.290, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 357.20it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=14.371, player_2/loss=200.597, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 352.58it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=63.076, player_2/loss=228.048, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 353.93it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=58.489, player_2/loss=221.289, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 356.63it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=11.630, player_2/loss=210.492, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 354.35it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=22.963, player_2/loss=177.314, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 356.44it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=58.105, player_2/loss=172.572, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 357.33it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=112.103, player_2/loss=157.326, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 356.85it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=112.212, player_2/loss=141.488, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 355.57it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=104.590, player_2/loss=130.099, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 355.15it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=82.842, player_2/loss=109.694, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 356.18it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=58.830, player_2/loss=104.933, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 356.88it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=57.812, player_2/loss=108.022, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 357.34it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=54.641, player_2/loss=83.667, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 356.77it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=42.105, player_2/loss=79.308, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 353.43it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=60.621, player_2/loss=83.733, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 357.84it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=90.267, player_2/loss=112.170, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #13: 1025it [00:02, 350.63it/s, env_step=13312, len=16, n/ep=3, n/st=64, player_1/loss=104.843, player_2/loss=138.796, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #14: 1025it [00:02, 349.81it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=93.014, player_2/loss=101.765, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #15: 1025it [00:02, 356.54it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=112.762, player_2/loss=68.717, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #16: 1025it [00:02, 356.06it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=141.246, player_2/loss=59.402, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #17: 1025it [00:02, 355.77it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=146.340, player_2/loss=42.372, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #18: 1025it [00:02, 356.55it/s, env_step=18432, len=16, n/ep=3, n/st=64, player_2/loss=57.177, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #19: 1025it [00:02, 353.42it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=196.732, player_2/loss=104.782, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #1: 1025it [00:02, 354.82it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=188.887, player_2/loss=59.825, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.70it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=165.389, player_2/loss=114.887, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.29it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=75.083, player_2/loss=173.248, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 354.13it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=78.389, player_2/loss=238.471, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 354.15it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=59.488, player_2/loss=266.119, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 354.30it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=85.085, player_2/loss=266.808, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 353.14it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=70.048, player_2/loss=253.810, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 346.85it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=20.387, player_2/loss=316.847, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 346.27it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=25.669, player_2/loss=305.730, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 352.25it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=26.952, player_2/loss=276.651, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 354.00it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=33.025, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 350.14it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=38.309, player_2/loss=280.281, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 352.98it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=16.140, player_2/loss=272.033, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 352.42it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=17.054, player_2/loss=272.367, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 349.64it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=10.802, player_2/loss=285.714, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 338.74it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=8.660, player_2/loss=279.349, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 346.04it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=6.724, player_2/loss=267.226, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 349.86it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=1.679, player_2/loss=286.628, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 349.29it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=5.817, player_2/loss=329.060, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 353.36it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=9.817, player_2/loss=218.403, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.73it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=47.681, player_2/loss=215.422, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.72it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=145.367, player_2/loss=164.896, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 356.05it/s, env_step=4096, len=24, n/ep=3, n/st=64, player_1/loss=155.720, player_2/loss=134.727, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 352.96it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_2/loss=162.091, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 355.00it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=172.662, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 352.40it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=178.806, player_2/loss=153.372, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 354.71it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=166.151, player_2/loss=107.590, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 351.62it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=174.716, player_2/loss=95.051, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 353.40it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=259.636, player_2/loss=120.896, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 347.46it/s, env_step=11264, len=20, n/ep=4, n/st=64, player_1/loss=236.940, player_2/loss=137.420, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 340.84it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=141.607, player_2/loss=165.684, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 354.54it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=180.826, player_2/loss=131.409, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 351.77it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=219.824, player_2/loss=104.197, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 354.51it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=311.687, player_2/loss=129.645, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 350.87it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=409.952, player_2/loss=128.875, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 357.50it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=427.247, player_2/loss=111.423, rew=-3.57]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 355.67it/s, env_step=18432, len=9, n/ep=8, n/st=64, player_1/loss=327.294, player_2/loss=126.070, rew=6.25]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 353.10it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=405.480, player_2/loss=125.015, rew=10.71]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 352.23it/s, env_step=1024, len=9, n/ep=8, n/st=64, player_1/loss=269.621, player_2/loss=324.912, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.01it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_2/loss=325.937, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.84it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=86.669, player_2/loss=343.646, rew=17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 350.11it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=85.252, player_2/loss=354.953, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 349.50it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=88.840, player_2/loss=339.650, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 354.27it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=56.959, player_2/loss=310.827, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.66it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=21.487, player_2/loss=354.889, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 352.12it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=17.123, player_2/loss=433.049, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 350.41it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=42.004, player_2/loss=407.145, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 354.21it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=51.528, player_2/loss=328.122, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 352.67it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=38.078, player_2/loss=309.369, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 350.34it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=26.915, player_2/loss=324.259, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 351.18it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=8.240, player_2/loss=343.127, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 351.75it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=5.945, player_2/loss=348.641, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 351.41it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=5.905, player_2/loss=331.507, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 352.72it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=18.432, player_2/loss=322.515, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 348.21it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=20.487, player_2/loss=329.765, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 349.64it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=5.396, player_2/loss=315.139, rew=16.67]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 351.13it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=5.324, player_2/loss=350.754, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 351.55it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=28.956, player_2/loss=219.774, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 352.53it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=153.182, player_2/loss=139.276, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 357.23it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=246.773, player_2/loss=120.353, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 356.51it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=324.835, player_2/loss=101.132, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 353.47it/s, env_step=5120, len=10, n/ep=5, n/st=64, player_1/loss=339.571, player_2/loss=27.593, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 351.01it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=259.664, player_2/loss=28.134, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 352.00it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=226.495, player_2/loss=22.796, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 355.86it/s, env_step=8192, len=12, n/ep=4, n/st=64, player_1/loss=254.488, player_2/loss=27.674, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 352.90it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=230.962, player_2/loss=82.752, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 355.59it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_2/loss=81.286, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 356.61it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=225.111, player_2/loss=24.507, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 353.39it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=269.580, player_2/loss=22.905, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 357.53it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=245.005, player_2/loss=18.920, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 353.97it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=251.718, player_2/loss=19.351, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 356.90it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=204.206, player_2/loss=22.749, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 353.32it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=201.074, player_2/loss=29.965, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 354.30it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=211.868, player_2/loss=37.541, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 356.50it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=234.792, player_2/loss=27.903, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 343.30it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=238.738, player_2/loss=20.584, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 364.29it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=159.488, player_2/loss=139.133, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 363.01it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=119.293, player_2/loss=252.440, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 353.07it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=109.831, player_2/loss=295.765, rew=16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 352.56it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_2/loss=413.538, rew=13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 355.32it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=123.530, player_2/loss=459.785, rew=13.89]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 353.44it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=136.082, player_2/loss=394.697, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 353.10it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=102.459, player_2/loss=429.738, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 354.57it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=27.182, player_2/loss=471.009, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 353.43it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=32.551, player_2/loss=462.048, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 351.72it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=98.678, player_2/loss=435.209, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 351.61it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=117.055, player_2/loss=418.230, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 354.03it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=102.054, player_2/loss=439.933, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 353.08it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=94.001, player_2/loss=406.310, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 353.56it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=76.884, player_2/loss=398.984, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 354.17it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=57.608, player_2/loss=309.193, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 352.21it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=52.875, player_2/loss=315.539, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 354.56it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=32.784, player_2/loss=412.786, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 353.14it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=48.501, player_2/loss=367.994, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 354.16it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=41.408, player_2/loss=296.674, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 353.64it/s, env_step=1024, len=9, n/ep=9, n/st=64, player_1/loss=179.276, player_2/loss=206.175, rew=-13.89]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.67it/s, env_step=2048, len=20, n/ep=4, n/st=64, player_1/loss=160.598, player_2/loss=205.640, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 356.64it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=252.701, player_2/loss=162.940, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 354.95it/s, env_step=4096, len=28, n/ep=2, n/st=64, player_1/loss=250.689, player_2/loss=131.548, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 355.36it/s, env_step=5120, len=19, n/ep=4, n/st=64, player_1/loss=170.551, player_2/loss=102.671, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 352.91it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=199.346, player_2/loss=97.056, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 355.07it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=226.771, player_2/loss=115.530, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 355.67it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=242.628, player_2/loss=81.879, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 354.21it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=208.922, player_2/loss=51.375, rew=-5.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 354.45it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=171.334, player_2/loss=117.171, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 352.08it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=156.937, player_2/loss=153.017, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 354.35it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=143.653, player_2/loss=132.079, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 354.14it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=131.510, player_2/loss=116.067, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 354.78it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=157.193, player_2/loss=115.917, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 353.52it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=246.091, player_2/loss=64.437, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 355.38it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=248.591, player_2/loss=97.508, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 352.37it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=181.584, rew=-15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 355.08it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=200.702, player_2/loss=126.689, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 357.19it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=195.756, player_2/loss=131.286, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 351.23it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=178.863, player_2/loss=151.806, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 353.85it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=137.007, player_2/loss=132.322, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.84it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=97.203, player_2/loss=135.824, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 364.69it/s, env_step=4096, len=16, n/ep=3, n/st=64, player_1/loss=80.769, player_2/loss=136.701, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 354.13it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=112.548, player_2/loss=128.615, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 352.89it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=85.388, player_2/loss=115.940, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.83it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=66.999, player_2/loss=92.588, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 353.95it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=112.528, player_2/loss=111.457, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 353.88it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=154.853, player_2/loss=170.879, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 352.15it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=156.009, player_2/loss=184.290, rew=-15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 352.69it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=144.075, player_2/loss=152.933, rew=-18.75]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 353.44it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=149.934, player_2/loss=145.592, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 351.98it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=114.397, player_2/loss=129.810, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 355.58it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=143.017, player_2/loss=137.445, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 352.35it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=162.144, player_2/loss=130.896, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 355.29it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=108.565, player_2/loss=131.321, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 353.03it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=56.960, player_2/loss=136.004, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 353.74it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=70.678, player_2/loss=127.850, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 350.79it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=49.418, player_2/loss=142.252, rew=-8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 354.56it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=53.092, player_2/loss=71.311, rew=-3.57]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.23it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=71.040, player_2/loss=71.870, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 353.13it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=92.507, player_2/loss=61.677, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 353.07it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=116.086, player_2/loss=110.732, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 349.89it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=118.253, player_2/loss=143.597, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 354.85it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=117.905, player_2/loss=77.382, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 353.21it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=109.953, player_2/loss=25.702, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 354.32it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=126.798, player_2/loss=51.149, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 352.81it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=112.192, player_2/loss=66.426, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 355.53it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=101.091, player_2/loss=37.825, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 355.24it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=139.439, player_2/loss=55.386, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 355.67it/s, env_step=12288, len=12, n/ep=6, n/st=64, player_1/loss=142.157, player_2/loss=92.841, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 356.19it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=107.504, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 351.68it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=87.540, player_2/loss=53.949, rew=16.67]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 354.85it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=94.162, player_2/loss=10.373, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 354.60it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=96.185, player_2/loss=4.929, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 357.69it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=104.399, player_2/loss=6.215, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 351.66it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=103.238, player_2/loss=48.751, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 355.37it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=95.729, player_2/loss=64.067, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 353.45it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=67.259, player_2/loss=62.345, rew=-12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 353.03it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=103.116, player_2/loss=125.152, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 352.78it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=172.290, player_2/loss=234.019, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 352.79it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=132.095, player_2/loss=396.631, rew=13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 357.03it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=73.889, player_2/loss=410.806, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 338.91it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=44.511, player_2/loss=355.722, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 353.51it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=54.870, player_2/loss=377.720, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 353.30it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=100.609, player_2/loss=375.896, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 353.17it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=59.208, player_2/loss=443.439, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 351.52it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=28.696, player_2/loss=467.092, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 355.46it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=22.629, player_2/loss=442.270, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 351.73it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=34.705, player_2/loss=363.836, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 352.33it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=50.075, player_2/loss=460.340, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 349.57it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=67.460, player_2/loss=478.468, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 351.00it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=71.557, player_2/loss=480.118, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 353.29it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=44.993, player_2/loss=458.503, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 348.82it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=34.021, player_2/loss=425.576, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 354.69it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=48.988, player_2/loss=419.909, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 350.05it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=63.580, player_2/loss=460.473, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 354.13it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=95.248, player_2/loss=235.159, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 353.81it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=390.462, player_2/loss=152.108, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 354.13it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=523.593, player_2/loss=92.109, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 354.86it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=450.036, player_2/loss=82.097, rew=18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 354.51it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=547.514, player_2/loss=78.051, rew=18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 354.50it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_2/loss=85.043, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 353.21it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=514.287, player_2/loss=84.156, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 354.39it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=595.153, player_2/loss=60.598, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 354.42it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=614.890, player_2/loss=36.418, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 354.88it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_2/loss=31.184, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 354.38it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=565.203, player_2/loss=63.745, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 353.46it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=523.414, player_2/loss=101.895, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 355.86it/s, env_step=13312, len=9, n/ep=8, n/st=64, player_1/loss=582.542, player_2/loss=90.077, rew=18.75]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 352.51it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=545.826, player_2/loss=37.723, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 350.36it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=494.029, player_2/loss=30.158, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 353.59it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=469.684, player_2/loss=75.863, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 352.70it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=470.896, player_2/loss=81.473, rew=10.71]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 353.41it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=466.882, player_2/loss=34.651, rew=10.71]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 353.83it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=407.399, player_2/loss=22.981, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 353.80it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=396.163, player_2/loss=37.663, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.22it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=342.953, player_2/loss=117.011, rew=13.89]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 351.44it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=229.263, player_2/loss=355.301, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 352.37it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=73.737, player_2/loss=631.814, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 351.21it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=63.237, player_2/loss=701.744, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 352.51it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=31.777, player_2/loss=854.079, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 352.84it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=73.039, player_2/loss=664.269, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 351.29it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=104.986, player_2/loss=607.311, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 350.99it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=87.535, player_2/loss=630.860, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 361.52it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=54.906, player_2/loss=688.017, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 352.29it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=48.863, player_2/loss=615.206, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 350.56it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=53.689, player_2/loss=602.084, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 353.01it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=65.986, player_2/loss=676.913, rew=13.89]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 351.00it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=49.716, player_2/loss=819.649, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 349.70it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=25.853, player_2/loss=949.460, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 352.38it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=57.495, player_2/loss=872.220, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 350.54it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=55.051, player_2/loss=685.667, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 355.77it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=47.007, player_2/loss=617.062, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 351.95it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=66.757, player_2/loss=542.152, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 353.31it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=76.562, player_2/loss=503.208, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.34it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=62.699, player_2/loss=421.881, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 353.31it/s, env_step=3072, len=9, n/ep=8, n/st=64, player_1/loss=62.898, player_2/loss=273.580, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.71it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=186.091, player_2/loss=230.363, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 352.03it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=198.332, player_2/loss=209.347, rew=16.67]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 355.35it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=125.897, player_2/loss=148.631, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 350.21it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=135.373, player_2/loss=83.320, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 351.04it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=144.334, player_2/loss=84.828, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 354.07it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=158.827, player_2/loss=96.280, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 354.41it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=155.602, player_2/loss=70.273, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 352.95it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=132.700, player_2/loss=36.428, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 353.65it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=218.450, player_2/loss=58.831, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 352.19it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=255.927, player_2/loss=81.013, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 354.11it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=162.439, player_2/loss=110.783, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 353.67it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=178.676, player_2/loss=92.096, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 352.22it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_2/loss=18.529, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 350.86it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=218.499, player_2/loss=10.733, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 350.54it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=148.088, player_2/loss=23.371, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 355.27it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=165.138, player_2/loss=49.359, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 352.09it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=191.509, player_2/loss=121.403, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.65it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=186.588, player_2/loss=154.305, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 351.19it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=170.180, player_2/loss=265.740, rew=5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 352.61it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=86.868, player_2/loss=287.647, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 344.94it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=48.911, player_2/loss=360.273, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 353.28it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=78.111, player_2/loss=339.947, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 349.59it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=35.568, player_2/loss=342.088, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 353.06it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=12.300, player_2/loss=347.674, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 353.67it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=15.664, player_2/loss=413.909, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 353.90it/s, env_step=10240, len=13, n/ep=6, n/st=64, player_1/loss=38.651, player_2/loss=414.663, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 350.26it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=54.017, player_2/loss=365.421, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.91it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=35.660, player_2/loss=348.386, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 354.12it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=15.159, player_2/loss=387.833, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 353.95it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=18.670, player_2/loss=397.400, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 354.68it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=18.975, player_2/loss=440.159, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 351.72it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=13.700, player_2/loss=447.828, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 352.50it/s, env_step=17408, len=9, n/ep=8, n/st=64, player_1/loss=14.462, player_2/loss=447.232, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 352.90it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=51.240, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 351.13it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=17.193, player_2/loss=356.272, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 355.00it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=17.324, player_2/loss=312.897, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 355.43it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=81.901, player_2/loss=179.670, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 355.00it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=149.764, player_2/loss=55.349, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 352.95it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=172.357, player_2/loss=38.365, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 354.33it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=169.428, player_2/loss=46.286, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 357.14it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=154.122, player_2/loss=76.268, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 352.81it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=139.333, player_2/loss=67.511, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 355.71it/s, env_step=8192, len=12, n/ep=4, n/st=64, player_1/loss=147.746, player_2/loss=58.856, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 353.75it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=141.075, player_2/loss=64.510, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 356.32it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=116.055, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 353.79it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=154.253, player_2/loss=14.577, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 356.63it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=165.451, player_2/loss=25.783, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 354.41it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=138.228, player_2/loss=24.755, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 356.22it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=126.183, player_2/loss=10.948, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 355.03it/s, env_step=15360, len=15, n/ep=5, n/st=64, player_1/loss=121.348, player_2/loss=41.394, rew=-5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 352.46it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=136.966, player_2/loss=50.553, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 353.50it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=140.133, player_2/loss=16.420, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 354.95it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=144.002, player_2/loss=10.510, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 354.20it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=152.440, player_2/loss=18.596, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 353.14it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=95.707, player_2/loss=95.466, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.43it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=83.072, player_2/loss=53.397, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.29it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=49.684, player_2/loss=22.685, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 354.10it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=117.055, player_2/loss=14.201, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 353.79it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=199.138, player_2/loss=133.871, rew=15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 352.67it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=126.961, player_2/loss=155.834, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 351.80it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=135.575, player_2/loss=199.587, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 352.62it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=239.059, player_2/loss=291.500, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 354.50it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=225.859, player_2/loss=271.133, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 353.04it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=188.013, player_2/loss=311.156, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 351.02it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=178.082, player_2/loss=320.291, rew=-6.25]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 351.42it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=139.206, player_2/loss=333.150, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 348.94it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=135.267, player_2/loss=369.768, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 351.93it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=129.671, player_2/loss=412.496, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 353.90it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=111.162, player_2/loss=420.346, rew=3.57]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 362.43it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=113.014, player_2/loss=368.325, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 350.36it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=89.628, player_2/loss=316.752, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 352.07it/s, env_step=18432, len=9, n/ep=8, n/st=64, player_1/loss=74.700, player_2/loss=309.352, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 348.93it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=33.680, player_2/loss=348.553, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 352.40it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=41.738, player_2/loss=282.330, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.60it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=45.947, player_2/loss=316.158, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 352.02it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=59.804, player_2/loss=293.303, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 352.19it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=83.681, player_2/loss=251.396, rew=15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 352.53it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=169.961, player_2/loss=185.915, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 350.92it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=283.827, player_2/loss=122.028, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 353.17it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=348.285, player_2/loss=75.040, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 352.53it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=393.305, player_2/loss=21.606, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 352.64it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=410.201, player_2/loss=9.114, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 356.63it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=364.255, player_2/loss=10.140, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 352.93it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=318.868, player_2/loss=13.739, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 354.34it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=249.431, player_2/loss=7.666, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 353.00it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=214.451, player_2/loss=10.821, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 351.83it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=226.668, player_2/loss=11.584, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 355.74it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=272.373, player_2/loss=21.958, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 352.11it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=256.641, player_2/loss=38.823, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 354.15it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=243.866, player_2/loss=31.723, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 352.17it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=286.153, player_2/loss=59.468, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 352.19it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=320.046, player_2/loss=68.552, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 353.02it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=215.555, player_2/loss=45.269, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.29it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=173.833, player_2/loss=52.684, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.35it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=155.599, player_2/loss=34.910, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 355.03it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=72.069, player_2/loss=11.753, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 352.64it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=50.233, player_2/loss=15.045, rew=-5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 353.09it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=49.223, player_2/loss=38.628, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 352.12it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=37.734, player_2/loss=33.359, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 351.56it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=52.937, player_2/loss=20.675, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 353.33it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=134.403, player_2/loss=220.088, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 351.12it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=149.742, player_2/loss=390.883, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 353.44it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=129.684, player_2/loss=459.726, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 336.56it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=122.104, player_2/loss=453.035, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 353.20it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=155.337, player_2/loss=460.086, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 350.30it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=145.382, player_2/loss=372.336, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 351.82it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=55.329, player_2/loss=369.011, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 351.67it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=71.965, player_2/loss=421.741, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 348.88it/s, env_step=17408, len=7, n/ep=7, n/st=64, player_1/loss=70.041, player_2/loss=447.679, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 354.17it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=70.635, player_2/loss=418.797, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 355.03it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=67.178, player_2/loss=384.622, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 353.22it/s, env_step=1024, len=8, n/ep=9, n/st=64, player_1/loss=52.642, player_2/loss=295.427, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.01it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=64.229, player_2/loss=288.780, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.48it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=112.169, player_2/loss=197.799, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 353.29it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=225.852, player_2/loss=126.358, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 354.63it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=407.291, player_2/loss=81.985, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 354.57it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=445.521, player_2/loss=20.127, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 353.29it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=389.218, player_2/loss=20.588, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 350.96it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=370.564, player_2/loss=17.007, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 354.64it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=380.777, player_2/loss=3.269, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 352.57it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=353.230, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 353.02it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=353.205, player_2/loss=8.526, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 352.13it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=326.664, player_2/loss=40.365, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 354.33it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=315.900, player_2/loss=67.697, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 353.64it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=375.042, player_2/loss=37.123, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 355.18it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=422.473, player_2/loss=4.990, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 352.24it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=415.583, player_2/loss=7.300, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 355.31it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=487.120, player_2/loss=11.463, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 354.28it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=407.946, player_2/loss=7.585, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 353.54it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=359.148, player_2/loss=6.097, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 351.36it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=255.346, player_2/loss=2.382, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.39it/s, env_step=2048, len=10, n/ep=7, n/st=64, player_1/loss=168.155, player_2/loss=2.465, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.81it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=81.546, player_2/loss=21.996, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.79it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=79.662, player_2/loss=26.950, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 352.84it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=92.615, player_2/loss=8.818, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 353.07it/s, env_step=6144, len=9, n/ep=6, n/st=64, player_1/loss=91.322, player_2/loss=9.942, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 355.32it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=76.336, player_2/loss=11.887, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 350.53it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=40.138, player_2/loss=28.650, rew=-15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 356.27it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=14.780, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 352.70it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=49.434, player_2/loss=16.596, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 354.95it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=52.168, player_2/loss=12.054, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 355.49it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=18.735, player_2/loss=8.036, rew=-16.67]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 350.50it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=29.526, player_2/loss=11.501, rew=-15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 353.38it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=43.614, player_2/loss=14.287, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 353.11it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=64.991, player_2/loss=28.111, rew=-16.67]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 354.23it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=101.577, player_2/loss=81.344, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #16


Epoch #17: 1025it [00:02, 351.77it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=191.707, player_2/loss=208.073, rew=18.75]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #16


Epoch #18: 1025it [00:02, 351.49it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=117.150, player_2/loss=337.620, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #16


Epoch #19: 1025it [00:02, 352.95it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=79.461, player_2/loss=463.999, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #16


Epoch #1: 1025it [00:02, 350.04it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=274.959, player_2/loss=122.268, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 359.05it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=370.875, player_2/loss=73.942, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 365.47it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=411.108, player_2/loss=17.447, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 354.74it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=378.463, player_2/loss=18.379, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 351.59it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=352.771, player_2/loss=18.524, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 352.95it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=406.785, player_2/loss=14.480, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 349.15it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=336.854, player_2/loss=13.554, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 352.69it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=298.421, player_2/loss=4.965, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 346.68it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=279.179, player_2/loss=70.315, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 354.71it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=273.098, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 352.37it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=284.034, player_2/loss=38.117, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 354.68it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=253.810, player_2/loss=12.614, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 351.77it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=335.456, player_2/loss=13.645, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 354.24it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=506.477, player_2/loss=14.982, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 355.96it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=435.598, player_2/loss=12.562, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 353.32it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=372.235, player_2/loss=7.684, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 355.77it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=368.492, player_2/loss=37.044, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 353.09it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=319.320, player_2/loss=65.053, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 354.22it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=340.915, player_2/loss=41.299, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 352.84it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=181.376, player_2/loss=119.379, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.80it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=147.693, player_2/loss=70.624, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.40it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=107.226, player_2/loss=73.475, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 351.15it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=97.048, player_2/loss=228.260, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 353.90it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=63.654, player_2/loss=307.536, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 351.65it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=28.865, player_2/loss=242.306, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 354.30it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=16.298, player_2/loss=200.202, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 352.19it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=24.082, player_2/loss=204.705, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 354.54it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=16.077, player_2/loss=226.125, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 351.88it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=7.247, player_2/loss=241.509, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 351.31it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=24.132, player_2/loss=226.637, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 352.72it/s, env_step=12288, len=15, n/ep=5, n/st=64, player_1/loss=22.694, player_2/loss=281.113, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 352.12it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=7.667, player_2/loss=284.419, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 353.55it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=8.219, player_2/loss=266.197, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 352.37it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=9.450, player_2/loss=261.657, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 353.55it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=58.287, player_2/loss=270.154, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 350.12it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=31.319, player_2/loss=285.945, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 353.06it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=21.734, player_2/loss=271.205, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 349.30it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=49.332, player_2/loss=227.574, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 355.70it/s, env_step=1024, len=20, n/ep=4, n/st=64, player_1/loss=89.039, player_2/loss=150.747, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.08it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=100.789, player_2/loss=130.568, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 354.67it/s, env_step=3072, len=27, n/ep=2, n/st=64, player_1/loss=127.831, player_2/loss=126.998, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 349.24it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=192.712, player_2/loss=111.019, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 366.96it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=202.915, player_2/loss=86.544, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 354.93it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=166.052, player_2/loss=74.876, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 354.39it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=171.283, player_2/loss=88.303, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 353.44it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=220.342, player_2/loss=94.082, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 354.75it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=293.983, player_2/loss=83.598, rew=12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 354.13it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=217.101, player_2/loss=70.727, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 352.91it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=153.377, player_2/loss=86.308, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 353.66it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=196.100, player_2/loss=73.067, rew=-5.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 354.72it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=169.152, player_2/loss=73.461, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 356.59it/s, env_step=14336, len=23, n/ep=3, n/st=64, player_1/loss=149.191, player_2/loss=81.984, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 352.76it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=120.167, player_2/loss=32.695, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 353.10it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=140.710, player_2/loss=23.714, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 354.01it/s, env_step=17408, len=24, n/ep=3, n/st=64, player_1/loss=180.165, player_2/loss=48.680, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 352.95it/s, env_step=18432, len=28, n/ep=2, n/st=64, player_1/loss=126.698, player_2/loss=80.796, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 353.05it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=87.929, player_2/loss=58.108, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 354.09it/s, env_step=1024, len=24, n/ep=2, n/st=64, player_1/loss=70.098, player_2/loss=14.302, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.33it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=124.864, player_2/loss=135.146, rew=18.75]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.05it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=162.727, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 350.81it/s, env_step=4096, len=10, n/ep=7, n/st=64, player_1/loss=126.765, player_2/loss=280.563, rew=10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 353.67it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=73.650, player_2/loss=291.674, rew=13.89]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 351.97it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=94.195, player_2/loss=302.724, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.23it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=120.284, player_2/loss=310.865, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 352.72it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=162.759, player_2/loss=259.360, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 352.31it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=148.890, player_2/loss=251.521, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 351.29it/s, env_step=10240, len=9, n/ep=8, n/st=64, player_1/loss=95.797, player_2/loss=233.090, rew=18.75]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 353.45it/s, env_step=11264, len=15, n/ep=5, n/st=64, player_1/loss=111.087, player_2/loss=234.881, rew=5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 350.78it/s, env_step=12288, len=9, n/ep=8, n/st=64, player_1/loss=123.030, player_2/loss=217.068, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 351.76it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=114.662, player_2/loss=253.895, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 349.82it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=114.315, player_2/loss=296.355, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 351.83it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=78.182, player_2/loss=306.571, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 354.55it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=49.179, player_2/loss=372.218, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 351.06it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=49.418, player_2/loss=382.019, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 353.52it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=94.562, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 352.69it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=103.266, player_2/loss=283.431, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 353.92it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=103.101, player_2/loss=203.247, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 352.66it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=116.189, player_2/loss=111.523, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 352.74it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=124.549, player_2/loss=65.939, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 351.86it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=163.024, player_2/loss=75.660, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 353.83it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=235.975, player_2/loss=91.026, rew=-5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 352.43it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=214.827, player_2/loss=158.682, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 349.29it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=155.183, player_2/loss=114.913, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 363.83it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=156.581, player_2/loss=94.817, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 364.81it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=269.159, player_2/loss=57.482, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 352.02it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=359.911, player_2/loss=65.567, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 352.83it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=335.549, player_2/loss=74.264, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 354.70it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=286.835, player_2/loss=153.932, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 352.06it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=208.007, player_2/loss=172.293, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 356.94it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=167.638, player_2/loss=159.918, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 353.83it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=143.161, player_2/loss=115.242, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 355.28it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=88.633, player_2/loss=76.084, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 352.26it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=100.719, player_2/loss=107.365, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 354.23it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=149.018, player_2/loss=79.358, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 356.00it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_1/loss=148.829, player_2/loss=95.200, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 351.03it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=133.991, player_2/loss=86.161, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.08it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_2/loss=95.687, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.01it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=79.721, player_2/loss=105.408, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.68it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=98.762, player_2/loss=136.847, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 356.06it/s, env_step=5120, len=20, n/ep=4, n/st=64, player_1/loss=99.108, player_2/loss=127.542, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 352.08it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=83.050, player_2/loss=127.561, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 354.73it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=50.232, player_2/loss=127.419, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 352.79it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=30.715, player_2/loss=101.436, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 349.88it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=51.695, player_2/loss=110.576, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 350.08it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=72.544, player_2/loss=182.559, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 351.41it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=58.709, player_2/loss=230.412, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 351.99it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=37.280, player_2/loss=214.684, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 350.70it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=61.702, player_2/loss=219.605, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 353.28it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=91.142, player_2/loss=195.140, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 353.31it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=69.745, player_2/loss=144.289, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 347.20it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=48.632, player_2/loss=189.100, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 352.44it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=27.962, player_2/loss=208.108, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 353.41it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=15.756, player_2/loss=232.637, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 350.54it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=49.681, player_2/loss=237.080, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 354.17it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=93.208, player_2/loss=218.903, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.25it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=114.257, player_2/loss=178.298, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 352.99it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=138.257, player_2/loss=148.141, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 351.60it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=86.846, player_2/loss=162.616, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 354.60it/s, env_step=5120, len=10, n/ep=5, n/st=64, player_1/loss=37.654, player_2/loss=178.571, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 351.92it/s, env_step=6144, len=16, n/ep=3, n/st=64, player_1/loss=87.066, player_2/loss=126.841, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 352.30it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=120.012, player_2/loss=61.800, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 351.91it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=128.673, player_2/loss=85.305, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 350.21it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=180.789, player_2/loss=109.404, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 352.01it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=180.196, player_2/loss=49.117, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 360.35it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=174.642, player_2/loss=60.766, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 354.68it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=132.481, player_2/loss=86.293, rew=-15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 352.99it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=105.159, player_2/loss=132.861, rew=-12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 354.48it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=118.661, player_2/loss=132.508, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 353.14it/s, env_step=15360, len=17, n/ep=3, n/st=64, player_1/loss=103.570, player_2/loss=69.069, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 353.21it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=161.434, player_2/loss=27.180, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 352.30it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_2/loss=28.560, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 353.43it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_2/loss=35.225, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 355.28it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=173.886, player_2/loss=26.397, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 351.19it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=113.648, player_2/loss=13.531, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.31it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=134.736, player_2/loss=127.791, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 351.21it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=122.316, player_2/loss=207.728, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 351.19it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=95.756, player_2/loss=320.377, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 348.69it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=102.581, player_2/loss=409.668, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 351.26it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=69.495, player_2/loss=398.438, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 349.96it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=47.120, player_2/loss=330.113, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 354.54it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=79.696, player_2/loss=305.372, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 353.96it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=76.601, player_2/loss=476.865, rew=12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 352.70it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=33.178, player_2/loss=455.275, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 350.08it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=84.041, player_2/loss=374.660, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 351.86it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=71.445, player_2/loss=362.440, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 351.92it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=62.500, player_2/loss=430.152, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 352.14it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=36.674, player_2/loss=441.869, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 350.99it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=20.692, player_2/loss=419.744, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 355.06it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=37.968, player_2/loss=344.446, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 348.56it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=29.073, player_2/loss=379.506, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 351.77it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=25.855, player_2/loss=426.461, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 352.55it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=20.978, player_2/loss=386.587, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 352.10it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=27.945, player_2/loss=351.361, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.61it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=25.087, player_2/loss=268.690, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.37it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=39.305, player_2/loss=181.260, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 352.09it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=89.758, player_2/loss=175.534, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 352.17it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=135.156, player_2/loss=128.706, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 352.44it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=96.346, player_2/loss=121.694, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 353.86it/s, env_step=7168, len=15, n/ep=3, n/st=64, player_1/loss=92.531, player_2/loss=117.423, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 352.60it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=78.958, player_2/loss=81.596, rew=-15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 353.35it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=102.096, player_2/loss=59.897, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 353.29it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=127.175, player_2/loss=46.119, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 349.57it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=109.310, player_2/loss=42.818, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 351.26it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=120.936, player_2/loss=54.843, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 348.95it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=106.115, player_2/loss=50.276, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 350.96it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=122.321, player_2/loss=48.344, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 352.65it/s, env_step=15360, len=17, n/ep=3, n/st=64, player_1/loss=182.067, player_2/loss=77.600, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 353.55it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=147.416, player_2/loss=130.373, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 349.57it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_2/loss=152.869, rew=-15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 354.40it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=77.504, player_2/loss=129.714, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 350.34it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=136.250, player_2/loss=116.328, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 350.37it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=146.752, player_2/loss=36.659, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.97it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=107.285, player_2/loss=71.316, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 347.69it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=212.490, player_2/loss=243.868, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 350.32it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=186.438, player_2/loss=461.128, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 345.19it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=88.147, player_2/loss=599.145, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 350.42it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=175.982, player_2/loss=537.520, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 349.30it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=114.611, player_2/loss=559.868, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 348.91it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=84.751, player_2/loss=631.109, rew=2.78]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 347.24it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=99.159, player_2/loss=663.483, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 347.11it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=35.298, player_2/loss=630.702, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 347.85it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=30.314, rew=13.89]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.31it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=43.607, player_2/loss=593.301, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 350.14it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=31.361, player_2/loss=699.794, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 348.84it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=76.746, player_2/loss=599.767, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 348.06it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=72.832, player_2/loss=532.284, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 348.74it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=23.716, player_2/loss=480.488, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 350.66it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=18.808, player_2/loss=505.238, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 348.92it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=35.475, player_2/loss=557.493, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 350.20it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=77.225, player_2/loss=596.667, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 349.80it/s, env_step=1024, len=7, n/ep=10, n/st=64, player_1/loss=96.937, player_2/loss=409.014, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.97it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=108.902, player_2/loss=385.718, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.57it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=106.240, player_2/loss=352.317, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 350.91it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=92.824, player_2/loss=319.264, rew=-19.44]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 353.20it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=84.277, player_2/loss=276.082, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 347.83it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=98.350, player_2/loss=300.205, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 353.03it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=238.724, player_2/loss=280.818, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 351.50it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=322.685, player_2/loss=223.312, rew=-17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 352.30it/s, env_step=9216, len=23, n/ep=3, n/st=64, player_1/loss=192.497, player_2/loss=186.567, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 353.50it/s, env_step=10240, len=24, n/ep=3, n/st=64, player_1/loss=112.587, player_2/loss=135.905, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 350.96it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=217.395, player_2/loss=103.307, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 351.55it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=347.300, player_2/loss=85.294, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 349.17it/s, env_step=13312, len=12, n/ep=6, n/st=64, player_1/loss=421.457, player_2/loss=55.096, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 353.13it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=423.480, player_2/loss=47.374, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 354.01it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=382.945, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 358.58it/s, env_step=16384, len=19, n/ep=4, n/st=64, player_1/loss=369.256, player_2/loss=27.072, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 350.20it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=310.735, player_2/loss=32.310, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 340.91it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=306.293, player_2/loss=23.137, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 350.56it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=340.047, player_2/loss=19.633, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 354.14it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=394.824, player_2/loss=98.793, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.32it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=251.022, player_2/loss=89.799, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.00it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=183.661, player_2/loss=205.673, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.88it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=183.856, player_2/loss=325.795, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 352.90it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=106.573, player_2/loss=319.365, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 349.47it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=75.114, player_2/loss=412.088, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.93it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=49.135, player_2/loss=458.685, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.77it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=64.665, player_2/loss=431.069, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 352.77it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=63.386, player_2/loss=387.283, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 350.49it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=18.559, player_2/loss=367.021, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 352.12it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=57.834, player_2/loss=376.012, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 351.14it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=108.417, player_2/loss=382.343, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 352.33it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=65.807, player_2/loss=372.016, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 353.01it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=32.851, player_2/loss=351.892, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 352.48it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=44.311, player_2/loss=349.242, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 352.51it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=40.298, player_2/loss=440.549, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 350.90it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=25.337, player_2/loss=483.289, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 352.94it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=16.903, player_2/loss=404.581, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 353.21it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=15.414, player_2/loss=406.537, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 351.21it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=13.254, player_2/loss=297.237, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 353.05it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=60.712, player_2/loss=251.383, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 351.54it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=115.890, player_2/loss=142.608, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 354.63it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=198.804, player_2/loss=88.754, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 352.46it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=246.332, player_2/loss=68.651, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 349.56it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=308.482, player_2/loss=46.776, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 356.05it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=255.409, player_2/loss=49.176, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 351.47it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=202.460, player_2/loss=49.625, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 355.66it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=183.289, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 353.06it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=98.507, player_2/loss=78.691, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 356.20it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=68.273, player_2/loss=125.252, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 349.10it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=80.292, player_2/loss=129.068, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 354.50it/s, env_step=13312, len=27, n/ep=3, n/st=64, player_1/loss=132.296, player_2/loss=138.650, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 352.79it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=152.496, player_2/loss=114.186, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 354.25it/s, env_step=15360, len=33, n/ep=2, n/st=64, player_1/loss=165.455, player_2/loss=108.681, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 353.74it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=162.032, player_2/loss=115.397, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 352.16it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=136.791, player_2/loss=101.848, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 353.09it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=136.557, player_2/loss=121.274, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 365.46it/s, env_step=19456, len=28, n/ep=3, n/st=64, player_1/loss=135.135, player_2/loss=170.225, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 352.65it/s, env_step=1024, len=32, n/ep=2, n/st=64, player_1/loss=69.294, player_2/loss=60.819, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 353.27it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_2/loss=55.332, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 354.60it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=58.013, player_2/loss=76.219, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.30it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=67.147, player_2/loss=139.095, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 354.37it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=67.161, player_2/loss=213.112, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 350.35it/s, env_step=6144, len=9, n/ep=6, n/st=64, player_1/loss=61.199, player_2/loss=207.668, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.75it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=83.909, player_2/loss=177.447, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 350.41it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=74.648, player_2/loss=171.451, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 355.00it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=53.569, player_2/loss=180.430, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 351.83it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=57.132, player_2/loss=157.759, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 350.22it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=47.383, player_2/loss=182.032, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 353.52it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=49.442, player_2/loss=198.775, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 351.22it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=80.387, player_2/loss=205.587, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.39it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=66.935, player_2/loss=210.392, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 352.72it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=94.806, player_2/loss=187.568, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 351.24it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=167.858, player_2/loss=228.843, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 350.49it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=154.788, player_2/loss=226.215, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 352.56it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=60.893, player_2/loss=229.304, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 349.60it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=126.584, player_2/loss=219.821, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 346.55it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=151.821, player_2/loss=227.396, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.28it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=98.629, player_2/loss=202.245, rew=-17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.71it/s, env_step=3072, len=16, n/ep=3, n/st=64, player_1/loss=162.731, player_2/loss=155.790, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 352.21it/s, env_step=4096, len=25, n/ep=3, n/st=64, player_1/loss=174.225, player_2/loss=125.398, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 351.55it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=66.976, player_2/loss=76.956, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 355.14it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=101.490, player_2/loss=75.381, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 351.39it/s, env_step=7168, len=30, n/ep=2, n/st=64, player_1/loss=149.381, player_2/loss=114.165, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 353.88it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=152.940, player_2/loss=108.543, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 351.18it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=175.707, player_2/loss=129.836, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 353.70it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=193.335, player_2/loss=73.625, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 352.09it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=247.156, player_2/loss=27.141, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 355.92it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=277.315, player_2/loss=78.296, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 354.92it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=260.677, player_2/loss=84.772, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 349.02it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=221.597, player_2/loss=43.409, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 354.34it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=176.279, player_2/loss=43.392, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 349.93it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=198.553, player_2/loss=30.469, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 352.67it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=254.205, player_2/loss=27.741, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 349.25it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=264.958, player_2/loss=24.093, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 355.03it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=263.043, player_2/loss=14.816, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 348.72it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=164.066, player_2/loss=11.765, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.01it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=189.950, player_2/loss=183.428, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 355.94it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=144.307, player_2/loss=301.378, rew=15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 354.35it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=153.661, player_2/loss=201.703, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 355.66it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=144.939, player_2/loss=68.457, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 350.59it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=72.857, player_2/loss=24.164, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 354.59it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=62.267, player_2/loss=7.338, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 350.45it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=134.354, player_2/loss=76.108, rew=-5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 353.36it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=169.134, player_2/loss=234.836, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 348.02it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=108.170, player_2/loss=323.298, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 353.02it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=103.566, player_2/loss=347.875, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.87it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=96.700, player_2/loss=384.322, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 353.20it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=97.704, player_2/loss=382.816, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 349.69it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=94.619, player_2/loss=347.274, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 351.02it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=43.809, player_2/loss=372.448, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 348.63it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=12.859, player_2/loss=429.758, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 351.54it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=32.579, player_2/loss=391.140, rew=10.71]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 348.47it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=47.067, player_2/loss=381.264, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 353.23it/s, env_step=19456, len=9, n/ep=8, n/st=64, player_1/loss=26.581, player_2/loss=411.503, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 349.30it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=31.087, player_2/loss=368.046, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 353.24it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=22.212, player_2/loss=290.698, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 350.86it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=72.822, player_2/loss=177.490, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 353.46it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=105.643, player_2/loss=93.162, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 347.59it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=109.657, player_2/loss=48.305, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 353.70it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=105.932, player_2/loss=20.664, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 350.56it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=98.544, player_2/loss=38.734, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 351.21it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_2/loss=31.990, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 352.82it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=101.651, player_2/loss=39.550, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 347.69it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=115.243, player_2/loss=24.989, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 351.06it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=143.341, player_2/loss=51.414, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 352.61it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=93.768, player_2/loss=36.714, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 353.40it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=40.075, player_2/loss=47.077, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 349.60it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=53.844, player_2/loss=19.067, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 352.82it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=74.752, player_2/loss=6.492, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 348.71it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=63.160, player_2/loss=8.284, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 350.20it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=59.310, player_2/loss=44.585, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 351.20it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=78.526, player_2/loss=64.515, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 353.64it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=61.235, player_2/loss=68.544, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 351.56it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=80.772, player_2/loss=12.467, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.12it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=56.943, player_2/loss=9.033, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.91it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=23.689, player_2/loss=7.923, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.00it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=22.486, player_2/loss=10.572, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 352.30it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=110.423, player_2/loss=32.390, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 349.74it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=127.706, player_2/loss=30.001, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.35it/s, env_step=7168, len=16, n/ep=3, n/st=64, player_1/loss=99.819, player_2/loss=22.851, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.90it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=68.312, player_2/loss=57.886, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 352.93it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=71.087, player_2/loss=89.476, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 350.08it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=47.542, player_2/loss=95.725, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 351.72it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=53.356, player_2/loss=79.511, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #12: 1025it [00:02, 349.34it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=132.232, player_2/loss=206.336, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #13: 1025it [00:02, 350.76it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_2/loss=392.716, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #14: 1025it [00:02, 348.08it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=138.547, player_2/loss=453.528, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #15: 1025it [00:02, 349.80it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=62.783, player_2/loss=433.266, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #16: 1025it [00:02, 347.30it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=65.669, player_2/loss=433.134, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #17: 1025it [00:02, 350.70it/s, env_step=17408, len=7, n/ep=10, n/st=64, player_1/loss=65.534, player_2/loss=427.089, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #18: 1025it [00:02, 347.68it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=82.732, player_2/loss=374.314, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #19: 1025it [00:02, 350.67it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=46.950, player_2/loss=383.343, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #1: 1025it [00:02, 349.49it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=22.049, player_2/loss=332.559, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.02it/s, env_step=2048, len=16, n/ep=5, n/st=64, player_1/loss=69.213, player_2/loss=220.598, rew=15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.38it/s, env_step=3072, len=32, n/ep=2, n/st=64, player_1/loss=98.281, player_2/loss=90.974, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 352.49it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=122.845, player_2/loss=63.602, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 351.28it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=159.666, player_2/loss=60.346, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 351.39it/s, env_step=6144, len=28, n/ep=2, n/st=64, player_1/loss=145.008, player_2/loss=74.924, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 352.96it/s, env_step=7168, len=29, n/ep=2, n/st=64, player_1/loss=167.954, player_2/loss=165.233, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 352.30it/s, env_step=8192, len=33, n/ep=2, n/st=64, player_1/loss=150.905, player_2/loss=159.027, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 353.33it/s, env_step=9216, len=36, n/ep=1, n/st=64, player_1/loss=116.058, player_2/loss=115.680, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 351.96it/s, env_step=10240, len=33, n/ep=2, n/st=64, player_1/loss=89.150, player_2/loss=73.716, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 351.55it/s, env_step=11264, len=32, n/ep=2, n/st=64, player_1/loss=80.793, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 351.87it/s, env_step=12288, len=19, n/ep=2, n/st=64, player_1/loss=97.779, player_2/loss=97.026, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 354.59it/s, env_step=13312, len=29, n/ep=2, n/st=64, player_1/loss=112.476, player_2/loss=74.875, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 351.55it/s, env_step=14336, len=24, n/ep=2, n/st=64, player_1/loss=93.004, player_2/loss=36.896, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 355.16it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=109.516, player_2/loss=89.884, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 348.82it/s, env_step=16384, len=20, n/ep=4, n/st=64, player_1/loss=143.774, player_2/loss=118.230, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 352.34it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=174.342, player_2/loss=82.853, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 349.34it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=160.174, player_2/loss=46.632, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 353.59it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=179.573, player_2/loss=48.701, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 348.40it/s, env_step=1024, len=28, n/ep=3, n/st=64, player_1/loss=75.352, player_2/loss=24.089, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.55it/s, env_step=2048, len=26, n/ep=2, n/st=64, player_1/loss=58.390, player_2/loss=16.595, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.59it/s, env_step=3072, len=28, n/ep=2, n/st=64, player_1/loss=38.166, player_2/loss=67.179, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 351.16it/s, env_step=4096, len=27, n/ep=3, n/st=64, player_1/loss=65.146, player_2/loss=88.244, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 350.33it/s, env_step=5120, len=29, n/ep=3, n/st=64, player_1/loss=79.047, player_2/loss=39.887, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 348.99it/s, env_step=6144, len=22, n/ep=2, n/st=64, player_1/loss=58.153, player_2/loss=42.522, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 359.12it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=86.903, player_2/loss=102.563, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 356.83it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=142.800, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 348.81it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=138.787, player_2/loss=37.314, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 352.29it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=81.625, player_2/loss=120.345, rew=-8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 348.66it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=103.248, player_2/loss=193.385, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 351.13it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=149.345, player_2/loss=215.279, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 350.78it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=175.362, player_2/loss=197.441, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 345.80it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=169.014, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 349.46it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=110.911, player_2/loss=326.577, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 345.70it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=97.368, player_2/loss=349.625, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 349.61it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=68.001, player_2/loss=297.137, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 347.65it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=115.734, player_2/loss=236.715, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 347.73it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=96.680, player_2/loss=285.594, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 350.27it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=71.045, player_2/loss=362.199, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.94it/s, env_step=2048, len=7, n/ep=7, n/st=64, player_1/loss=70.463, player_2/loss=324.965, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.69it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=88.822, player_2/loss=234.837, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 347.02it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=173.331, player_2/loss=149.528, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 353.07it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=270.642, player_2/loss=115.640, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 348.89it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=294.773, player_2/loss=68.042, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 348.02it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=251.986, player_2/loss=13.722, rew=-5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 346.93it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=243.134, player_2/loss=36.020, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 351.84it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=235.586, player_2/loss=36.868, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 350.24it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=206.962, player_2/loss=37.516, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 350.29it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=183.017, player_2/loss=37.704, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 348.33it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=172.106, player_2/loss=35.790, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 349.55it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=188.459, player_2/loss=33.286, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 349.98it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=209.935, player_2/loss=47.176, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 348.79it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=199.262, player_2/loss=56.114, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 352.06it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=215.200, player_2/loss=72.394, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 350.47it/s, env_step=17408, len=13, n/ep=4, n/st=64, player_1/loss=221.025, player_2/loss=50.406, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 347.63it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=229.803, player_2/loss=24.682, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 350.45it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=205.301, player_2/loss=28.605, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 347.95it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=156.641, player_2/loss=83.763, rew=5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.53it/s, env_step=2048, len=24, n/ep=3, n/st=64, player_1/loss=141.408, player_2/loss=93.078, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.70it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=84.525, player_2/loss=197.122, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 347.18it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=88.160, player_2/loss=251.739, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 351.01it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=91.804, player_2/loss=237.713, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 346.65it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=113.235, player_2/loss=183.229, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 349.73it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=114.496, player_2/loss=301.658, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 347.67it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=78.611, player_2/loss=444.039, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 347.32it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=93.183, player_2/loss=408.781, rew=6.25]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 355.38it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=70.454, player_2/loss=501.803, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 345.65it/s, env_step=11264, len=11, n/ep=7, n/st=64, player_1/loss=40.460, player_2/loss=565.807, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 349.92it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=39.061, player_2/loss=526.635, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 345.08it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=41.851, player_2/loss=521.115, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 345.78it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=39.515, player_2/loss=533.449, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 346.11it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=31.995, player_2/loss=592.288, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 340.51it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=37.188, player_2/loss=511.329, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 344.68it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=65.996, player_2/loss=526.360, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 345.88it/s, env_step=18432, len=9, n/ep=9, n/st=64, player_1/loss=64.026, player_2/loss=419.624, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 349.59it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=45.733, player_2/loss=414.666, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 348.70it/s, env_step=1024, len=9, n/ep=8, n/st=64, player_1/loss=28.042, player_2/loss=391.661, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.88it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=39.905, player_2/loss=355.168, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.36it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=47.129, player_2/loss=273.708, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.29it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=43.266, player_2/loss=117.238, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 349.68it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=94.708, player_2/loss=102.829, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 351.47it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=152.648, player_2/loss=165.506, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 348.36it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=214.690, player_2/loss=159.994, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 349.97it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=277.282, player_2/loss=116.893, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 350.33it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=237.445, player_2/loss=59.157, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 352.75it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=274.588, player_2/loss=47.523, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 350.04it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=265.891, player_2/loss=60.556, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 350.45it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_2/loss=84.375, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 349.62it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=266.610, player_2/loss=57.382, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 352.52it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=272.130, player_2/loss=39.386, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 349.11it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=267.525, player_2/loss=17.808, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 349.84it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=264.830, player_2/loss=14.474, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 350.17it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=253.737, player_2/loss=24.548, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 351.64it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=288.618, player_2/loss=36.326, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 349.04it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=329.423, player_2/loss=28.639, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 351.11it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=205.480, player_2/loss=86.844, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.32it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=145.369, player_2/loss=47.483, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.53it/s, env_step=3072, len=10, n/ep=5, n/st=64, player_1/loss=103.193, player_2/loss=48.968, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 353.18it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=111.100, player_2/loss=12.204, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 348.36it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=96.487, player_2/loss=33.528, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 354.94it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=86.577, player_2/loss=95.023, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.42it/s, env_step=7168, len=37, n/ep=2, n/st=64, player_1/loss=78.755, player_2/loss=84.227, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 353.86it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=104.756, player_2/loss=156.728, rew=-15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 350.14it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=165.335, player_2/loss=304.641, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 351.70it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=156.370, player_2/loss=328.694, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 349.99it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=142.212, player_2/loss=218.065, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #12: 1025it [00:02, 357.48it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=115.364, player_2/loss=417.361, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #13: 1025it [00:02, 348.29it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=74.014, player_2/loss=487.619, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #14: 1025it [00:02, 351.65it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=56.954, player_2/loss=535.077, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #15: 1025it [00:02, 346.38it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=17.427, player_2/loss=504.569, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #16: 1025it [00:02, 350.14it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=10.605, player_2/loss=478.565, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #17: 1025it [00:02, 347.62it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=8.709, player_2/loss=511.591, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #18: 1025it [00:02, 350.77it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=26.069, player_2/loss=590.307, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #19: 1025it [00:02, 346.06it/s, env_step=19456, len=8, n/ep=9, n/st=64, player_1/loss=33.565, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #1: 1025it [00:02, 352.12it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=9.991, player_2/loss=534.240, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 350.29it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=88.732, player_2/loss=324.208, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 352.33it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=137.777, player_2/loss=229.783, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 349.78it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=100.059, player_2/loss=246.437, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 355.42it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=85.639, player_2/loss=212.134, rew=-5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 350.01it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=107.446, player_2/loss=199.072, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 352.85it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=142.311, player_2/loss=188.312, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 349.49it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=220.616, player_2/loss=152.299, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 356.10it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=198.498, player_2/loss=104.813, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 348.92it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=221.375, player_2/loss=64.341, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 350.61it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=416.756, player_2/loss=115.726, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 351.83it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=533.284, player_2/loss=135.953, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 348.80it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=446.199, player_2/loss=79.343, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 351.51it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=351.520, player_2/loss=65.138, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 348.96it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=400.462, player_2/loss=52.900, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 351.12it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=360.683, player_2/loss=29.323, rew=15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 346.76it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=285.464, player_2/loss=23.458, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 352.49it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=379.230, player_2/loss=58.103, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 350.24it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=481.543, player_2/loss=73.391, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 352.21it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=267.904, player_2/loss=10.551, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.53it/s, env_step=2048, len=24, n/ep=2, n/st=64, player_1/loss=247.319, player_2/loss=35.579, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 353.18it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=149.384, player_2/loss=79.775, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.74it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=89.574, player_2/loss=126.941, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 353.25it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=99.681, player_2/loss=171.156, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.96it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=92.327, player_2/loss=196.488, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.01it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=50.751, player_2/loss=366.756, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 347.70it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=38.302, player_2/loss=540.568, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 352.26it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=21.272, player_2/loss=827.364, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 347.97it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=19.245, player_2/loss=837.591, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 349.65it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=20.830, player_2/loss=766.045, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 348.47it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=15.252, player_2/loss=783.723, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 347.46it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=10.498, player_2/loss=821.331, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 347.40it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=12.278, player_2/loss=759.778, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 347.78it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=15.227, player_2/loss=769.044, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 349.80it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=14.643, player_2/loss=783.424, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 348.73it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=13.952, player_2/loss=722.249, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 349.46it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=26.432, player_2/loss=661.602, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 353.48it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=29.877, player_2/loss=810.715, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 348.36it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=6.153, player_2/loss=568.007, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.31it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=9.340, player_2/loss=460.934, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.76it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=19.293, player_2/loss=326.369, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.81it/s, env_step=4096, len=20, n/ep=4, n/st=64, player_1/loss=49.418, player_2/loss=269.009, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 347.66it/s, env_step=5120, len=23, n/ep=2, n/st=64, player_1/loss=75.692, player_2/loss=166.147, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 351.00it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=151.315, player_2/loss=103.761, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.07it/s, env_step=7168, len=20, n/ep=4, n/st=64, player_1/loss=153.484, player_2/loss=138.243, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 350.13it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=179.726, player_2/loss=168.752, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 348.77it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=170.471, player_2/loss=120.421, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 352.04it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=137.577, player_2/loss=84.314, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 347.91it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=158.649, player_2/loss=48.716, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 351.35it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=180.082, player_2/loss=41.245, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 350.58it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=204.631, player_2/loss=41.890, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 351.60it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=248.117, player_2/loss=37.468, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 350.62it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=245.291, player_2/loss=25.347, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 353.40it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=225.463, player_2/loss=39.723, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 351.68it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=222.950, player_2/loss=66.572, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 350.78it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=209.063, player_2/loss=86.389, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 348.13it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=215.112, player_2/loss=81.498, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 349.81it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=256.213, player_2/loss=434.180, rew=16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.28it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=190.576, player_2/loss=415.516, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 350.97it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=163.440, player_2/loss=423.998, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 346.41it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=142.156, player_2/loss=335.487, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 350.14it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=62.472, player_2/loss=209.014, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 347.88it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=45.836, player_2/loss=307.532, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 347.94it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=33.440, player_2/loss=398.800, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 348.07it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=38.874, player_2/loss=451.412, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 349.32it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=29.404, player_2/loss=384.158, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 347.74it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=14.599, player_2/loss=465.082, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 347.59it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=23.496, player_2/loss=488.038, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.73it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=30.878, player_2/loss=439.717, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 346.37it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=40.004, player_2/loss=436.675, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 349.02it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=24.518, player_2/loss=394.340, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 347.32it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=9.579, player_2/loss=380.557, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 346.99it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=8.586, player_2/loss=558.047, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 345.90it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=4.567, player_2/loss=511.817, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 347.16it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=6.219, player_2/loss=518.762, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 344.14it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=10.744, player_2/loss=448.059, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 349.70it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=11.594, player_2/loss=373.197, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.39it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=16.689, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.27it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=71.173, player_2/loss=249.353, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 346.03it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=262.286, player_2/loss=211.789, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 345.18it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=417.955, player_2/loss=139.338, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 351.26it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=372.237, player_2/loss=87.901, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 347.97it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=273.778, player_2/loss=56.738, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 350.79it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=284.404, player_2/loss=106.137, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 348.46it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=238.622, player_2/loss=114.554, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 352.26it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=270.129, player_2/loss=71.152, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 354.62it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=317.089, player_2/loss=55.409, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 349.00it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=244.046, player_2/loss=47.156, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 348.68it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=206.178, player_2/loss=43.535, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 348.00it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=202.325, player_2/loss=60.653, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 350.33it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=195.947, player_2/loss=61.607, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 350.48it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=210.046, player_2/loss=12.644, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 350.42it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_2/loss=9.192, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 348.22it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=264.528, player_2/loss=43.417, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 350.38it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=171.172, player_2/loss=56.325, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 348.41it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=252.251, player_2/loss=257.630, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.33it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=166.236, player_2/loss=302.645, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 347.92it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=92.469, rew=12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 344.90it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=53.065, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 347.08it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=68.392, player_2/loss=390.892, rew=10.71]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 347.84it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=72.295, player_2/loss=436.671, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 346.51it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=50.314, player_2/loss=411.961, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 345.50it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=46.045, player_2/loss=435.387, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 348.56it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=29.576, player_2/loss=512.799, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 344.91it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=20.813, player_2/loss=469.935, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 348.62it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=19.399, player_2/loss=469.413, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.30it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=31.646, player_2/loss=441.698, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 349.21it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=39.263, player_2/loss=468.408, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 334.22it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=28.293, player_2/loss=458.115, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 345.77it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=52.700, player_2/loss=417.859, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 346.08it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=41.919, player_2/loss=434.453, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 345.47it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=27.164, player_2/loss=415.832, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 351.16it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=8.949, player_2/loss=400.722, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.93it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=17.380, player_2/loss=428.890, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 349.75it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=42.805, player_2/loss=390.134, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.81it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=130.912, player_2/loss=340.227, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 351.27it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=212.271, player_2/loss=261.907, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 348.67it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=228.488, player_2/loss=178.113, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 351.91it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=243.517, player_2/loss=64.121, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 349.00it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=260.438, player_2/loss=56.387, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 350.69it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=244.630, player_2/loss=64.694, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 351.64it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=251.772, player_2/loss=58.732, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 349.79it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=285.411, player_2/loss=58.559, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 348.72it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=284.186, player_2/loss=55.577, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 351.58it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=211.519, player_2/loss=141.624, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 350.52it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=163.725, player_2/loss=174.881, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 349.56it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=113.153, player_2/loss=186.290, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 348.06it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=129.932, player_2/loss=154.891, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 353.74it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=158.573, player_2/loss=96.041, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 351.12it/s, env_step=16384, len=23, n/ep=2, n/st=64, player_1/loss=103.058, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 351.86it/s, env_step=17408, len=24, n/ep=2, n/st=64, player_1/loss=131.720, player_2/loss=58.874, rew=0.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 350.50it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=144.030, player_2/loss=72.166, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 350.51it/s, env_step=19456, len=30, n/ep=2, n/st=64, player_1/loss=149.906, player_2/loss=83.748, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 348.48it/s, env_step=1024, len=22, n/ep=2, n/st=64, player_1/loss=99.860, player_2/loss=38.680, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.80it/s, env_step=2048, len=35, n/ep=2, n/st=64, player_1/loss=96.065, player_2/loss=89.134, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.86it/s, env_step=3072, len=26, n/ep=3, n/st=64, player_1/loss=79.105, player_2/loss=77.214, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.71it/s, env_step=4096, len=20, n/ep=4, n/st=64, player_1/loss=73.583, player_2/loss=43.469, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.08it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=83.551, player_2/loss=103.269, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 350.08it/s, env_step=6144, len=23, n/ep=3, n/st=64, player_1/loss=99.803, player_2/loss=138.751, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.05it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=102.781, player_2/loss=96.110, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 350.47it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=105.565, player_2/loss=121.574, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 350.90it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=110.371, player_2/loss=133.535, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 349.46it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=83.928, player_2/loss=144.100, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 348.09it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=59.979, player_2/loss=171.945, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 348.61it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=51.380, player_2/loss=205.949, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 347.49it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=71.393, player_2/loss=198.145, rew=5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 346.84it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=51.041, player_2/loss=218.461, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 352.34it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=23.525, player_2/loss=219.659, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 348.46it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=45.443, player_2/loss=193.233, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 350.53it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=51.086, player_2/loss=149.416, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 347.42it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=25.030, player_2/loss=200.406, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 350.76it/s, env_step=19456, len=10, n/ep=7, n/st=64, player_1/loss=37.919, player_2/loss=266.854, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 351.92it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=74.637, player_2/loss=171.569, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 345.86it/s, env_step=2048, len=20, n/ep=4, n/st=64, player_1/loss=83.255, player_2/loss=120.409, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 360.00it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=77.575, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 348.90it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=93.965, player_2/loss=27.755, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 350.01it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=94.957, rew=-15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 347.92it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=70.667, player_2/loss=93.189, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 351.69it/s, env_step=7168, len=34, n/ep=1, n/st=64, player_1/loss=76.303, player_2/loss=77.942, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 348.79it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=85.222, player_2/loss=51.063, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 347.22it/s, env_step=9216, len=16, n/ep=3, n/st=64, player_1/loss=102.886, player_2/loss=39.388, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 349.36it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=87.525, player_2/loss=45.501, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 350.68it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=84.170, player_2/loss=64.660, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 350.33it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=89.935, player_2/loss=46.853, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 349.26it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=95.919, player_2/loss=54.618, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 353.30it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=91.357, player_2/loss=51.589, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 348.99it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=77.584, player_2/loss=26.137, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 351.41it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=88.077, player_2/loss=48.696, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 347.67it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=104.013, player_2/loss=57.286, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 352.19it/s, env_step=18432, len=20, n/ep=2, n/st=64, player_1/loss=111.399, player_2/loss=52.452, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 348.40it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=117.923, player_2/loss=70.785, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 350.71it/s, env_step=1024, len=28, n/ep=3, n/st=64, player_1/loss=79.891, player_2/loss=102.965, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.35it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=66.568, player_2/loss=149.900, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 349.11it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=38.908, player_2/loss=157.762, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 349.26it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=32.822, player_2/loss=148.446, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 349.65it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=71.248, player_2/loss=124.431, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 348.85it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=75.899, player_2/loss=109.576, rew=5.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 347.69it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=46.323, player_2/loss=155.902, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 345.80it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=37.660, player_2/loss=174.055, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 350.60it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=64.444, player_2/loss=217.198, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 349.09it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=58.375, player_2/loss=199.456, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 349.06it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_2/loss=196.608, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.59it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=121.165, player_2/loss=158.651, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 350.36it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=77.611, player_2/loss=198.763, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 349.30it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=33.822, player_2/loss=193.295, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 351.00it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=28.979, player_2/loss=168.165, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 347.65it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=46.557, player_2/loss=203.930, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 350.65it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=42.473, player_2/loss=212.669, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 347.43it/s, env_step=18432, len=15, n/ep=6, n/st=64, player_1/loss=51.373, player_2/loss=213.935, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.86it/s, env_step=19456, len=13, n/ep=4, n/st=64, player_1/loss=72.724, player_2/loss=187.051, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 350.88it/s, env_step=1024, len=23, n/ep=3, n/st=64, player_1/loss=128.235, player_2/loss=78.191, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.70it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=65.932, player_2/loss=123.057, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.34it/s, env_step=3072, len=24, n/ep=2, n/st=64, player_1/loss=43.462, player_2/loss=162.728, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.46it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=80.966, player_2/loss=177.915, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 349.56it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=142.111, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 347.66it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=140.871, player_2/loss=109.394, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.08it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=160.410, player_2/loss=69.966, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.57it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=223.963, player_2/loss=36.700, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 350.71it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=263.969, player_2/loss=47.573, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 349.69it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=212.361, player_2/loss=63.318, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 349.74it/s, env_step=11264, len=19, n/ep=4, n/st=64, player_1/loss=194.755, player_2/loss=58.501, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 348.61it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=150.176, player_2/loss=53.357, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 350.04it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=105.756, player_2/loss=64.686, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 347.70it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=108.878, player_2/loss=48.504, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 351.46it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=157.143, player_2/loss=40.645, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 347.23it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=187.473, player_2/loss=29.146, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 350.53it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=159.391, player_2/loss=42.030, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 349.51it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=144.919, player_2/loss=60.119, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 350.25it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=153.082, player_2/loss=78.155, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 346.38it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=116.985, player_2/loss=91.569, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.10it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=120.716, player_2/loss=97.329, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 346.59it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=115.685, player_2/loss=129.249, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 349.38it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=99.883, player_2/loss=137.034, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 347.53it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=75.935, player_2/loss=238.049, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 347.23it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=104.827, player_2/loss=280.041, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 350.33it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=87.504, player_2/loss=263.946, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 349.31it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=55.518, player_2/loss=178.364, rew=12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 349.00it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=48.505, player_2/loss=173.815, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 349.43it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=19.223, player_2/loss=192.808, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 350.71it/s, env_step=11264, len=13, n/ep=6, n/st=64, player_1/loss=75.837, player_2/loss=195.810, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.42it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=89.313, player_2/loss=203.543, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 349.02it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=62.991, player_2/loss=242.168, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 352.46it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=43.210, player_2/loss=215.160, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 345.74it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=16.565, player_2/loss=207.478, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 351.12it/s, env_step=16384, len=19, n/ep=4, n/st=64, player_1/loss=31.426, player_2/loss=201.728, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 348.18it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=46.911, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 351.84it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=41.809, player_2/loss=238.893, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.04it/s, env_step=19456, len=11, n/ep=4, n/st=64, player_1/loss=31.433, player_2/loss=248.855, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 352.61it/s, env_step=1024, len=30, n/ep=2, n/st=64, player_1/loss=71.975, player_2/loss=118.430, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.71it/s, env_step=2048, len=36, n/ep=2, n/st=64, player_1/loss=113.221, player_2/loss=100.483, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.57it/s, env_step=3072, len=32, n/ep=2, n/st=64, player_1/loss=113.782, player_2/loss=75.356, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 349.38it/s, env_step=4096, len=31, n/ep=2, n/st=64, player_1/loss=103.809, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 352.76it/s, env_step=5120, len=31, n/ep=2, n/st=64, player_1/loss=159.580, player_2/loss=45.873, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 351.58it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=140.775, player_2/loss=83.033, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 343.44it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=137.040, player_2/loss=139.180, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 350.14it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=151.315, player_2/loss=127.345, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 354.65it/s, env_step=9216, len=29, n/ep=2, n/st=64, player_1/loss=224.241, player_2/loss=77.827, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 352.80it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=231.141, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 349.89it/s, env_step=11264, len=21, n/ep=4, n/st=64, player_1/loss=153.453, player_2/loss=69.134, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 352.49it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=129.104, player_2/loss=74.102, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 349.47it/s, env_step=13312, len=20, n/ep=4, n/st=64, player_1/loss=146.913, player_2/loss=39.413, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 351.76it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=148.008, player_2/loss=17.335, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 349.64it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=112.269, player_2/loss=10.917, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 351.64it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=123.300, player_2/loss=26.821, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 347.85it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=126.543, player_2/loss=41.746, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 353.75it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=133.665, player_2/loss=50.879, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 348.91it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=150.373, player_2/loss=62.075, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 351.75it/s, env_step=1024, len=22, n/ep=2, n/st=64, player_1/loss=149.717, player_2/loss=83.784, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 349.52it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=117.514, player_2/loss=59.818, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 347.61it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=121.966, player_2/loss=95.174, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 350.66it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=140.950, player_2/loss=83.292, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 349.77it/s, env_step=5120, len=12, n/ep=4, n/st=64, player_1/loss=121.287, player_2/loss=74.344, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 351.38it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=144.491, player_2/loss=59.726, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 346.21it/s, env_step=7168, len=12, n/ep=4, n/st=64, player_1/loss=153.645, player_2/loss=90.789, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 350.47it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=156.189, player_2/loss=169.164, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 347.61it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=149.935, player_2/loss=226.526, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 351.05it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=121.571, player_2/loss=147.574, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 348.33it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=107.597, player_2/loss=127.914, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 349.72it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=119.490, player_2/loss=154.827, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 348.89it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=125.812, player_2/loss=145.821, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 350.41it/s, env_step=14336, len=28, n/ep=2, n/st=64, player_1/loss=108.416, player_2/loss=153.262, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 347.30it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=68.064, player_2/loss=205.343, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 347.31it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=56.149, player_2/loss=245.450, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 350.38it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_2/loss=273.749, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 347.83it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=136.956, player_2/loss=262.293, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 348.11it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=112.636, player_2/loss=319.320, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 350.87it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=113.116, player_2/loss=365.471, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.29it/s, env_step=2048, len=7, n/ep=7, n/st=64, player_1/loss=103.478, player_2/loss=244.570, rew=-17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.11it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=153.517, player_2/loss=163.216, rew=-16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 348.07it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=172.889, player_2/loss=140.706, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 352.75it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=149.029, player_2/loss=265.169, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 350.99it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_2/loss=229.265, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 350.29it/s, env_step=7168, len=25, n/ep=2, n/st=64, player_1/loss=101.840, player_2/loss=52.233, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 349.70it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=108.113, player_2/loss=62.972, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 351.04it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=134.577, player_2/loss=52.596, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 349.31it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=104.608, player_2/loss=43.631, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 351.67it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=80.201, player_2/loss=69.079, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 352.12it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=104.953, player_2/loss=91.294, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 349.18it/s, env_step=13312, len=27, n/ep=2, n/st=64, player_1/loss=98.402, player_2/loss=81.487, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 351.87it/s, env_step=14336, len=31, n/ep=2, n/st=64, player_1/loss=97.085, player_2/loss=89.458, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 350.08it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=136.687, player_2/loss=134.208, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 352.62it/s, env_step=16384, len=31, n/ep=2, n/st=64, player_1/loss=97.492, player_2/loss=104.739, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 349.18it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=43.752, player_2/loss=64.879, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 351.89it/s, env_step=18432, len=22, n/ep=3, n/st=64, player_1/loss=93.903, player_2/loss=63.198, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 349.45it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=110.678, player_2/loss=71.476, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 350.21it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=109.873, player_2/loss=73.178, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.89it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=105.555, player_2/loss=82.381, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.68it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=102.429, player_2/loss=113.990, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.11it/s, env_step=4096, len=23, n/ep=2, n/st=64, player_1/loss=76.594, player_2/loss=100.787, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 352.62it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=39.556, player_2/loss=93.054, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 350.59it/s, env_step=6144, len=19, n/ep=2, n/st=64, player_1/loss=46.161, player_2/loss=93.640, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.90it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=51.133, player_2/loss=88.484, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.71it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=43.252, player_2/loss=107.008, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 349.32it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=56.107, player_2/loss=112.278, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 348.27it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=70.982, player_2/loss=138.058, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 351.16it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=66.922, player_2/loss=127.426, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 349.96it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=41.902, player_2/loss=156.580, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 350.22it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=34.631, player_2/loss=149.937, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 349.91it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=33.755, player_2/loss=156.479, rew=15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 353.61it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=45.873, player_2/loss=171.663, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 348.75it/s, env_step=16384, len=19, n/ep=4, n/st=64, player_1/loss=73.101, player_2/loss=124.557, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 352.64it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=61.910, player_2/loss=103.502, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 349.14it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=26.863, player_2/loss=118.207, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 349.74it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=32.283, player_2/loss=113.338, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 349.72it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=16.898, player_2/loss=77.833, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 354.87it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=81.621, player_2/loss=113.734, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 348.39it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=140.144, player_2/loss=128.377, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 349.93it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=191.416, player_2/loss=90.217, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 347.95it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=264.367, player_2/loss=68.411, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 351.40it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=246.234, player_2/loss=65.875, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 349.56it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=231.957, player_2/loss=60.251, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 351.66it/s, env_step=8192, len=9, n/ep=8, n/st=64, player_1/loss=261.505, player_2/loss=35.764, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 350.88it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=258.472, player_2/loss=50.625, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 350.27it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=291.123, player_2/loss=27.106, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 344.24it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=301.645, player_2/loss=11.451, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 347.34it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=313.760, player_2/loss=81.345, rew=16.67]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 349.64it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=322.983, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 351.52it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=249.526, player_2/loss=29.848, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 348.67it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=222.316, player_2/loss=42.103, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 351.01it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=237.334, player_2/loss=41.185, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 347.49it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=259.898, player_2/loss=22.340, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 351.24it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=266.542, player_2/loss=51.486, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 348.33it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=242.419, player_2/loss=71.758, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 352.58it/s, env_step=1024, len=9, n/ep=9, n/st=64, player_1/loss=150.935, player_2/loss=129.399, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 349.85it/s, env_step=2048, len=8, n/ep=10, n/st=64, player_2/loss=375.432, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 350.68it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=60.368, player_2/loss=539.605, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 347.92it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=79.797, player_2/loss=509.728, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 348.96it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=74.930, player_2/loss=533.409, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 347.33it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=54.674, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 348.13it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=50.794, player_2/loss=619.546, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 346.72it/s, env_step=8192, len=7, n/ep=7, n/st=64, player_1/loss=61.767, player_2/loss=573.767, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 352.48it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=70.815, player_2/loss=495.159, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 347.77it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=78.989, player_2/loss=529.493, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 350.34it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=80.445, player_2/loss=560.822, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 349.92it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=40.529, player_2/loss=433.256, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 347.48it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=24.945, player_2/loss=511.351, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 348.03it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=18.037, player_2/loss=538.273, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 347.36it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=8.674, player_2/loss=553.709, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 350.69it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=6.441, player_2/loss=634.656, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 348.17it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=12.223, player_2/loss=606.975, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 350.23it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=31.391, player_2/loss=554.177, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 348.02it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=23.412, player_2/loss=543.130, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 352.06it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=42.239, player_2/loss=489.149, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.00it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=30.638, player_2/loss=401.436, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.05it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=11.721, player_2/loss=290.295, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.44it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=38.893, player_2/loss=226.951, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.04it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=87.865, player_2/loss=187.458, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 353.03it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=125.931, player_2/loss=135.386, rew=-16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 351.15it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=106.246, player_2/loss=98.784, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 351.73it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=136.946, player_2/loss=111.630, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 350.53it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=105.031, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 351.07it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=73.737, player_2/loss=108.467, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 348.81it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=57.230, player_2/loss=115.788, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 353.29it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=97.290, player_2/loss=89.208, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 349.63it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=110.634, player_2/loss=100.256, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 350.11it/s, env_step=14336, len=28, n/ep=2, n/st=64, player_1/loss=195.280, player_2/loss=97.570, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 352.25it/s, env_step=15360, len=29, n/ep=3, n/st=64, player_1/loss=207.093, player_2/loss=82.109, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 350.88it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=176.729, player_2/loss=59.402, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 352.15it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=133.830, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 348.76it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=139.265, player_2/loss=93.662, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 354.49it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=135.345, player_2/loss=110.603, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 350.53it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=44.027, player_2/loss=132.635, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.61it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=48.671, player_2/loss=79.425, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.17it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=46.769, player_2/loss=45.731, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.50it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=45.469, player_2/loss=71.811, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 353.45it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=57.523, player_2/loss=102.452, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.36it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=38.568, player_2/loss=76.074, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.15it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=18.435, player_2/loss=66.508, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 348.16it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=45.489, player_2/loss=96.625, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 352.44it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_2/loss=113.969, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 347.88it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=16.493, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 352.80it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=16.074, player_2/loss=86.566, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 349.79it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=13.492, player_2/loss=85.181, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 349.65it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=11.835, player_2/loss=100.178, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 348.40it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=34.294, player_2/loss=112.659, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 347.00it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=75.855, player_2/loss=122.871, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 351.99it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=102.244, player_2/loss=123.481, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 350.77it/s, env_step=17408, len=10, n/ep=7, n/st=64, player_1/loss=64.829, player_2/loss=150.298, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 350.43it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=44.454, player_2/loss=145.487, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 348.77it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=25.687, player_2/loss=149.889, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 350.58it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=14.269, player_2/loss=116.318, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.41it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=32.865, player_2/loss=116.106, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.78it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=73.263, player_2/loss=124.878, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 348.45it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=62.209, player_2/loss=121.992, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.26it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=56.913, player_2/loss=76.948, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 350.56it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=24.751, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 347.45it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=27.741, player_2/loss=93.766, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 343.00it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=24.472, player_2/loss=60.547, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 348.35it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=81.275, player_2/loss=89.132, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 352.51it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=83.964, player_2/loss=82.390, rew=-12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 349.98it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=129.624, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 352.96it/s, env_step=12288, len=15, n/ep=5, n/st=64, player_1/loss=188.068, player_2/loss=81.433, rew=-15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 349.73it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=253.390, player_2/loss=96.514, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 348.16it/s, env_step=14336, len=21, n/ep=2, n/st=64, player_1/loss=338.800, player_2/loss=94.080, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 348.17it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=259.300, player_2/loss=81.435, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 348.28it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_2/loss=87.438, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 350.31it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=199.090, player_2/loss=82.852, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 352.23it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=113.131, player_2/loss=107.451, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:02, 352.33it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=65.019, player_2/loss=87.147, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 349.56it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=59.869, player_2/loss=36.909, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.34it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=57.378, player_2/loss=50.960, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.59it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=54.856, player_2/loss=44.996, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.94it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=30.442, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 349.48it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=8.790, player_2/loss=27.110, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 349.93it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=8.011, player_2/loss=26.199, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 351.87it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=15.553, player_2/loss=27.731, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 350.09it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=13.606, player_2/loss=23.160, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 351.74it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=16.887, player_2/loss=48.159, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 349.21it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=16.065, player_2/loss=45.267, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 348.79it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=11.310, player_2/loss=24.128, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 350.09it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=6.840, player_2/loss=25.015, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 347.93it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=5.831, player_2/loss=31.786, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 343.14it/s, env_step=14336, len=17, n/ep=3, n/st=64, player_1/loss=8.325, player_2/loss=28.411, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 352.23it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=8.018, player_2/loss=19.809, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 347.16it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=6.876, player_2/loss=24.178, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 348.91it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=5.780, player_2/loss=30.469, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 344.32it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=2.843, player_2/loss=28.807, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 346.98it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=39.723, player_2/loss=37.530, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 346.73it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=13.926, player_2/loss=31.090, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.84it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=34.577, player_2/loss=36.184, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.05it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=55.417, player_2/loss=32.225, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.80it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=31.441, player_2/loss=36.186, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 353.04it/s, env_step=5120, len=25, n/ep=3, n/st=64, player_1/loss=23.452, player_2/loss=42.118, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 349.49it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=40.386, player_2/loss=42.776, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.45it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=17.320, player_2/loss=40.664, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 348.09it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=16.566, player_2/loss=45.819, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 352.37it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=48.240, player_2/loss=50.393, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 347.18it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=105.847, player_2/loss=72.140, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 347.99it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=216.504, player_2/loss=77.064, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 349.27it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=184.900, player_2/loss=73.683, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 350.27it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=78.257, player_2/loss=96.189, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 347.15it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=57.983, player_2/loss=89.889, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 352.25it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=53.039, player_2/loss=50.235, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 348.06it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=135.675, player_2/loss=53.190, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 349.77it/s, env_step=17408, len=24, n/ep=3, n/st=64, player_1/loss=166.508, player_2/loss=52.400, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 346.46it/s, env_step=18432, len=31, n/ep=2, n/st=64, player_1/loss=93.204, player_2/loss=48.387, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:02, 348.69it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=57.939, player_2/loss=45.148, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 348.01it/s, env_step=1024, len=28, n/ep=2, n/st=64, player_1/loss=60.866, player_2/loss=48.359, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.47it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=76.864, player_2/loss=92.621, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.47it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=55.500, player_2/loss=112.221, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.49it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=63.635, player_2/loss=87.067, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 348.37it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=90.375, player_2/loss=83.537, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 352.98it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=118.705, player_2/loss=171.322, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 347.80it/s, env_step=7168, len=27, n/ep=2, n/st=64, player_1/loss=113.587, player_2/loss=182.449, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 350.78it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=83.767, player_2/loss=84.531, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 349.84it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=91.549, player_2/loss=62.944, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 350.51it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=61.111, player_2/loss=52.581, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 346.93it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=58.253, player_2/loss=65.943, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 351.14it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=57.211, player_2/loss=56.331, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 349.22it/s, env_step=13312, len=21, n/ep=2, n/st=64, player_1/loss=42.407, player_2/loss=37.760, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 349.87it/s, env_step=14336, len=31, n/ep=2, n/st=64, player_1/loss=68.466, player_2/loss=62.819, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 348.45it/s, env_step=15360, len=23, n/ep=2, n/st=64, player_1/loss=78.510, player_2/loss=70.025, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 351.61it/s, env_step=16384, len=25, n/ep=2, n/st=64, player_1/loss=63.568, player_2/loss=78.856, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 349.12it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=83.299, player_2/loss=61.341, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 352.51it/s, env_step=18432, len=26, n/ep=3, n/st=64, player_1/loss=112.829, player_2/loss=82.468, rew=-8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 346.88it/s, env_step=19456, len=23, n/ep=3, n/st=64, player_1/loss=122.734, player_2/loss=123.399, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 351.85it/s, env_step=1024, len=23, n/ep=2, n/st=64, player_1/loss=34.211, player_2/loss=27.194, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.70it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=34.718, player_2/loss=28.671, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.99it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=151.657, player_2/loss=61.221, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 348.75it/s, env_step=4096, len=31, n/ep=2, n/st=64, player_1/loss=147.119, player_2/loss=129.383, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.03it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=66.929, player_2/loss=128.768, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 349.65it/s, env_step=6144, len=31, n/ep=2, n/st=64, player_1/loss=53.881, player_2/loss=104.906, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.94it/s, env_step=7168, len=26, n/ep=3, n/st=64, player_1/loss=69.142, player_2/loss=98.781, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 345.56it/s, env_step=8192, len=29, n/ep=2, n/st=64, player_1/loss=73.485, player_2/loss=55.619, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 352.40it/s, env_step=9216, len=25, n/ep=3, n/st=64, player_1/loss=71.134, player_2/loss=47.992, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 351.10it/s, env_step=10240, len=25, n/ep=2, n/st=64, player_1/loss=20.856, player_2/loss=53.058, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 352.69it/s, env_step=11264, len=27, n/ep=3, n/st=64, player_1/loss=74.677, player_2/loss=94.712, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 349.23it/s, env_step=12288, len=30, n/ep=2, n/st=64, player_1/loss=80.718, player_2/loss=90.747, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 353.03it/s, env_step=13312, len=29, n/ep=2, n/st=64, player_1/loss=69.735, player_2/loss=70.167, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 354.01it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=77.338, player_2/loss=66.379, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 350.19it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=94.116, player_2/loss=48.762, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 350.96it/s, env_step=16384, len=27, n/ep=2, n/st=64, player_1/loss=117.151, player_2/loss=44.954, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 348.52it/s, env_step=17408, len=32, n/ep=2, n/st=64, player_1/loss=70.134, player_2/loss=74.020, rew=0.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 349.81it/s, env_step=18432, len=27, n/ep=2, n/st=64, player_1/loss=126.136, player_2/loss=104.634, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 348.14it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=167.064, player_2/loss=102.673, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 347.77it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=253.573, player_2/loss=123.595, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 350.44it/s, env_step=2048, len=29, n/ep=2, n/st=64, player_1/loss=173.282, player_2/loss=77.080, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 346.87it/s, env_step=3072, len=26, n/ep=2, n/st=64, player_1/loss=55.998, player_2/loss=55.306, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 347.64it/s, env_step=4096, len=31, n/ep=2, n/st=64, player_1/loss=85.398, player_2/loss=65.564, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 350.14it/s, env_step=5120, len=26, n/ep=2, n/st=64, player_1/loss=123.891, player_2/loss=58.143, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 349.45it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=85.860, player_2/loss=66.566, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 351.02it/s, env_step=7168, len=20, n/ep=4, n/st=64, player_1/loss=80.753, player_2/loss=85.476, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 349.88it/s, env_step=8192, len=25, n/ep=3, n/st=64, player_1/loss=98.291, player_2/loss=90.163, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 350.87it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=57.400, player_2/loss=59.570, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 349.13it/s, env_step=10240, len=25, n/ep=3, n/st=64, player_1/loss=48.497, player_2/loss=74.175, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 351.83it/s, env_step=11264, len=21, n/ep=2, n/st=64, player_1/loss=39.441, player_2/loss=81.708, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 347.54it/s, env_step=12288, len=25, n/ep=2, n/st=64, player_1/loss=58.806, player_2/loss=67.239, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 350.93it/s, env_step=13312, len=25, n/ep=2, n/st=64, player_1/loss=78.647, player_2/loss=117.484, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 348.59it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=51.144, player_2/loss=100.816, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 348.19it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=64.145, player_2/loss=101.607, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 345.80it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_2/loss=100.297, rew=15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 349.43it/s, env_step=17408, len=13, n/ep=4, n/st=64, player_1/loss=117.309, player_2/loss=105.160, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 346.21it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=57.070, player_2/loss=88.306, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 349.72it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=54.761, player_2/loss=103.693, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 350.47it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=65.630, player_2/loss=145.981, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.97it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=86.626, player_2/loss=102.884, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.85it/s, env_step=3072, len=7, n/ep=10, n/st=64, player_1/loss=89.509, player_2/loss=56.289, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 346.37it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=108.575, player_2/loss=89.879, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 348.95it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=143.749, player_2/loss=134.704, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 349.69it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=171.268, player_2/loss=155.696, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 349.48it/s, env_step=7168, len=7, n/ep=10, n/st=64, player_1/loss=148.764, player_2/loss=163.677, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 349.98it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=114.625, player_2/loss=112.217, rew=-19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 348.01it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=123.328, player_2/loss=77.363, rew=-13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 348.58it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=198.712, player_2/loss=117.569, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 346.72it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=169.609, player_2/loss=122.284, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 348.80it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=220.122, player_2/loss=118.208, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 347.56it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=196.254, player_2/loss=159.941, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 352.22it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=210.600, player_2/loss=171.977, rew=-19.44]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 349.80it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=421.392, player_2/loss=182.279, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 350.53it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=501.476, player_2/loss=152.741, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 349.80it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=535.837, player_2/loss=76.222, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 346.73it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=505.096, player_2/loss=58.069, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 348.88it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=360.017, player_2/loss=123.131, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 347.00it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=105.852, player_2/loss=318.915, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 347.81it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=82.429, player_2/loss=278.685, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 349.88it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=109.963, player_2/loss=329.716, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 349.94it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=126.075, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 354.88it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=129.553, player_2/loss=292.160, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 348.04it/s, env_step=6144, len=16, n/ep=3, n/st=64, player_1/loss=106.703, player_2/loss=246.570, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 350.57it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=40.005, player_2/loss=301.805, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 347.57it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=22.923, player_2/loss=369.372, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 348.93it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=47.013, player_2/loss=381.686, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 347.19it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=77.080, player_2/loss=336.635, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 345.31it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=94.748, player_2/loss=317.010, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 346.51it/s, env_step=12288, len=7, n/ep=7, n/st=64, player_1/loss=54.431, player_2/loss=302.076, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 347.75it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=82.271, player_2/loss=312.471, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 349.86it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=59.991, player_2/loss=282.820, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 345.05it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=12.590, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 347.14it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=6.597, player_2/loss=303.573, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 347.63it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=12.318, player_2/loss=354.328, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 349.63it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=99.076, player_2/loss=348.373, rew=19.44]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 344.78it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=92.248, player_2/loss=317.956, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 352.06it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=92.108, player_2/loss=246.325, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.17it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=61.847, player_2/loss=233.334, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.66it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=54.947, player_2/loss=193.119, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 350.42it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=74.950, player_2/loss=190.382, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 345.74it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=49.402, player_2/loss=168.943, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 347.95it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=34.163, player_2/loss=153.734, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 346.58it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=47.681, player_2/loss=138.045, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 351.34it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=68.292, player_2/loss=178.974, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 348.06it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=102.119, player_2/loss=175.548, rew=-13.89]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 350.99it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=130.901, player_2/loss=150.033, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 348.83it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=149.923, player_2/loss=111.629, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 351.12it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=169.813, player_2/loss=118.163, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 348.30it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=179.251, player_2/loss=138.035, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 353.37it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=148.356, player_2/loss=129.875, rew=-13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 348.07it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_2/loss=89.952, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 352.62it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=125.068, player_2/loss=81.502, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 347.43it/s, env_step=17408, len=20, n/ep=2, n/st=64, player_2/loss=47.929, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 350.59it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=169.276, player_2/loss=48.106, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 351.49it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=163.043, player_2/loss=75.306, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 351.31it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=161.554, player_2/loss=57.435, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.47it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=117.913, player_2/loss=128.126, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 351.28it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=97.798, player_2/loss=177.631, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 347.94it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=105.081, player_2/loss=166.417, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 354.46it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=111.582, player_2/loss=164.843, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 347.43it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=116.677, player_2/loss=170.813, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 362.98it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=85.387, player_2/loss=193.164, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 348.52it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=75.514, player_2/loss=201.026, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 351.84it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=82.445, player_2/loss=186.831, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 346.71it/s, env_step=10240, len=27, n/ep=2, n/st=64, player_1/loss=65.342, player_2/loss=185.424, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 349.40it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=53.796, player_2/loss=225.574, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 348.98it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=42.190, player_2/loss=165.755, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 347.13it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=81.621, player_2/loss=142.817, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 347.44it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=90.408, player_2/loss=230.873, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 353.05it/s, env_step=15360, len=20, n/ep=4, n/st=64, player_1/loss=66.364, player_2/loss=228.221, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 347.52it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=56.018, player_2/loss=164.675, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 350.49it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=34.134, player_2/loss=140.322, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 347.82it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=23.214, player_2/loss=243.813, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 347.56it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_1/loss=80.071, player_2/loss=256.687, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 348.08it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=38.788, player_2/loss=157.776, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 354.42it/s, env_step=2048, len=22, n/ep=4, n/st=64, player_1/loss=32.553, player_2/loss=144.041, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 350.10it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=24.946, player_2/loss=129.520, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 350.42it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=27.743, player_2/loss=95.779, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 346.00it/s, env_step=5120, len=16, n/ep=3, n/st=64, player_1/loss=34.509, player_2/loss=65.593, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 352.96it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=31.388, player_2/loss=54.110, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 343.44it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=25.329, player_2/loss=94.982, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 352.13it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=49.273, player_2/loss=107.416, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 350.12it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=43.037, player_2/loss=109.527, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 351.17it/s, env_step=10240, len=19, n/ep=4, n/st=64, player_1/loss=56.964, player_2/loss=73.234, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 347.70it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=85.355, player_2/loss=86.377, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 353.21it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=57.631, player_2/loss=64.793, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 352.81it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=57.895, player_2/loss=60.973, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 349.41it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=71.922, player_2/loss=77.416, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 352.17it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=97.014, player_2/loss=76.801, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 350.95it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=139.503, player_2/loss=124.377, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 353.55it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=155.434, rew=-12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 350.02it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=137.370, player_2/loss=135.231, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 352.88it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=161.205, player_2/loss=134.280, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 347.68it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=54.881, player_2/loss=83.351, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.92it/s, env_step=2048, len=25, n/ep=3, n/st=64, player_1/loss=115.774, player_2/loss=81.850, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.28it/s, env_step=3072, len=14, n/ep=3, n/st=64, player_1/loss=128.603, player_2/loss=104.477, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.76it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=85.534, rew=10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 351.04it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=86.912, player_2/loss=211.803, rew=5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 345.85it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=57.517, player_2/loss=248.899, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 349.86it/s, env_step=7168, len=9, n/ep=9, n/st=64, player_1/loss=19.125, player_2/loss=264.802, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 345.78it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=23.175, player_2/loss=267.104, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 357.11it/s, env_step=9216, len=7, n/ep=7, n/st=64, player_1/loss=55.439, player_2/loss=228.468, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 347.55it/s, env_step=10240, len=8, n/ep=9, n/st=64, player_1/loss=56.766, player_2/loss=205.795, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 350.42it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=31.068, player_2/loss=196.317, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 349.15it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=85.997, player_2/loss=191.556, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 349.85it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=101.294, player_2/loss=203.341, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 349.05it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=38.723, player_2/loss=198.173, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 345.40it/s, env_step=15360, len=10, n/ep=7, n/st=64, player_1/loss=38.845, player_2/loss=163.540, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 347.08it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=65.392, player_2/loss=171.690, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 349.10it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=57.728, player_2/loss=203.991, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 347.21it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=55.856, player_2/loss=200.770, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 347.11it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=34.165, player_2/loss=163.233, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 350.83it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=93.191, player_2/loss=89.284, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.24it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=79.729, player_2/loss=94.332, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.12it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=93.645, player_2/loss=91.581, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.01it/s, env_step=4096, len=30, n/ep=2, n/st=64, player_1/loss=60.276, player_2/loss=67.128, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.98it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=50.068, player_2/loss=56.714, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.54it/s, env_step=6144, len=23, n/ep=2, n/st=64, player_1/loss=51.070, player_2/loss=57.467, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.64it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=69.242, player_2/loss=79.426, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.13it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=80.329, player_2/loss=95.542, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 351.34it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=100.101, player_2/loss=73.366, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 348.91it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=96.598, player_2/loss=76.509, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 350.38it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=63.815, player_2/loss=53.456, rew=-16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 347.77it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=51.246, player_2/loss=47.605, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 349.29it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_2/loss=36.183, rew=-15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 351.40it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=91.090, player_2/loss=67.466, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 348.69it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=82.808, player_2/loss=88.656, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 350.35it/s, env_step=16384, len=13, n/ep=6, n/st=64, player_1/loss=35.158, player_2/loss=67.216, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 350.70it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=46.457, player_2/loss=36.977, rew=-15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 351.37it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=52.646, player_2/loss=29.471, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 351.36it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=108.329, player_2/loss=32.101, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 349.58it/s, env_step=1024, len=16, n/ep=3, n/st=64, player_1/loss=79.765, player_2/loss=32.293, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.32it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=80.164, player_2/loss=109.666, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.49it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=74.506, player_2/loss=110.458, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.32it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=64.372, player_2/loss=68.703, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 351.38it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=75.716, player_2/loss=94.765, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 346.08it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=64.673, player_2/loss=81.876, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 349.52it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=41.226, player_2/loss=49.088, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 351.57it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=25.628, player_2/loss=68.353, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 346.40it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=36.566, player_2/loss=67.232, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 349.27it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=45.687, player_2/loss=64.887, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 353.40it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=43.401, player_2/loss=116.693, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 356.32it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=15.690, player_2/loss=116.564, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 361.77it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=17.715, player_2/loss=87.055, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 349.14it/s, env_step=14336, len=12, n/ep=4, n/st=64, player_1/loss=21.836, player_2/loss=76.506, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 347.26it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=77.891, player_2/loss=114.248, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 353.07it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=101.371, player_2/loss=129.664, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 349.11it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=96.637, player_2/loss=92.098, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 346.20it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=30.634, player_2/loss=57.970, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 349.24it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=21.294, player_2/loss=51.794, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 348.76it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=77.410, player_2/loss=29.948, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.94it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=73.511, player_2/loss=56.593, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 346.25it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=55.561, player_2/loss=61.806, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.75it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=32.071, player_2/loss=37.026, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 351.72it/s, env_step=5120, len=9, n/ep=8, n/st=64, player_1/loss=74.454, player_2/loss=59.612, rew=-18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 347.55it/s, env_step=6144, len=25, n/ep=3, n/st=64, player_1/loss=124.938, player_2/loss=111.325, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.87it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=135.261, player_2/loss=138.759, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 347.98it/s, env_step=8192, len=15, n/ep=3, n/st=64, player_1/loss=162.354, player_2/loss=176.607, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 351.70it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=143.016, player_2/loss=167.643, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 351.13it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=72.475, player_2/loss=94.313, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 346.82it/s, env_step=11264, len=15, n/ep=5, n/st=64, player_1/loss=45.360, player_2/loss=45.760, rew=-15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 352.15it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=53.158, player_2/loss=45.418, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 349.22it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=64.433, player_2/loss=54.166, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 346.09it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=74.413, player_2/loss=66.473, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 350.91it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=65.898, player_2/loss=53.284, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 348.69it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=44.964, player_2/loss=35.830, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 351.59it/s, env_step=17408, len=20, n/ep=4, n/st=64, player_1/loss=50.391, player_2/loss=33.186, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 346.35it/s, env_step=18432, len=26, n/ep=2, n/st=64, player_1/loss=55.906, player_2/loss=43.383, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 348.42it/s, env_step=19456, len=28, n/ep=2, n/st=64, player_1/loss=65.548, player_2/loss=65.960, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 347.57it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=86.114, player_2/loss=90.245, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 347.96it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=92.637, player_2/loss=108.900, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 348.94it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=53.791, player_2/loss=129.225, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 344.67it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=30.698, player_2/loss=125.016, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 349.02it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=44.653, player_2/loss=120.590, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 348.67it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=45.386, player_2/loss=129.965, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 347.05it/s, env_step=7168, len=9, n/ep=5, n/st=64, player_1/loss=54.538, player_2/loss=116.328, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 348.30it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=62.382, player_2/loss=123.534, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 348.31it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=32.111, player_2/loss=134.611, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 348.72it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=20.132, player_2/loss=129.731, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 348.58it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=38.505, player_2/loss=142.114, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 347.39it/s, env_step=12288, len=13, n/ep=6, n/st=64, player_1/loss=45.623, player_2/loss=130.662, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 348.21it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=26.656, player_2/loss=120.839, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 345.15it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=24.837, player_2/loss=112.349, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 345.29it/s, env_step=15360, len=9, n/ep=8, n/st=64, player_1/loss=12.450, player_2/loss=135.148, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 352.99it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=10.630, player_2/loss=158.484, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 349.20it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=51.065, player_2/loss=144.334, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 345.02it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=105.683, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 349.38it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=52.339, player_2/loss=140.085, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 349.46it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=29.534, player_2/loss=56.319, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.30it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=34.608, player_2/loss=51.416, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 349.42it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=79.744, player_2/loss=77.983, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 347.35it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=117.261, player_2/loss=131.780, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 350.44it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=142.128, player_2/loss=163.095, rew=-15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 346.33it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=168.518, player_2/loss=120.652, rew=-5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 348.34it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=198.075, player_2/loss=110.143, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 350.70it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=201.387, player_2/loss=125.979, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 347.01it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=203.551, player_2/loss=79.820, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 351.54it/s, env_step=10240, len=16, n/ep=5, n/st=64, player_1/loss=179.458, player_2/loss=95.829, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 345.18it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=223.072, player_2/loss=109.166, rew=15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 351.14it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=287.003, player_2/loss=101.252, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 350.23it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=243.239, player_2/loss=85.335, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 345.74it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=171.234, player_2/loss=93.521, rew=5.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 347.98it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=285.477, player_2/loss=60.696, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 346.14it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_2/loss=91.152, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 352.79it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=230.345, player_2/loss=116.744, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 349.87it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=249.724, player_2/loss=96.140, rew=15.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 350.33it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=270.106, player_2/loss=72.221, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 350.92it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=126.773, player_2/loss=106.008, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 349.11it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=98.488, player_2/loss=126.594, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 351.22it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=48.471, player_2/loss=145.808, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 350.49it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=14.065, player_2/loss=207.288, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 348.19it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=11.708, player_2/loss=294.735, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 350.11it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=23.279, player_2/loss=254.138, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 347.65it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=44.371, player_2/loss=269.697, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 350.22it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=58.225, player_2/loss=333.239, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 348.26it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=43.427, player_2/loss=317.486, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 352.35it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=36.279, player_2/loss=317.089, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 347.57it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=44.486, player_2/loss=322.317, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 338.10it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=41.028, player_2/loss=293.689, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 347.26it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=28.234, player_2/loss=216.621, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 346.99it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=34.693, player_2/loss=173.605, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 347.83it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=25.713, player_2/loss=259.499, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 356.66it/s, env_step=16384, len=17, n/ep=5, n/st=64, player_1/loss=27.660, player_2/loss=296.808, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 346.89it/s, env_step=17408, len=16, n/ep=5, n/st=64, player_1/loss=42.522, player_2/loss=297.838, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 349.21it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=31.207, player_2/loss=292.138, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 350.63it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=22.311, player_2/loss=287.702, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 346.40it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=22.078, player_2/loss=151.771, rew=-15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 351.91it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=30.658, player_2/loss=140.857, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 348.15it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=33.066, player_2/loss=107.000, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 350.09it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=49.936, player_2/loss=108.276, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 351.63it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=56.856, player_2/loss=105.516, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 345.11it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=41.057, player_2/loss=126.956, rew=-16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 349.84it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=49.527, player_2/loss=130.699, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 347.28it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=139.526, player_2/loss=90.240, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 349.78it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=136.530, player_2/loss=58.404, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 345.13it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=80.821, player_2/loss=20.087, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 350.26it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=26.352, player_2/loss=10.112, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 347.05it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=30.694, player_2/loss=12.129, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 352.89it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=39.993, player_2/loss=15.186, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 351.52it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=57.946, player_2/loss=30.362, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 350.12it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=88.523, player_2/loss=43.298, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 348.67it/s, env_step=16384, len=19, n/ep=4, n/st=64, player_1/loss=61.314, player_2/loss=29.549, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 349.06it/s, env_step=17408, len=19, n/ep=4, n/st=64, player_1/loss=20.833, player_2/loss=27.151, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 347.27it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=21.552, player_2/loss=5.551, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 351.33it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=72.024, player_2/loss=31.804, rew=15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 349.08it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=106.840, player_2/loss=137.257, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.79it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=60.813, player_2/loss=96.479, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.60it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=34.848, player_2/loss=93.178, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.49it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=49.057, player_2/loss=123.015, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 348.97it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=56.504, player_2/loss=116.462, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.04it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=60.960, player_2/loss=80.173, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 351.86it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=56.661, player_2/loss=79.964, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 348.63it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=75.646, player_2/loss=67.225, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 349.75it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=81.294, player_2/loss=96.145, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 350.74it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=52.983, player_2/loss=117.554, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 346.40it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=31.152, player_2/loss=87.676, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 349.14it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=20.052, player_2/loss=68.463, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 346.06it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=26.185, player_2/loss=60.135, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 350.36it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=18.222, player_2/loss=75.746, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 347.27it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=59.475, player_2/loss=82.746, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 349.65it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=91.990, player_2/loss=86.387, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 350.69it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=104.327, player_2/loss=94.066, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 358.76it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=100.765, player_2/loss=101.783, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 364.65it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=68.374, player_2/loss=105.723, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 355.41it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=126.931, player_2/loss=131.888, rew=5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.55it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=149.720, player_2/loss=142.578, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 349.65it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=156.955, player_2/loss=153.390, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 349.86it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=194.736, player_2/loss=117.426, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 348.73it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=217.243, player_2/loss=151.458, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 342.99it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=232.166, player_2/loss=142.565, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 353.21it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=225.911, player_2/loss=78.144, rew=-12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 343.70it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=201.751, player_2/loss=76.958, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 351.56it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=157.554, player_2/loss=84.792, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 345.21it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=159.118, player_2/loss=59.920, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 350.52it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=197.695, player_2/loss=55.033, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 346.97it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=208.262, player_2/loss=50.894, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 350.05it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=181.751, player_2/loss=57.033, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 350.45it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=173.942, player_2/loss=124.266, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 350.91it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=160.865, player_2/loss=116.195, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 351.59it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=188.143, player_2/loss=88.017, rew=5.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 347.51it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=218.985, player_2/loss=27.622, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 349.73it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=179.572, player_2/loss=29.299, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 350.97it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=180.448, player_2/loss=47.057, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 348.29it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=183.201, player_2/loss=23.745, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.38it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=174.873, player_2/loss=30.304, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.05it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=184.054, player_2/loss=26.455, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 350.62it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=171.884, player_2/loss=17.179, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 353.01it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=167.511, player_2/loss=40.527, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 347.59it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=149.492, player_2/loss=43.221, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 348.77it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=164.432, player_2/loss=163.482, rew=16.67]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 348.17it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=201.541, player_2/loss=312.571, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 351.37it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=193.051, player_2/loss=426.818, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 346.79it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=143.351, player_2/loss=444.205, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 346.55it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=51.473, player_2/loss=410.120, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 345.11it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=30.130, player_2/loss=362.058, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 347.91it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=37.662, player_2/loss=348.077, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 347.22it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=32.951, player_2/loss=376.860, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 345.30it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=23.874, player_2/loss=431.262, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 350.26it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=17.384, player_2/loss=410.566, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 346.96it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=13.838, player_2/loss=366.207, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 348.13it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=12.051, player_2/loss=330.537, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 343.94it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=7.438, player_2/loss=359.332, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 349.17it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=4.927, player_2/loss=267.646, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.54it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=10.841, player_2/loss=228.123, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.48it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=141.674, player_2/loss=181.899, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 350.65it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=302.219, player_2/loss=112.448, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 347.22it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=359.974, player_2/loss=54.806, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 349.52it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=381.364, player_2/loss=40.394, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 350.60it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=338.802, player_2/loss=30.691, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 345.51it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=347.556, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 348.78it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=336.299, player_2/loss=74.186, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 351.72it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=366.638, player_2/loss=128.707, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 349.02it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=336.405, player_2/loss=91.494, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 350.02it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=222.938, player_2/loss=127.880, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 345.20it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_2/loss=77.622, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 353.23it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=351.223, player_2/loss=51.490, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 350.06it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=252.369, player_2/loss=35.326, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 348.22it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=248.424, player_2/loss=35.006, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 353.49it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=299.417, player_2/loss=63.172, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 346.08it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=257.601, player_2/loss=71.229, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 350.34it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=263.100, player_2/loss=38.980, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 348.01it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=230.211, player_2/loss=23.560, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.54it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=173.839, player_2/loss=22.133, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.27it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=126.165, player_2/loss=47.941, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 347.80it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=134.952, player_2/loss=165.556, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 349.48it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=141.137, player_2/loss=316.704, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 346.07it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=80.413, player_2/loss=392.580, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 348.76it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=31.238, player_2/loss=361.695, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 344.31it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=88.806, player_2/loss=372.476, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 350.19it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=110.058, player_2/loss=380.777, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 345.02it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=67.196, player_2/loss=362.018, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 349.09it/s, env_step=11264, len=9, n/ep=6, n/st=64, player_1/loss=33.170, player_2/loss=347.377, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 348.88it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=19.065, player_2/loss=295.829, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 346.38it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=24.220, player_2/loss=324.134, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 349.47it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=14.765, player_2/loss=339.129, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 345.92it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=77.680, player_2/loss=309.218, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 350.40it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=92.048, player_2/loss=365.029, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 347.42it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=43.473, player_2/loss=401.263, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 346.18it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=70.635, player_2/loss=413.334, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 349.33it/s, env_step=19456, len=8, n/ep=9, n/st=64, player_1/loss=79.426, player_2/loss=392.983, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 350.50it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=133.207, player_2/loss=66.789, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 350.73it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=128.262, player_2/loss=52.632, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 350.25it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=131.991, player_2/loss=24.799, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 351.60it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=130.245, player_2/loss=60.591, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 361.63it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=107.029, player_2/loss=85.828, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 347.00it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=107.097, player_2/loss=42.488, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 350.24it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=152.707, player_2/loss=12.597, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 351.22it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=137.149, player_2/loss=7.090, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 346.62it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=120.448, player_2/loss=21.595, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 349.42it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=138.753, player_2/loss=24.244, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 346.85it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=119.846, player_2/loss=12.931, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 350.55it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=145.132, player_2/loss=11.357, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 351.29it/s, env_step=13312, len=12, n/ep=6, n/st=64, player_1/loss=148.551, player_2/loss=14.518, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 346.46it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=138.446, player_2/loss=28.807, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 352.86it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=152.357, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 353.08it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=162.994, player_2/loss=52.689, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 347.38it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=185.991, player_2/loss=43.300, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 350.46it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=165.508, player_2/loss=33.336, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 344.92it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=168.450, player_2/loss=34.439, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 348.90it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=136.699, player_2/loss=33.337, rew=-15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 349.92it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=118.220, player_2/loss=99.614, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 343.74it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=110.383, player_2/loss=163.794, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 347.18it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=79.651, player_2/loss=200.356, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 346.82it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=132.020, player_2/loss=215.780, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 350.42it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=109.250, player_2/loss=203.571, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 351.69it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=59.585, player_2/loss=193.356, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 345.61it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=68.534, player_2/loss=182.980, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 349.50it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=59.635, player_2/loss=208.730, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 349.29it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=41.870, player_2/loss=193.360, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 347.49it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=35.597, player_2/loss=215.971, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 346.63it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=14.180, player_2/loss=179.227, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 343.91it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=13.226, player_2/loss=178.702, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 349.81it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=32.751, player_2/loss=151.104, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 349.92it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=27.225, player_2/loss=129.669, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 345.94it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=21.922, player_2/loss=159.148, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 348.34it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=26.244, player_2/loss=184.700, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 347.38it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=16.118, player_2/loss=191.118, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 348.41it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=13.293, player_2/loss=209.276, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 353.93it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=20.096, player_2/loss=129.790, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.16it/s, env_step=2048, len=16, n/ep=3, n/st=64, player_1/loss=33.321, player_2/loss=153.941, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 354.37it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=32.756, player_2/loss=121.436, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.91it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=16.969, player_2/loss=144.641, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 354.63it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=18.931, player_2/loss=89.459, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.58it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=14.586, player_2/loss=63.302, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 361.98it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=44.377, player_2/loss=56.447, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 352.69it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=135.877, player_2/loss=73.601, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 347.21it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=183.904, player_2/loss=72.533, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 352.58it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=89.020, player_2/loss=88.302, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 352.59it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=144.609, player_2/loss=74.551, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 349.49it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=147.164, player_2/loss=25.777, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 350.79it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=166.331, player_2/loss=26.694, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 350.12it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=149.812, player_2/loss=60.395, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 351.75it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=167.813, player_2/loss=63.878, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 348.43it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=191.704, player_2/loss=15.726, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 351.85it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=190.790, player_2/loss=6.188, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 352.13it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=184.207, player_2/loss=7.726, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 349.71it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=141.785, player_2/loss=34.896, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 347.77it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=161.748, player_2/loss=58.768, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.23it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=165.464, player_2/loss=70.874, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 354.75it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=191.466, player_2/loss=107.894, rew=-6.25]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 347.40it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=183.569, player_2/loss=229.093, rew=13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 349.56it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=109.389, player_2/loss=321.155, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 348.84it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=101.290, player_2/loss=327.223, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 346.90it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=101.581, player_2/loss=422.700, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 350.41it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=99.577, player_2/loss=437.168, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 349.50it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=47.064, player_2/loss=386.544, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 348.78it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=49.647, player_2/loss=297.389, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 349.46it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=97.489, player_2/loss=317.393, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 348.30it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_2/loss=360.437, rew=6.25]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 345.53it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=27.993, player_2/loss=390.906, rew=-6.25]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 347.62it/s, env_step=14336, len=7, n/ep=10, n/st=64, player_1/loss=52.767, player_2/loss=395.888, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 347.59it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=119.584, player_2/loss=351.835, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 347.92it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=106.187, player_2/loss=360.573, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 348.73it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=60.861, player_2/loss=394.145, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 346.29it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=84.337, player_2/loss=401.290, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 346.76it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=111.084, player_2/loss=384.248, rew=13.89]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 348.67it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=50.540, player_2/loss=277.191, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.25it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=70.614, player_2/loss=233.813, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 350.50it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=155.524, player_2/loss=130.520, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 344.65it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=183.808, player_2/loss=35.785, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 350.12it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=173.851, player_2/loss=42.263, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 348.12it/s, env_step=6144, len=13, n/ep=6, n/st=64, player_1/loss=196.039, player_2/loss=75.068, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 346.53it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=227.035, player_2/loss=58.402, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 347.46it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=245.456, player_2/loss=35.238, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 348.76it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=240.665, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 345.65it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=242.069, player_2/loss=16.362, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 347.61it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=211.179, player_2/loss=14.375, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 350.54it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=196.336, player_2/loss=5.869, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 349.55it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=277.117, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 350.24it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=231.440, player_2/loss=36.073, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 348.88it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=216.121, player_2/loss=29.797, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 345.65it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=249.811, player_2/loss=5.944, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 353.17it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=246.800, player_2/loss=16.746, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 348.84it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=253.951, player_2/loss=21.772, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 346.76it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=257.544, player_2/loss=31.966, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 351.04it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=134.765, player_2/loss=8.021, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.68it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=100.161, player_2/loss=11.343, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.00it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=53.300, player_2/loss=16.736, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 348.81it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=99.237, player_2/loss=35.174, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 347.14it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=165.004, player_2/loss=110.278, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 349.40it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=161.483, player_2/loss=284.583, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 348.87it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=55.918, player_2/loss=458.022, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 347.34it/s, env_step=8192, len=7, n/ep=10, n/st=64, player_1/loss=42.836, player_2/loss=453.038, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 350.65it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=32.701, player_2/loss=429.483, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 345.13it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=34.947, player_2/loss=485.400, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 346.07it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=44.196, player_2/loss=456.141, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 348.11it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=19.071, player_2/loss=487.061, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 345.35it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=5.817, player_2/loss=527.057, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 350.19it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=17.969, player_2/loss=501.547, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 350.17it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=18.995, player_2/loss=557.676, rew=17.86]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 346.23it/s, env_step=16384, len=7, n/ep=10, n/st=64, player_1/loss=21.981, player_2/loss=491.759, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 348.51it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=36.372, player_2/loss=462.527, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 344.91it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=40.215, player_2/loss=482.556, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 347.58it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=49.027, player_2/loss=506.641, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 351.89it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=2.493, player_2/loss=404.388, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.66it/s, env_step=2048, len=7, n/ep=7, n/st=64, player_1/loss=7.938, player_2/loss=337.195, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.86it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=18.727, player_2/loss=284.906, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.64it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=27.510, player_2/loss=275.831, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.18it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=13.812, player_2/loss=231.718, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 350.31it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=12.436, player_2/loss=206.293, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.18it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=16.033, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.80it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=68.181, player_2/loss=175.467, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 351.40it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=139.633, player_2/loss=122.362, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 349.91it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=138.016, player_2/loss=116.329, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:02, 353.75it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=110.832, player_2/loss=118.658, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:02, 357.20it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=125.900, player_2/loss=97.491, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:02, 350.55it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=123.953, player_2/loss=70.531, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:02, 351.30it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=113.795, player_2/loss=105.806, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:02, 352.35it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=127.389, player_2/loss=98.176, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:02, 353.42it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=150.139, player_2/loss=125.440, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:02, 347.30it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=122.246, player_2/loss=120.865, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:02, 346.04it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=103.489, player_2/loss=88.957, rew=-12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:02, 350.60it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=111.544, player_2/loss=59.407, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:02, 348.06it/s, env_step=1024, len=18, n/ep=3, n/st=64, player_1/loss=135.466, player_2/loss=126.716, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.27it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_2/loss=122.773, rew=15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.91it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=182.921, player_2/loss=149.215, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 352.14it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=177.339, player_2/loss=182.923, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 351.38it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=116.111, player_2/loss=237.579, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 346.34it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=100.711, player_2/loss=276.858, rew=19.44]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 348.38it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=84.852, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 349.50it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=146.851, player_2/loss=281.008, rew=19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 344.88it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=111.060, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 349.86it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=36.242, player_2/loss=297.678, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 345.18it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=20.784, player_2/loss=258.440, rew=19.44]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 351.05it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=49.595, player_2/loss=229.049, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 350.27it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=53.316, player_2/loss=263.039, rew=19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 344.91it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=41.479, player_2/loss=269.741, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 343.38it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=50.019, player_2/loss=271.579, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 345.89it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=49.348, player_2/loss=253.217, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 349.49it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=58.942, player_2/loss=273.592, rew=10.71]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 348.16it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=47.612, player_2/loss=284.027, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 345.21it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=23.097, player_2/loss=274.151, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 350.12it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=6.129, player_2/loss=208.232, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.65it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=61.623, player_2/loss=164.761, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.59it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=101.735, player_2/loss=127.822, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.37it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=51.672, player_2/loss=91.295, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 352.49it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=38.361, player_2/loss=119.302, rew=-19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 346.57it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=61.235, player_2/loss=166.323, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 349.79it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=26.286, player_2/loss=149.047, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 349.89it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=43.698, player_2/loss=131.789, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 349.10it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=51.598, player_2/loss=68.889, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 349.20it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=63.796, player_2/loss=45.066, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 346.37it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=37.882, player_2/loss=45.438, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 348.31it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=24.380, player_2/loss=31.747, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 349.33it/s, env_step=13312, len=8, n/ep=9, n/st=64, player_1/loss=70.384, player_2/loss=31.534, rew=-19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 351.50it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=64.611, player_2/loss=35.490, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 347.11it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=43.372, player_2/loss=65.108, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 351.01it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=57.046, player_2/loss=42.243, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 349.03it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=43.871, player_2/loss=18.974, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 348.80it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=130.665, player_2/loss=106.444, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 350.83it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=265.091, player_2/loss=154.216, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 354.96it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=239.944, player_2/loss=78.474, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.38it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=192.555, player_2/loss=89.474, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.02it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=130.044, player_2/loss=160.013, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.36it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=123.356, player_2/loss=233.127, rew=5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 348.35it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=78.631, player_2/loss=243.393, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 353.86it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=60.753, player_2/loss=300.678, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 352.63it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=36.908, player_2/loss=277.915, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 349.36it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=18.949, player_2/loss=235.030, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 347.94it/s, env_step=9216, len=27, n/ep=2, n/st=64, player_1/loss=52.972, player_2/loss=200.617, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 347.55it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=67.960, player_2/loss=187.157, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 352.88it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=100.412, player_2/loss=129.097, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 340.76it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=105.324, player_2/loss=117.367, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 347.85it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=86.701, player_2/loss=201.223, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 347.03it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=39.426, player_2/loss=297.271, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:03, 338.29it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=62.436, player_2/loss=282.513, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 344.44it/s, env_step=16384, len=23, n/ep=3, n/st=64, player_1/loss=101.806, player_2/loss=220.677, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 349.43it/s, env_step=17408, len=24, n/ep=3, n/st=64, player_1/loss=64.754, player_2/loss=190.650, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 350.86it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=45.758, player_2/loss=185.535, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 346.54it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=53.232, player_2/loss=200.195, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 352.71it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=54.115, player_2/loss=169.578, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.33it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=32.872, player_2/loss=124.428, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.77it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=46.061, player_2/loss=102.572, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 353.21it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=92.207, player_2/loss=118.237, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.55it/s, env_step=5120, len=19, n/ep=4, n/st=64, player_1/loss=113.721, player_2/loss=191.462, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 351.01it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=101.716, player_2/loss=180.092, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 354.33it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=87.056, player_2/loss=122.919, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 347.19it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=238.806, player_2/loss=106.167, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 349.55it/s, env_step=9216, len=21, n/ep=2, n/st=64, player_1/loss=258.143, player_2/loss=65.558, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 348.30it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=136.593, player_2/loss=32.619, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 350.33it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=92.014, player_2/loss=103.897, rew=-12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 352.66it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=71.392, player_2/loss=158.946, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 349.81it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=68.047, player_2/loss=163.485, rew=-17.86]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 350.88it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=208.658, player_2/loss=129.265, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 349.16it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=348.225, player_2/loss=90.680, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 353.28it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=336.827, player_2/loss=94.394, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 357.37it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=236.708, player_2/loss=68.608, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 348.30it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=275.701, player_2/loss=21.686, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 348.60it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=304.491, player_2/loss=43.174, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 349.68it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=232.557, player_2/loss=137.626, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.86it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=158.226, player_2/loss=285.451, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.47it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=81.746, player_2/loss=407.448, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.53it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=56.001, player_2/loss=363.880, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.39it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=60.946, player_2/loss=344.830, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.20it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=114.055, player_2/loss=311.316, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 349.90it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=128.137, player_2/loss=551.956, rew=19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.45it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=90.392, player_2/loss=778.410, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 349.82it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=56.342, player_2/loss=746.211, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 349.37it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=27.867, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 346.69it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=54.087, player_2/loss=647.617, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 349.06it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=79.817, player_2/loss=587.038, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 349.14it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=95.342, player_2/loss=613.353, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 344.35it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=100.136, player_2/loss=640.571, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 350.10it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=67.344, player_2/loss=553.197, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 345.46it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=43.555, player_2/loss=532.211, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 351.29it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=34.580, player_2/loss=586.742, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 350.10it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=39.373, player_2/loss=540.862, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 348.05it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=19.456, player_2/loss=572.116, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 347.21it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=33.518, player_2/loss=455.978, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.04it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=61.552, player_2/loss=386.624, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.11it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=140.234, player_2/loss=239.080, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.96it/s, env_step=4096, len=26, n/ep=2, n/st=64, player_1/loss=222.701, player_2/loss=125.211, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 344.94it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=226.728, player_2/loss=130.753, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 350.18it/s, env_step=6144, len=20, n/ep=4, n/st=64, player_1/loss=192.213, player_2/loss=132.562, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 351.41it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=157.442, player_2/loss=113.318, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 347.27it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=88.810, player_2/loss=101.432, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 350.34it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=136.379, player_2/loss=85.407, rew=12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 346.82it/s, env_step=10240, len=24, n/ep=3, n/st=64, player_1/loss=179.967, player_2/loss=93.695, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 351.16it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=140.331, player_2/loss=124.199, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 351.84it/s, env_step=12288, len=21, n/ep=2, n/st=64, player_1/loss=96.678, player_2/loss=103.058, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 347.37it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=97.963, player_2/loss=102.639, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 351.02it/s, env_step=14336, len=25, n/ep=3, n/st=64, player_1/loss=104.933, player_2/loss=119.187, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 351.71it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=115.080, player_2/loss=104.224, rew=-8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 348.58it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=118.314, player_2/loss=111.601, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 349.44it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=100.781, player_2/loss=122.238, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 350.19it/s, env_step=18432, len=17, n/ep=3, n/st=64, player_1/loss=105.641, player_2/loss=132.389, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 349.38it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=168.999, player_2/loss=141.699, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 348.05it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=83.975, player_2/loss=91.216, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.03it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=76.078, player_2/loss=131.147, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.87it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=40.240, player_2/loss=156.451, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.09it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=49.046, player_2/loss=150.390, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 352.16it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=38.978, player_2/loss=142.898, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 350.29it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=21.279, player_2/loss=142.119, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 349.54it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=27.917, player_2/loss=156.139, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 351.31it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=38.671, player_2/loss=176.883, rew=3.57]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 350.04it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=64.639, player_2/loss=207.817, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 349.70it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=86.213, player_2/loss=186.752, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 350.83it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=65.913, player_2/loss=153.699, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 349.67it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=25.948, player_2/loss=140.415, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 350.94it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=24.206, player_2/loss=124.743, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 348.21it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=6.467, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 351.60it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=6.230, player_2/loss=120.921, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 351.89it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=21.031, player_2/loss=124.059, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 347.00it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=21.893, player_2/loss=124.039, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 350.95it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=6.990, player_2/loss=145.729, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 347.64it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=5.501, player_2/loss=130.374, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 350.82it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=6.536, player_2/loss=132.330, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.48it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=18.339, player_2/loss=105.560, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.09it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=16.897, player_2/loss=94.892, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.46it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=16.973, player_2/loss=95.476, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 346.15it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=50.388, player_2/loss=100.356, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 351.35it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=56.028, player_2/loss=111.871, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 349.89it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=79.308, player_2/loss=77.439, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 351.67it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=161.272, player_2/loss=86.967, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 349.47it/s, env_step=9216, len=23, n/ep=3, n/st=64, player_1/loss=177.074, player_2/loss=84.561, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 353.02it/s, env_step=10240, len=28, n/ep=2, n/st=64, player_1/loss=170.862, player_2/loss=76.901, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 351.37it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=111.155, player_2/loss=65.111, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 348.53it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=119.996, player_2/loss=78.918, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 351.86it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=149.614, player_2/loss=99.653, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #14: 1025it [00:02, 351.12it/s, env_step=14336, len=25, n/ep=3, n/st=64, player_1/loss=95.820, player_2/loss=89.746, rew=8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #15: 1025it [00:02, 350.48it/s, env_step=15360, len=26, n/ep=2, n/st=64, player_1/loss=169.944, player_2/loss=81.462, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #16: 1025it [00:02, 352.17it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=236.028, player_2/loss=64.306, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #17: 1025it [00:02, 349.27it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=178.576, player_2/loss=45.010, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #18: 1025it [00:02, 352.57it/s, env_step=18432, len=25, n/ep=2, n/st=64, player_1/loss=172.215, player_2/loss=43.779, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #19: 1025it [00:02, 351.19it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=225.635, player_2/loss=45.189, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #1: 1025it [00:02, 350.14it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=35.923, player_2/loss=87.504, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 354.34it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=95.008, player_2/loss=68.749, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.62it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=126.234, player_2/loss=69.541, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.29it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=88.037, player_2/loss=97.696, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 347.27it/s, env_step=5120, len=15, n/ep=3, n/st=64, player_1/loss=75.622, player_2/loss=117.461, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 350.47it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=91.949, player_2/loss=178.824, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 347.61it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=60.481, player_2/loss=216.230, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 347.60it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=76.309, player_2/loss=246.603, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 348.01it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=86.695, player_2/loss=233.401, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 347.18it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=99.903, player_2/loss=256.611, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 349.14it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=122.427, player_2/loss=236.177, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 351.59it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=60.276, player_2/loss=197.502, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 345.99it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=56.402, player_2/loss=194.531, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 351.53it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=36.010, player_2/loss=188.446, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 351.28it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=36.451, player_2/loss=176.747, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 347.04it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=91.691, player_2/loss=219.004, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 350.77it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=116.399, player_2/loss=221.876, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 347.76it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=65.293, player_2/loss=162.017, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 350.19it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=24.638, player_2/loss=134.938, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 352.05it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=61.611, player_2/loss=193.990, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.63it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=141.028, player_2/loss=169.297, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 348.91it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=399.850, player_2/loss=117.602, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 347.69it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=494.239, player_2/loss=40.748, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 350.55it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=449.556, player_2/loss=45.695, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 349.19it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=460.616, player_2/loss=46.475, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 350.34it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=463.563, player_2/loss=53.160, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 352.09it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=375.445, player_2/loss=44.164, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 354.08it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=444.497, player_2/loss=24.593, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 349.89it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=461.341, player_2/loss=24.662, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 349.78it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=422.813, player_2/loss=12.977, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 348.49it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=465.847, player_2/loss=7.091, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 349.39it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=399.936, player_2/loss=22.544, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 352.42it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=433.720, player_2/loss=58.506, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 349.44it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=407.115, player_2/loss=55.700, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 352.03it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=509.276, player_2/loss=23.006, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 352.59it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=490.386, player_2/loss=15.727, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 348.49it/s, env_step=18432, len=11, n/ep=7, n/st=64, player_1/loss=408.336, player_2/loss=24.051, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 351.13it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=454.939, player_2/loss=20.189, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 346.10it/s, env_step=1024, len=28, n/ep=2, n/st=64, player_1/loss=344.845, player_2/loss=55.368, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.87it/s, env_step=2048, len=27, n/ep=2, n/st=64, player_1/loss=213.606, player_2/loss=151.819, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.90it/s, env_step=3072, len=25, n/ep=3, n/st=64, player_1/loss=99.299, player_2/loss=227.908, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 354.78it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=98.268, player_2/loss=227.441, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 355.29it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=45.534, player_2/loss=248.661, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 345.44it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=40.507, player_2/loss=260.209, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 354.18it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=38.645, player_2/loss=236.077, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 346.43it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=6.446, player_2/loss=247.083, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 351.24it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=4.267, player_2/loss=227.322, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 349.57it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=10.725, player_2/loss=250.426, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 340.14it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=13.254, player_2/loss=240.445, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 351.52it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=22.901, player_2/loss=182.725, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 347.31it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=34.476, player_2/loss=193.197, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 347.96it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=20.262, player_2/loss=210.031, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 350.59it/s, env_step=15360, len=17, n/ep=3, n/st=64, player_1/loss=8.863, player_2/loss=244.244, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 345.77it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=20.146, player_2/loss=288.637, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 349.19it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=22.859, player_2/loss=278.624, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 346.09it/s, env_step=18432, len=15, n/ep=5, n/st=64, player_1/loss=22.833, player_2/loss=288.699, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 352.39it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=9.637, player_2/loss=284.979, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 349.02it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=5.597, player_2/loss=161.561, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.68it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=11.726, player_2/loss=170.982, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 351.31it/s, env_step=3072, len=22, n/ep=2, n/st=64, player_1/loss=20.490, player_2/loss=156.667, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 347.52it/s, env_step=4096, len=15, n/ep=3, n/st=64, player_1/loss=113.188, player_2/loss=95.828, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 340.27it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=175.031, player_2/loss=28.318, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 355.16it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=128.178, player_2/loss=45.946, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 350.13it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=210.473, player_2/loss=61.006, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 352.30it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=237.833, player_2/loss=37.414, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 349.10it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=192.166, player_2/loss=37.596, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 351.29it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=194.903, player_2/loss=84.951, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 353.68it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=171.254, player_2/loss=88.934, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 348.26it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=152.794, player_2/loss=95.872, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 349.25it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=215.506, player_2/loss=50.561, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 352.88it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=209.001, player_2/loss=76.663, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 345.99it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=233.277, player_2/loss=52.606, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 351.87it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=255.494, player_2/loss=10.879, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 351.15it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=234.895, player_2/loss=17.060, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 350.27it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=223.748, player_2/loss=27.650, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 350.89it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=246.402, player_2/loss=53.780, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 347.08it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=135.969, player_2/loss=45.958, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.13it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=110.010, player_2/loss=126.297, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 349.02it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=52.043, player_2/loss=196.220, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 345.08it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=22.860, player_2/loss=236.052, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 349.45it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=71.610, player_2/loss=253.351, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 343.42it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=86.877, player_2/loss=244.118, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 349.01it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=62.470, player_2/loss=258.479, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 345.28it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=14.939, player_2/loss=264.314, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 350.63it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=7.464, player_2/loss=262.420, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 347.11it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=14.392, player_2/loss=262.865, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 347.54it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=16.739, player_2/loss=236.536, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 349.35it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=8.795, player_2/loss=226.914, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 344.82it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=7.576, player_2/loss=229.162, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 348.94it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=4.392, player_2/loss=249.132, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 349.51it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=24.511, player_2/loss=239.284, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 345.43it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=40.673, player_2/loss=250.954, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 348.16it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=21.910, player_2/loss=232.274, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 345.59it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=18.087, player_2/loss=202.898, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.32it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=29.085, player_2/loss=217.733, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 349.06it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=9.640, player_2/loss=208.599, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.95it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=11.107, player_2/loss=211.255, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 352.21it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=191.035, player_2/loss=158.867, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 352.51it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=302.128, player_2/loss=89.435, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 348.17it/s, env_step=5120, len=16, n/ep=3, n/st=64, player_1/loss=207.185, player_2/loss=91.661, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 349.61it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=199.527, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 347.04it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=266.090, player_2/loss=110.825, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 351.26it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=275.209, player_2/loss=57.065, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 350.58it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=246.974, player_2/loss=27.333, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 348.32it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=169.467, player_2/loss=56.508, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 353.39it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=183.381, player_2/loss=61.104, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 351.41it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=244.541, player_2/loss=21.381, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 346.67it/s, env_step=13312, len=22, n/ep=4, n/st=64, player_1/loss=185.125, player_2/loss=20.050, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 349.74it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=209.735, player_2/loss=16.044, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 347.24it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=227.496, player_2/loss=22.803, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 348.36it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=188.789, player_2/loss=55.806, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 350.70it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=282.221, player_2/loss=75.235, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 347.80it/s, env_step=18432, len=22, n/ep=3, n/st=64, player_1/loss=294.131, player_2/loss=61.216, rew=-8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 352.94it/s, env_step=19456, len=22, n/ep=2, n/st=64, player_1/loss=363.745, player_2/loss=82.395, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 348.33it/s, env_step=1024, len=26, n/ep=2, n/st=64, player_1/loss=166.116, player_2/loss=31.546, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.65it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=155.962, player_2/loss=63.739, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 352.09it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=93.061, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 346.90it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=64.987, player_2/loss=189.356, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 354.05it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=64.079, player_2/loss=183.434, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 347.27it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=22.013, player_2/loss=166.341, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 350.83it/s, env_step=7168, len=17, n/ep=5, n/st=64, player_1/loss=16.859, player_2/loss=200.181, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 350.42it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=14.175, player_2/loss=217.860, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 347.14it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=7.032, player_2/loss=188.736, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 344.89it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=18.291, player_2/loss=226.085, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 352.38it/s, env_step=11264, len=15, n/ep=3, n/st=64, player_1/loss=22.987, player_2/loss=272.024, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 349.89it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=12.274, player_2/loss=293.713, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 349.45it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=9.040, player_2/loss=274.229, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 350.43it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=14.387, player_2/loss=252.763, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 350.59it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=12.489, player_2/loss=222.347, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 347.91it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=4.616, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 351.52it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=35.601, player_2/loss=323.542, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 345.74it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=38.061, player_2/loss=293.330, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.97it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=9.223, player_2/loss=247.140, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 350.21it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=21.747, player_2/loss=146.175, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.47it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=25.261, player_2/loss=156.308, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.26it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=72.562, player_2/loss=121.334, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.04it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=109.502, player_2/loss=118.433, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 351.54it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=128.291, player_2/loss=138.365, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 352.55it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=149.106, player_2/loss=150.315, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 350.65it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=154.283, player_2/loss=139.384, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 351.01it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=181.718, player_2/loss=78.443, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 349.92it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=153.301, player_2/loss=40.249, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 351.45it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=175.343, player_2/loss=44.011, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 351.89it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=212.291, player_2/loss=94.491, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 349.08it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=192.961, player_2/loss=81.969, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 349.28it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=184.511, player_2/loss=89.978, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 347.36it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=199.193, player_2/loss=147.457, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 350.33it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=181.997, player_2/loss=121.499, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 351.82it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=145.100, player_2/loss=108.647, rew=-8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 343.15it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=161.173, player_2/loss=77.821, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 352.83it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=181.066, player_2/loss=87.008, rew=-5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 352.07it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=138.839, player_2/loss=96.843, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 348.25it/s, env_step=1024, len=18, n/ep=3, n/st=64, player_1/loss=119.272, player_2/loss=70.971, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.07it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=90.858, player_2/loss=101.725, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.92it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=81.306, player_2/loss=105.902, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 351.22it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=66.952, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.71it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=68.815, player_2/loss=169.096, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.31it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=76.652, player_2/loss=187.430, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 352.82it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=30.445, player_2/loss=137.083, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 350.55it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=18.299, player_2/loss=115.018, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 349.94it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=23.344, player_2/loss=116.257, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 353.29it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=18.084, player_2/loss=111.812, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 346.42it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=33.823, player_2/loss=86.993, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 351.55it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=39.869, player_2/loss=110.556, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 346.49it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=50.431, player_2/loss=134.525, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 352.98it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=40.499, player_2/loss=138.799, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 347.93it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=7.489, player_2/loss=150.843, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 346.87it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_2/loss=165.006, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 353.39it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=25.265, player_2/loss=152.173, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 351.64it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=10.087, player_2/loss=151.373, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 348.80it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=16.882, player_2/loss=170.282, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 350.73it/s, env_step=1024, len=23, n/ep=2, n/st=64, player_1/loss=33.124, player_2/loss=210.848, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 350.04it/s, env_step=2048, len=31, n/ep=2, n/st=64, player_1/loss=95.906, player_2/loss=126.138, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 347.62it/s, env_step=3072, len=29, n/ep=2, n/st=64, player_1/loss=111.631, player_2/loss=69.244, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 350.14it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=114.030, player_2/loss=69.572, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 345.75it/s, env_step=5120, len=19, n/ep=4, n/st=64, player_1/loss=136.914, player_2/loss=116.205, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 350.43it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=97.065, player_2/loss=119.787, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 350.59it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=80.178, player_2/loss=67.755, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 346.91it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=104.957, player_2/loss=45.420, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 350.43it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=138.305, player_2/loss=35.278, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 349.46it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=218.044, player_2/loss=21.918, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 350.03it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=210.131, player_2/loss=21.840, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 349.56it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=162.656, player_2/loss=45.127, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 348.03it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=189.613, player_2/loss=55.180, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 350.65it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=234.606, player_2/loss=63.753, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 350.04it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=248.979, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 346.06it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=223.814, player_2/loss=77.015, rew=-12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 353.66it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=178.288, player_2/loss=89.182, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 350.68it/s, env_step=18432, len=26, n/ep=3, n/st=64, player_1/loss=168.151, player_2/loss=98.140, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 350.14it/s, env_step=19456, len=22, n/ep=4, n/st=64, player_1/loss=139.184, player_2/loss=74.403, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 345.96it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=59.172, player_2/loss=193.657, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 351.51it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=78.219, player_2/loss=131.523, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.47it/s, env_step=3072, len=28, n/ep=2, n/st=64, player_1/loss=106.622, player_2/loss=127.585, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.23it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=90.597, player_2/loss=144.143, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 344.93it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=41.873, player_2/loss=192.181, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.87it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=48.672, player_2/loss=187.649, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 347.27it/s, env_step=7168, len=10, n/ep=7, n/st=64, player_1/loss=88.699, player_2/loss=176.051, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.57it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=66.601, player_2/loss=194.974, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 345.80it/s, env_step=9216, len=11, n/ep=7, n/st=64, player_1/loss=16.747, player_2/loss=230.357, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 349.35it/s, env_step=10240, len=8, n/ep=6, n/st=64, player_1/loss=23.890, player_2/loss=244.926, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.98it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=33.508, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 350.35it/s, env_step=12288, len=9, n/ep=5, n/st=64, player_1/loss=38.417, player_2/loss=236.506, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 342.07it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=52.217, player_2/loss=233.594, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 347.86it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=52.800, player_2/loss=189.740, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 348.79it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=28.779, player_2/loss=185.605, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 348.83it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=57.082, player_2/loss=232.877, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 343.87it/s, env_step=17408, len=10, n/ep=8, n/st=64, player_1/loss=59.409, player_2/loss=213.119, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 347.78it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=21.612, player_2/loss=201.724, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 349.35it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=17.369, player_2/loss=239.472, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 353.14it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=19.827, player_2/loss=152.416, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.21it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=81.826, player_2/loss=194.247, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 352.98it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=154.196, player_2/loss=222.387, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 350.83it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=192.687, player_2/loss=280.089, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 352.18it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=155.268, player_2/loss=250.686, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 348.02it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=116.158, player_2/loss=174.185, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 348.04it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=89.716, player_2/loss=132.657, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 351.40it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=85.165, player_2/loss=85.264, rew=-5.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 351.82it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=57.320, player_2/loss=63.688, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 348.47it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=40.752, player_2/loss=35.992, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 352.04it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=55.189, player_2/loss=58.547, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 351.61it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=55.341, player_2/loss=58.670, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 351.28it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=65.262, player_2/loss=76.681, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 349.97it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=109.952, player_2/loss=144.189, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 347.71it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=185.504, player_2/loss=229.434, rew=-10.71]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 350.81it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=236.843, player_2/loss=193.476, rew=-8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 351.81it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=189.191, player_2/loss=178.867, rew=-16.67]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 352.15it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=119.845, player_2/loss=142.491, rew=-16.67]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 351.75it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=85.862, player_2/loss=110.330, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 345.37it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=71.197, player_2/loss=95.996, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 352.53it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=63.215, player_2/loss=80.382, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.64it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=48.325, player_2/loss=83.097, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 348.94it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=69.411, player_2/loss=105.661, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 347.64it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=79.976, player_2/loss=118.522, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.84it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=101.831, player_2/loss=111.480, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 351.04it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=94.932, player_2/loss=135.876, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 350.71it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=106.692, player_2/loss=142.158, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 349.77it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=149.349, player_2/loss=144.287, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 343.88it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=100.049, player_2/loss=134.469, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 351.58it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=53.764, player_2/loss=114.434, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 351.24it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=76.518, player_2/loss=107.015, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 351.04it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=52.423, player_2/loss=92.025, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 343.48it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=31.589, player_2/loss=85.229, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 349.37it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=36.138, player_2/loss=63.071, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 350.31it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=20.682, player_2/loss=47.754, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 341.55it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=13.763, player_2/loss=60.379, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 350.20it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=32.658, player_2/loss=67.475, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 346.66it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=40.357, player_2/loss=60.598, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 351.50it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=87.857, player_2/loss=77.675, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.18it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=93.277, player_2/loss=58.153, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 346.41it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=99.365, player_2/loss=138.609, rew=10.71]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 345.69it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=164.367, player_2/loss=193.664, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 348.47it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=184.240, player_2/loss=121.187, rew=10.71]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 347.72it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=163.260, player_2/loss=206.740, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 347.87it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=178.170, player_2/loss=213.226, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 345.41it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=184.191, player_2/loss=118.554, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 352.22it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=139.161, player_2/loss=73.003, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 348.76it/s, env_step=10240, len=10, n/ep=5, n/st=64, player_2/loss=54.962, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 348.39it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=164.032, player_2/loss=78.144, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 349.50it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=190.209, player_2/loss=75.025, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 345.44it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=174.747, player_2/loss=34.097, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 350.02it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=175.458, player_2/loss=49.709, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 350.68it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=179.484, player_2/loss=49.664, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 349.49it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=155.564, player_2/loss=19.978, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 348.55it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=131.879, player_2/loss=34.720, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 349.55it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=157.562, player_2/loss=63.447, rew=16.67]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 348.79it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=135.240, player_2/loss=68.964, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 351.52it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=172.264, player_2/loss=182.458, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 345.46it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=114.368, player_2/loss=266.555, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 345.30it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=47.337, player_2/loss=364.876, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 348.08it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=39.413, player_2/loss=329.585, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 346.50it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=30.476, player_2/loss=336.897, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 348.09it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=44.510, player_2/loss=310.692, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 343.88it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=38.140, player_2/loss=331.369, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 348.07it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=21.070, player_2/loss=361.581, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 348.61it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=22.828, player_2/loss=422.746, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 348.22it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=23.495, player_2/loss=408.273, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 346.51it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=13.185, player_2/loss=363.260, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 348.97it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=18.524, player_2/loss=398.161, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 348.06it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=21.525, player_2/loss=406.427, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 346.36it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=10.990, player_2/loss=368.738, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 346.10it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=11.558, player_2/loss=367.146, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 347.31it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=5.183, player_2/loss=385.490, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 347.47it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=16.174, player_2/loss=365.788, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 343.12it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=24.864, player_2/loss=409.059, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 347.06it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=18.311, player_2/loss=379.731, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 349.56it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=191.527, player_2/loss=229.913, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 349.85it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=227.394, player_2/loss=204.574, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 350.35it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=222.220, player_2/loss=141.765, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 351.71it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=172.185, player_2/loss=92.776, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 349.21it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=215.472, player_2/loss=67.094, rew=5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 348.20it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=214.402, player_2/loss=42.882, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 348.44it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=214.326, player_2/loss=38.672, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 349.88it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=277.527, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 346.08it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=267.949, player_2/loss=40.636, rew=16.67]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 350.66it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=299.235, player_2/loss=43.190, rew=15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 350.44it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=278.857, player_2/loss=34.508, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 351.60it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=226.398, player_2/loss=28.698, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 350.62it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=208.451, player_2/loss=65.849, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 344.36it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=219.284, player_2/loss=57.876, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 349.03it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=210.825, player_2/loss=13.481, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 347.44it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=237.763, player_2/loss=25.933, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 347.68it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=250.041, player_2/loss=29.941, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 349.02it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=197.842, player_2/loss=47.163, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 350.66it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=232.715, player_2/loss=48.673, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 350.13it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=135.121, player_2/loss=2.195, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.32it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=116.793, player_2/loss=44.542, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.61it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=141.965, player_2/loss=94.281, rew=10.71]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.29it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=157.593, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 348.24it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=60.328, player_2/loss=431.006, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 346.58it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=57.513, player_2/loss=483.395, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 349.59it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=46.018, player_2/loss=479.617, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 344.78it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=23.081, player_2/loss=485.312, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 348.62it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=19.762, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 346.97it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=13.648, player_2/loss=530.035, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 347.30it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=8.414, player_2/loss=510.871, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 346.69it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=6.407, player_2/loss=527.962, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 349.68it/s, env_step=13312, len=8, n/ep=9, n/st=64, player_1/loss=8.488, player_2/loss=534.885, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 348.14it/s, env_step=14336, len=8, n/ep=9, n/st=64, player_1/loss=7.016, player_2/loss=555.939, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 348.10it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=6.613, player_2/loss=509.984, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 344.56it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=8.706, player_2/loss=574.020, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 346.87it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=17.605, player_2/loss=519.354, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 348.96it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=35.162, player_2/loss=493.293, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 350.34it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=27.873, player_2/loss=508.961, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 345.10it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=6.135, player_2/loss=421.539, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.39it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=136.277, player_2/loss=327.400, rew=15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 353.44it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=258.448, player_2/loss=192.035, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.46it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=298.603, player_2/loss=120.256, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 346.70it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=220.381, player_2/loss=134.565, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 348.32it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_2/loss=170.338, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 351.44it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=119.515, player_2/loss=122.349, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 351.53it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=81.932, player_2/loss=100.864, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 353.07it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=107.034, player_2/loss=104.396, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 348.50it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=177.263, player_2/loss=129.552, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 351.90it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=208.705, player_2/loss=155.362, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 350.45it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=215.421, player_2/loss=105.575, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 352.83it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=196.551, player_2/loss=82.150, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 346.47it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=168.791, player_2/loss=90.296, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 349.00it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=188.362, player_2/loss=94.684, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 351.57it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=217.892, player_2/loss=115.870, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 348.61it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=238.556, player_2/loss=90.494, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 351.45it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=217.029, player_2/loss=73.674, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 346.74it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=228.770, player_2/loss=58.703, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 348.91it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=139.572, player_2/loss=77.460, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.69it/s, env_step=2048, len=27, n/ep=2, n/st=64, player_1/loss=127.666, player_2/loss=111.072, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.31it/s, env_step=3072, len=29, n/ep=2, n/st=64, player_1/loss=151.063, player_2/loss=112.427, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 346.34it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=158.157, player_2/loss=97.209, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 351.26it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=122.388, player_2/loss=105.536, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 349.87it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=72.603, player_2/loss=115.567, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 349.29it/s, env_step=7168, len=16, n/ep=5, n/st=64, player_1/loss=106.490, player_2/loss=180.430, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 348.38it/s, env_step=8192, len=22, n/ep=2, n/st=64, player_1/loss=99.391, player_2/loss=164.095, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 346.34it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=78.316, player_2/loss=81.083, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 348.56it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=56.964, player_2/loss=131.460, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 348.74it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=66.265, player_2/loss=303.256, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 348.31it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=100.922, player_2/loss=382.298, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 343.92it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=107.452, player_2/loss=426.867, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 345.77it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=43.511, player_2/loss=338.267, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 346.90it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=16.439, player_2/loss=321.016, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 349.36it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=23.511, player_2/loss=332.183, rew=6.25]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 343.95it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=24.822, player_2/loss=304.629, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 348.28it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=46.318, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 348.36it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=73.585, player_2/loss=282.429, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 350.37it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=27.475, player_2/loss=218.280, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.42it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=53.950, player_2/loss=215.948, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 347.93it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_2/loss=203.863, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 348.20it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=115.476, player_2/loss=137.234, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 349.96it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=197.668, player_2/loss=70.928, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 348.12it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=253.913, player_2/loss=51.591, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 349.02it/s, env_step=7168, len=13, n/ep=3, n/st=64, player_1/loss=267.562, player_2/loss=55.797, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 347.56it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=202.957, player_2/loss=55.317, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 349.38it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=164.657, player_2/loss=59.151, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 348.51it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=178.827, player_2/loss=55.101, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 349.72it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=198.893, player_2/loss=49.496, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 346.62it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=189.246, player_2/loss=42.665, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 348.32it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_2/loss=8.056, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 350.80it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=199.511, player_2/loss=9.317, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 348.58it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=174.804, player_2/loss=29.887, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 348.18it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=186.641, player_2/loss=27.986, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 349.00it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=238.385, player_2/loss=33.328, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 351.66it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=230.922, player_2/loss=37.172, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.49it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=179.786, player_2/loss=50.155, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 350.16it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=243.445, player_2/loss=38.711, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.31it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=202.006, player_2/loss=183.318, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 348.89it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=72.897, player_2/loss=248.131, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 348.84it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=76.820, player_2/loss=364.079, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 349.13it/s, env_step=5120, len=10, n/ep=5, n/st=64, player_1/loss=103.461, player_2/loss=409.840, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 351.77it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=119.453, player_2/loss=276.647, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 348.31it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=134.907, player_2/loss=328.970, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 349.31it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=121.486, player_2/loss=427.863, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 348.47it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=108.969, player_2/loss=415.266, rew=-5.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 346.77it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=83.825, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 348.88it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=107.322, player_2/loss=295.396, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.37it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=91.605, player_2/loss=331.647, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 349.74it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=37.642, player_2/loss=400.112, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 348.45it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=24.471, player_2/loss=420.072, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 348.34it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=36.903, player_2/loss=397.970, rew=12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 347.33it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=42.108, player_2/loss=281.915, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 344.97it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=103.979, player_2/loss=243.648, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 348.34it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=109.973, player_2/loss=270.717, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.26it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=67.929, player_2/loss=288.326, rew=-5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 350.62it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=130.189, player_2/loss=318.647, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.12it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=158.117, player_2/loss=233.955, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 350.04it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=131.594, player_2/loss=96.101, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 350.63it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=158.521, player_2/loss=71.420, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 350.31it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=171.347, player_2/loss=71.446, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 343.68it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=190.619, player_2/loss=52.357, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 351.21it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=182.310, player_2/loss=50.398, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 352.43it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=149.555, player_2/loss=67.219, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 346.07it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=110.217, player_2/loss=54.264, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 351.64it/s, env_step=10240, len=20, n/ep=4, n/st=64, player_1/loss=114.964, player_2/loss=41.848, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 350.92it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=121.891, player_2/loss=37.152, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 351.91it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=140.285, player_2/loss=20.229, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 349.31it/s, env_step=13312, len=16, n/ep=3, n/st=64, player_1/loss=136.748, player_2/loss=46.873, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 345.87it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=160.786, player_2/loss=60.455, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 352.12it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=183.624, player_2/loss=53.773, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 352.57it/s, env_step=16384, len=23, n/ep=3, n/st=64, player_1/loss=140.703, player_2/loss=37.624, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 350.59it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=138.784, player_2/loss=13.818, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 346.76it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=144.083, player_2/loss=11.936, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.63it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=120.959, player_2/loss=19.370, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 350.20it/s, env_step=1024, len=29, n/ep=2, n/st=64, player_1/loss=114.650, player_2/loss=53.794, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.67it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=115.284, player_2/loss=63.688, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 346.68it/s, env_step=3072, len=19, n/ep=2, n/st=64, player_1/loss=85.145, player_2/loss=72.409, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 349.84it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=67.378, player_2/loss=67.205, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 347.90it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=76.349, player_2/loss=96.227, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 350.56it/s, env_step=6144, len=23, n/ep=2, n/st=64, player_1/loss=122.400, player_2/loss=114.595, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 345.96it/s, env_step=7168, len=24, n/ep=2, n/st=64, player_1/loss=125.866, player_2/loss=145.736, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 348.07it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=62.017, player_2/loss=218.177, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 351.39it/s, env_step=9216, len=21, n/ep=3, n/st=64, player_1/loss=46.648, player_2/loss=181.538, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 350.67it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=36.297, player_2/loss=111.633, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 348.16it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=23.351, player_2/loss=172.914, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 348.00it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=64.030, player_2/loss=251.702, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 349.36it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=69.311, player_2/loss=197.821, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 348.61it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=56.184, player_2/loss=148.978, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 347.72it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=76.279, player_2/loss=170.909, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 347.81it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=110.697, player_2/loss=242.644, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 350.32it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=77.428, player_2/loss=261.742, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 347.19it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=55.969, player_2/loss=261.311, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 345.15it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=67.325, player_2/loss=256.252, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 350.91it/s, env_step=1024, len=18, n/ep=3, n/st=64, player_1/loss=167.422, player_2/loss=186.319, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 350.59it/s, env_step=2048, len=16, n/ep=3, n/st=64, player_1/loss=114.390, player_2/loss=150.329, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 348.97it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=87.428, player_2/loss=102.824, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 350.90it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=87.399, player_2/loss=84.574, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 348.65it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=84.914, player_2/loss=70.626, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 348.87it/s, env_step=6144, len=24, n/ep=2, n/st=64, player_1/loss=58.221, player_2/loss=69.602, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 353.52it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=74.979, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 343.69it/s, env_step=8192, len=25, n/ep=3, n/st=64, player_1/loss=98.780, player_2/loss=137.377, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 349.11it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=86.179, player_2/loss=124.541, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 352.24it/s, env_step=10240, len=16, n/ep=5, n/st=64, player_1/loss=55.143, player_2/loss=88.098, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 349.91it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=43.276, player_2/loss=63.311, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 348.91it/s, env_step=12288, len=15, n/ep=5, n/st=64, player_1/loss=46.421, player_2/loss=31.458, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 348.81it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=123.950, player_2/loss=107.732, rew=6.25]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 349.77it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=211.691, player_2/loss=158.403, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 342.30it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_2/loss=108.156, rew=17.86]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 347.02it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=222.557, player_2/loss=56.127, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 344.25it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=244.401, player_2/loss=72.069, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 347.90it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=208.583, player_2/loss=61.869, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.89it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=190.081, player_2/loss=97.180, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 345.41it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=180.910, player_2/loss=417.554, rew=2.78]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.83it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=227.184, player_2/loss=603.702, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.56it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=181.194, player_2/loss=786.785, rew=10.71]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 350.81it/s, env_step=4096, len=8, n/ep=9, n/st=64, player_1/loss=126.312, player_2/loss=774.574, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 349.69it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=122.796, player_2/loss=722.049, rew=13.89]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 343.27it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=100.506, player_2/loss=749.826, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 344.30it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=90.920, player_2/loss=734.612, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 344.73it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=51.044, player_2/loss=648.297, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 345.89it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=54.914, player_2/loss=655.950, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 348.42it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=65.442, player_2/loss=569.821, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 343.16it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=82.409, player_2/loss=550.914, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 348.97it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=135.735, player_2/loss=577.504, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 347.44it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=138.683, player_2/loss=613.593, rew=19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 347.99it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=95.500, player_2/loss=757.484, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 345.78it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=53.765, player_2/loss=774.365, rew=15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 347.36it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=39.248, player_2/loss=617.244, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 346.46it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=34.860, player_2/loss=648.116, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 348.74it/s, env_step=18432, len=7, n/ep=6, n/st=64, player_1/loss=61.779, player_2/loss=507.163, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 346.85it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=86.444, player_2/loss=523.745, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 347.89it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=206.186, player_2/loss=234.278, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 351.04it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=202.134, player_2/loss=205.073, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 349.41it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=214.959, player_2/loss=160.507, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 348.38it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=196.115, player_2/loss=144.807, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 346.25it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=152.632, player_2/loss=93.165, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 350.02it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=140.914, player_2/loss=69.041, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 347.42it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=165.872, player_2/loss=52.641, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 349.17it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=191.961, player_2/loss=37.532, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 348.42it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=175.950, player_2/loss=43.306, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 347.25it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=180.728, player_2/loss=37.233, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 353.60it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=190.335, player_2/loss=48.213, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 347.22it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=159.215, player_2/loss=56.054, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 345.28it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=148.381, player_2/loss=42.344, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 349.25it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=143.165, player_2/loss=22.108, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 350.77it/s, env_step=15360, len=20, n/ep=4, n/st=64, player_1/loss=152.207, player_2/loss=50.149, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 349.92it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=161.177, player_2/loss=71.780, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 345.86it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=175.421, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 350.61it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=154.295, player_2/loss=27.074, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 347.00it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=131.047, player_2/loss=58.231, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 349.67it/s, env_step=1024, len=22, n/ep=2, n/st=64, player_1/loss=128.498, player_2/loss=123.186, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 344.31it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=123.199, player_2/loss=139.068, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 350.90it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=116.313, player_2/loss=121.564, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 348.14it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=113.033, player_2/loss=146.955, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 347.59it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=111.189, player_2/loss=226.412, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 347.57it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=58.061, player_2/loss=237.558, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 345.07it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=48.003, player_2/loss=251.114, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 348.67it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=82.929, player_2/loss=278.335, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 347.65it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=98.561, player_2/loss=264.744, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 340.70it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=54.090, player_2/loss=263.485, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 342.79it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=43.034, player_2/loss=224.676, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 345.29it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=21.005, player_2/loss=244.251, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 346.70it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=59.475, player_2/loss=220.554, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 346.66it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=79.821, player_2/loss=198.034, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 342.40it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=50.254, player_2/loss=193.398, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 346.72it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_2/loss=211.214, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 346.07it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=54.095, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 351.34it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=61.473, player_2/loss=225.604, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 343.90it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=48.521, player_2/loss=271.729, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 348.89it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=66.330, player_2/loss=222.689, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.56it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=67.104, player_2/loss=159.168, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.97it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=55.577, player_2/loss=132.347, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 344.60it/s, env_step=4096, len=11, n/ep=8, n/st=64, player_1/loss=63.091, player_2/loss=123.587, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 349.48it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=122.356, player_2/loss=98.543, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 347.88it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=159.377, player_2/loss=91.755, rew=-16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 350.20it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=127.787, player_2/loss=105.187, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 346.94it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=136.617, player_2/loss=92.600, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 345.02it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=170.674, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 347.62it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=126.841, player_2/loss=47.799, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 353.32it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=107.590, player_2/loss=55.738, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 344.79it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=129.994, player_2/loss=32.346, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 354.87it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=110.918, player_2/loss=33.089, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 349.79it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=109.601, player_2/loss=26.848, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 349.11it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=112.636, player_2/loss=17.450, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 344.50it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=109.129, player_2/loss=29.353, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 348.27it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=110.537, player_2/loss=29.653, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 348.70it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=87.816, player_2/loss=50.414, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 348.82it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=83.767, player_2/loss=52.318, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 348.06it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=70.231, player_2/loss=42.946, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.16it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=64.496, player_2/loss=36.256, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.19it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=76.952, player_2/loss=79.719, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 348.67it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=79.828, player_2/loss=115.191, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 351.49it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=67.593, player_2/loss=91.393, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 347.65it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=114.150, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 349.18it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=96.852, player_2/loss=223.224, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 349.81it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_2/loss=262.971, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 350.03it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=29.848, player_2/loss=266.050, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 345.20it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=20.663, player_2/loss=254.022, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 348.62it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=39.565, player_2/loss=219.271, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 346.48it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=50.863, player_2/loss=200.268, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 349.17it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=32.607, player_2/loss=256.589, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 350.81it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=54.262, player_2/loss=250.066, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 345.54it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=49.070, player_2/loss=215.307, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 349.35it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=13.777, player_2/loss=247.719, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 347.62it/s, env_step=17408, len=7, n/ep=6, n/st=64, player_1/loss=24.586, player_2/loss=307.418, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 348.62it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=53.121, player_2/loss=307.842, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 343.65it/s, env_step=19456, len=14, n/ep=6, n/st=64, player_1/loss=42.731, player_2/loss=278.254, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 345.82it/s, env_step=1024, len=31, n/ep=2, n/st=64, player_1/loss=67.434, player_2/loss=166.603, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 346.69it/s, env_step=2048, len=20, n/ep=4, n/st=64, player_1/loss=93.716, player_2/loss=125.788, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 350.34it/s, env_step=3072, len=27, n/ep=2, n/st=64, player_1/loss=111.879, player_2/loss=94.598, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 349.95it/s, env_step=4096, len=26, n/ep=2, n/st=64, player_1/loss=121.876, player_2/loss=106.791, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 345.82it/s, env_step=5120, len=27, n/ep=2, n/st=64, player_1/loss=91.714, player_2/loss=111.736, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 350.45it/s, env_step=6144, len=29, n/ep=2, n/st=64, player_1/loss=46.886, player_2/loss=90.200, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 348.59it/s, env_step=7168, len=25, n/ep=3, n/st=64, player_1/loss=36.924, player_2/loss=51.245, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 351.04it/s, env_step=8192, len=30, n/ep=2, n/st=64, player_1/loss=83.375, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 346.74it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=80.299, player_2/loss=68.378, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 348.72it/s, env_step=10240, len=26, n/ep=2, n/st=64, player_1/loss=60.419, player_2/loss=87.033, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 350.69it/s, env_step=11264, len=26, n/ep=3, n/st=64, player_1/loss=105.741, player_2/loss=91.396, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 349.03it/s, env_step=12288, len=25, n/ep=2, n/st=64, player_1/loss=118.330, player_2/loss=59.148, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 349.12it/s, env_step=13312, len=27, n/ep=3, n/st=64, player_1/loss=110.050, player_2/loss=76.073, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 348.03it/s, env_step=14336, len=28, n/ep=3, n/st=64, player_1/loss=133.626, player_2/loss=68.070, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 352.41it/s, env_step=15360, len=25, n/ep=2, n/st=64, player_1/loss=126.184, player_2/loss=61.949, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 348.76it/s, env_step=16384, len=29, n/ep=3, n/st=64, player_1/loss=93.974, player_2/loss=79.870, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 348.81it/s, env_step=17408, len=27, n/ep=3, n/st=64, player_1/loss=87.127, player_2/loss=76.546, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 347.79it/s, env_step=18432, len=26, n/ep=3, n/st=64, player_1/loss=101.023, player_2/loss=47.569, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 349.65it/s, env_step=19456, len=28, n/ep=2, n/st=64, player_1/loss=95.545, player_2/loss=105.046, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 346.55it/s, env_step=1024, len=27, n/ep=2, n/st=64, player_1/loss=103.613, player_2/loss=62.282, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 349.53it/s, env_step=2048, len=31, n/ep=2, n/st=64, player_1/loss=89.733, player_2/loss=56.503, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.50it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=84.895, player_2/loss=78.434, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 350.54it/s, env_step=4096, len=32, n/ep=2, n/st=64, player_1/loss=96.259, player_2/loss=86.115, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 350.15it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=124.776, player_2/loss=117.621, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 348.11it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=109.872, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 346.15it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=93.411, player_2/loss=136.017, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 342.92it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=69.156, player_2/loss=143.238, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 347.61it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=56.335, player_2/loss=174.180, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 347.82it/s, env_step=10240, len=16, n/ep=5, n/st=64, player_1/loss=43.734, player_2/loss=233.546, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 345.87it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=28.375, player_2/loss=244.629, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 343.80it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=10.076, player_2/loss=230.524, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 349.31it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=48.819, player_2/loss=200.894, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 347.50it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=55.851, player_2/loss=172.862, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 348.13it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=38.280, player_2/loss=175.867, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 347.34it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=50.702, player_2/loss=133.321, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 348.02it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=41.101, player_2/loss=152.881, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 347.69it/s, env_step=18432, len=16, n/ep=3, n/st=64, player_1/loss=37.846, player_2/loss=172.931, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 346.95it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=23.086, player_2/loss=183.984, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 347.67it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=45.400, player_2/loss=217.899, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.22it/s, env_step=2048, len=10, n/ep=5, n/st=64, player_1/loss=61.566, player_2/loss=194.839, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.45it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=92.224, player_2/loss=163.867, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 346.29it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=114.743, player_2/loss=122.321, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 346.41it/s, env_step=5120, len=9, n/ep=8, n/st=64, player_1/loss=108.241, player_2/loss=91.503, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 349.43it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=125.276, player_2/loss=119.707, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 345.52it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=125.578, player_2/loss=112.317, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 346.62it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=181.221, player_2/loss=122.725, rew=12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 349.05it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=214.009, player_2/loss=149.161, rew=-13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 346.66it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=172.296, player_2/loss=116.338, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 345.33it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=136.657, player_2/loss=97.063, rew=5.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 348.38it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=181.671, player_2/loss=88.304, rew=5.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 348.16it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=226.661, player_2/loss=72.995, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 347.94it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=165.773, player_2/loss=104.524, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 345.60it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=182.255, player_2/loss=140.265, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 348.16it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=212.645, player_2/loss=225.353, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 356.44it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=188.123, player_2/loss=165.343, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 349.58it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=206.651, player_2/loss=92.461, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 343.14it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=203.037, player_2/loss=99.386, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 347.02it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=223.140, player_2/loss=175.420, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.50it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=241.734, player_2/loss=188.927, rew=-18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.36it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=225.402, player_2/loss=181.596, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.14it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_2/loss=153.389, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 345.07it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=151.027, player_2/loss=152.216, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 347.66it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=130.956, player_2/loss=205.923, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 346.90it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=224.384, player_2/loss=250.579, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 346.08it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=275.366, player_2/loss=265.852, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:02, 344.65it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=220.626, player_2/loss=285.009, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:02, 347.30it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=212.241, player_2/loss=266.850, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:02, 346.16it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=160.699, player_2/loss=262.186, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:02, 349.49it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=199.850, player_2/loss=269.757, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:02, 346.62it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=151.991, player_2/loss=315.356, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:02, 342.91it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=82.010, player_2/loss=381.242, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 345.93it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=50.054, player_2/loss=339.462, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:02, 345.84it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=94.825, player_2/loss=340.530, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:02, 346.29it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=112.438, player_2/loss=301.054, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:03, 340.50it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=62.741, player_2/loss=305.122, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:02, 346.21it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=45.507, player_2/loss=292.537, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 348.60it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=36.424, player_2/loss=301.844, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.21it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=91.243, player_2/loss=270.808, rew=-15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 350.02it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=143.981, player_2/loss=180.640, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 346.55it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=175.227, player_2/loss=161.479, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 347.50it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=305.325, player_2/loss=136.709, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 350.71it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=455.273, player_2/loss=77.464, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 349.87it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=482.212, player_2/loss=65.132, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 343.82it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=389.736, player_2/loss=73.678, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 348.75it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=470.503, player_2/loss=45.229, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 348.16it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=557.843, player_2/loss=17.227, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 347.85it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=470.884, player_2/loss=17.905, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.62it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=361.448, player_2/loss=22.149, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 343.22it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=385.000, player_2/loss=32.493, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 347.27it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=417.656, player_2/loss=18.744, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 349.52it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=425.607, player_2/loss=36.747, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 349.57it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=483.981, player_2/loss=41.425, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 348.65it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=492.752, player_2/loss=33.434, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 348.25it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=388.680, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 346.68it/s, env_step=19456, len=9, n/ep=6, n/st=64, player_1/loss=368.580, player_2/loss=16.039, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 347.29it/s, env_step=1024, len=13, n/ep=4, n/st=64, player_1/loss=208.480, player_2/loss=63.632, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.48it/s, env_step=2048, len=13, n/ep=6, n/st=64, player_1/loss=217.495, player_2/loss=170.448, rew=16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.78it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=198.573, player_2/loss=249.939, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 346.68it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=146.962, player_2/loss=309.699, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 346.68it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=105.371, player_2/loss=268.431, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 347.23it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=80.345, player_2/loss=271.882, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 341.97it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=55.693, player_2/loss=261.659, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 343.90it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=26.717, player_2/loss=280.996, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 348.17it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=45.833, player_2/loss=211.369, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 349.39it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=117.847, player_2/loss=210.773, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 349.93it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=178.770, player_2/loss=271.933, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 334.45it/s, env_step=12288, len=17, n/ep=5, n/st=64, player_1/loss=140.743, player_2/loss=227.840, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 346.22it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=87.511, player_2/loss=227.985, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 348.35it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=39.765, player_2/loss=249.581, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 348.74it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=63.100, player_2/loss=274.429, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 348.82it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=82.110, player_2/loss=280.083, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 346.45it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=67.073, player_2/loss=298.043, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 347.48it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=68.581, player_2/loss=243.627, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 346.15it/s, env_step=19456, len=15, n/ep=3, n/st=64, player_1/loss=75.567, player_2/loss=290.610, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 346.47it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=63.574, player_2/loss=195.124, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 349.50it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=119.929, player_2/loss=121.554, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 349.77it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=158.593, player_2/loss=54.598, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 346.96it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=229.258, player_2/loss=33.259, rew=5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 349.17it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=277.836, player_2/loss=14.569, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 347.90it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=259.842, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 348.55it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=234.751, player_2/loss=89.795, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 349.85it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=215.799, player_2/loss=69.717, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 350.09it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=182.839, player_2/loss=23.658, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 346.15it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=203.382, player_2/loss=37.605, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 349.54it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=181.087, player_2/loss=59.682, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 350.11it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=233.647, player_2/loss=65.222, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 348.61it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=272.264, player_2/loss=38.551, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 348.76it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=264.614, player_2/loss=15.802, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 346.23it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=255.434, player_2/loss=9.051, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 347.26it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=239.918, player_2/loss=36.903, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 349.17it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=212.112, player_2/loss=74.655, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 346.86it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=202.277, player_2/loss=17.865, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 345.56it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=208.791, player_2/loss=37.579, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 352.20it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=158.449, player_2/loss=18.972, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.05it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=126.766, player_2/loss=16.859, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.85it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=82.175, player_2/loss=20.297, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 348.08it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=48.291, player_2/loss=21.966, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 344.53it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=44.941, player_2/loss=17.400, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 347.77it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=44.291, player_2/loss=14.669, rew=-12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 348.63it/s, env_step=7168, len=15, n/ep=5, n/st=64, player_1/loss=67.781, player_2/loss=87.219, rew=15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.06it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=145.685, player_2/loss=155.564, rew=-15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 347.23it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=162.361, player_2/loss=306.978, rew=5.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 348.70it/s, env_step=10240, len=8, n/ep=9, n/st=64, player_1/loss=110.839, player_2/loss=383.587, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 346.99it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=81.231, player_2/loss=385.653, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 347.58it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=68.023, player_2/loss=350.759, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 347.91it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=60.725, player_2/loss=365.048, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 343.80it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=35.729, player_2/loss=397.249, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 346.63it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=14.022, player_2/loss=403.900, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 347.35it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=32.958, player_2/loss=396.334, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 348.12it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=37.468, player_2/loss=396.177, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 346.60it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=18.197, player_2/loss=369.866, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 345.35it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=16.228, player_2/loss=332.700, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 349.82it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=122.135, player_2/loss=288.733, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.11it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=154.348, player_2/loss=242.696, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 348.56it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=174.335, player_2/loss=142.200, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 346.17it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=151.339, player_2/loss=99.332, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 347.21it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=145.735, player_2/loss=102.272, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 348.30it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=234.955, player_2/loss=150.587, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 348.18it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=264.856, player_2/loss=174.249, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 345.91it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=170.361, player_2/loss=170.829, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 349.15it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=143.520, player_2/loss=144.686, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 349.77it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=154.477, player_2/loss=113.146, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 349.92it/s, env_step=11264, len=16, n/ep=5, n/st=64, player_1/loss=169.575, player_2/loss=68.649, rew=5.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 350.02it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=179.746, player_2/loss=52.552, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 345.00it/s, env_step=13312, len=11, n/ep=4, n/st=64, player_1/loss=179.654, player_2/loss=49.454, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 348.18it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=161.281, player_2/loss=44.564, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 349.98it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=162.662, player_2/loss=18.469, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 349.28it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=159.329, player_2/loss=12.505, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 346.46it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=145.771, player_2/loss=22.571, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 348.51it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=157.584, player_2/loss=54.083, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 349.02it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=188.371, player_2/loss=41.024, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 349.14it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=129.998, player_2/loss=3.544, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.39it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=103.783, player_2/loss=10.152, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 348.55it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=124.264, player_2/loss=83.566, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 346.47it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=129.863, player_2/loss=154.200, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 346.73it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=108.918, player_2/loss=181.809, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 343.70it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=105.715, player_2/loss=253.778, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 347.23it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=90.324, player_2/loss=298.901, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 345.47it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=40.738, player_2/loss=402.890, rew=-5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 346.70it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=51.725, player_2/loss=433.507, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 347.42it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=68.469, player_2/loss=340.694, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 344.38it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=90.406, player_2/loss=335.370, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 347.61it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=115.089, player_2/loss=362.496, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 346.22it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=69.177, player_2/loss=425.675, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 343.34it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=42.379, player_2/loss=330.279, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 346.30it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=39.333, player_2/loss=254.041, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 348.13it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=56.341, player_2/loss=303.317, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 346.20it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=79.966, player_2/loss=349.144, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 347.30it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=84.515, player_2/loss=340.411, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 348.48it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=83.868, player_2/loss=274.598, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 344.61it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=102.928, player_2/loss=243.054, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.15it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=133.477, player_2/loss=192.505, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.62it/s, env_step=3072, len=28, n/ep=2, n/st=64, player_1/loss=135.473, player_2/loss=105.539, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.75it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=104.953, player_2/loss=93.981, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 346.51it/s, env_step=5120, len=30, n/ep=2, n/st=64, player_1/loss=85.081, player_2/loss=75.058, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 350.58it/s, env_step=6144, len=25, n/ep=3, n/st=64, player_1/loss=79.767, player_2/loss=77.603, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 347.95it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=107.442, player_2/loss=104.135, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 351.37it/s, env_step=8192, len=19, n/ep=2, n/st=64, player_1/loss=140.946, player_2/loss=111.337, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 349.76it/s, env_step=9216, len=21, n/ep=2, n/st=64, player_1/loss=106.889, player_2/loss=112.867, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 347.50it/s, env_step=10240, len=24, n/ep=3, n/st=64, player_1/loss=104.594, player_2/loss=112.460, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 349.24it/s, env_step=11264, len=27, n/ep=3, n/st=64, player_1/loss=144.016, player_2/loss=105.147, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 345.34it/s, env_step=12288, len=28, n/ep=3, n/st=64, player_1/loss=128.422, player_2/loss=74.989, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 349.20it/s, env_step=13312, len=33, n/ep=2, n/st=64, player_1/loss=130.917, player_2/loss=77.887, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 344.34it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=150.053, player_2/loss=110.243, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 348.00it/s, env_step=15360, len=30, n/ep=2, n/st=64, player_1/loss=112.933, player_2/loss=78.054, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 346.96it/s, env_step=16384, len=27, n/ep=2, n/st=64, player_1/loss=85.384, player_2/loss=42.842, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 348.56it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=78.412, player_2/loss=46.830, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 345.83it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=140.319, player_2/loss=74.617, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 348.21it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=212.333, player_2/loss=109.694, rew=15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 346.46it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=186.223, player_2/loss=86.197, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.81it/s, env_step=2048, len=10, n/ep=7, n/st=64, player_1/loss=145.712, player_2/loss=76.419, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 345.62it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=113.277, player_2/loss=90.468, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 345.06it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=152.678, player_2/loss=129.314, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 349.59it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=154.270, player_2/loss=205.834, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 348.51it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=107.934, player_2/loss=273.059, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 349.90it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=63.020, player_2/loss=277.319, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 344.24it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=33.058, player_2/loss=250.362, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 346.63it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=23.120, player_2/loss=301.359, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 346.74it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=47.457, player_2/loss=290.314, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 345.48it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=50.306, player_2/loss=293.695, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 344.19it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=11.367, player_2/loss=293.145, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 349.43it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=9.729, player_2/loss=274.531, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 346.32it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=19.162, player_2/loss=290.543, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 348.25it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=26.139, player_2/loss=307.373, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 347.28it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=37.662, player_2/loss=269.288, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 342.52it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=34.860, player_2/loss=246.382, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 343.12it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=20.979, player_2/loss=284.388, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 345.57it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=17.291, player_2/loss=311.028, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 347.84it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=44.985, player_2/loss=239.790, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 344.40it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=129.292, player_2/loss=153.531, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 348.47it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=216.045, player_2/loss=89.734, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 345.34it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=228.360, player_2/loss=46.872, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 347.67it/s, env_step=5120, len=10, n/ep=5, n/st=64, player_1/loss=190.037, player_2/loss=42.390, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 342.01it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=199.638, player_2/loss=47.669, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 348.21it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=178.560, player_2/loss=41.026, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 346.20it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=182.066, player_2/loss=33.833, rew=16.67]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 348.60it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=200.921, player_2/loss=24.938, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 342.83it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=188.708, player_2/loss=24.790, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 345.80it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=215.949, player_2/loss=35.073, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 346.68it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=265.283, player_2/loss=34.825, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 347.79it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=314.775, player_2/loss=12.886, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 345.15it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=263.785, player_2/loss=50.329, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 348.06it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=188.147, player_2/loss=44.028, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 347.92it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=202.490, player_2/loss=6.576, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 345.37it/s, env_step=17408, len=10, n/ep=5, n/st=64, player_1/loss=197.543, player_2/loss=5.053, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 345.37it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=209.125, player_2/loss=3.145, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 347.49it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=218.499, player_2/loss=6.086, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 346.27it/s, env_step=1024, len=10, n/ep=5, n/st=64, player_1/loss=136.873, player_2/loss=5.218, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.85it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=89.155, player_2/loss=11.449, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.55it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=101.736, player_2/loss=51.771, rew=-16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 344.69it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=98.383, player_2/loss=56.235, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 347.88it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=72.447, player_2/loss=82.809, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 345.97it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=118.270, player_2/loss=199.210, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 342.92it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=76.245, player_2/loss=281.608, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:03, 336.17it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=40.783, player_2/loss=265.251, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 346.36it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=44.014, player_2/loss=227.973, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 346.00it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=64.885, player_2/loss=243.260, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 345.12it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=66.834, player_2/loss=260.121, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 342.28it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=65.875, player_2/loss=246.549, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 345.80it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=122.338, player_2/loss=194.900, rew=-12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 344.71it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=115.524, player_2/loss=331.471, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 343.61it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=43.021, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 344.58it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=29.738, player_2/loss=569.856, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:03, 341.07it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=45.726, player_2/loss=351.067, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 345.78it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=44.704, player_2/loss=318.612, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 343.31it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=36.329, player_2/loss=353.717, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 348.61it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=57.066, player_2/loss=378.350, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.75it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=60.735, player_2/loss=283.439, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 345.88it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=84.945, player_2/loss=223.474, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 348.25it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=98.448, player_2/loss=221.841, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 347.42it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=72.433, player_2/loss=202.849, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 342.34it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=109.314, player_2/loss=177.001, rew=-5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 346.11it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=206.693, player_2/loss=128.079, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 347.22it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=298.440, player_2/loss=71.752, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 345.01it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=297.860, player_2/loss=25.772, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 344.84it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_2/loss=27.615, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 341.94it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=253.293, player_2/loss=27.934, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 346.57it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=266.319, player_2/loss=14.659, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 346.00it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=277.377, player_2/loss=20.848, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 343.78it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=276.805, player_2/loss=33.017, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 343.60it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=296.023, player_2/loss=28.034, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 344.93it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=308.588, player_2/loss=10.752, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 345.44it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=302.779, player_2/loss=35.589, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 344.75it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=279.708, player_2/loss=84.531, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 343.63it/s, env_step=19456, len=9, n/ep=6, n/st=64, player_1/loss=225.386, player_2/loss=103.200, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 346.32it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=194.622, player_2/loss=30.954, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.83it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=200.169, player_2/loss=82.765, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 347.40it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=171.965, player_2/loss=221.442, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.63it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=128.445, player_2/loss=379.398, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 342.60it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=115.951, player_2/loss=391.591, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 346.58it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=85.064, player_2/loss=386.768, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 346.75it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=51.969, player_2/loss=434.096, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 343.71it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=49.107, player_2/loss=400.153, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 347.97it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=52.581, player_2/loss=413.465, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 344.39it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=53.946, player_2/loss=473.614, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 345.33it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=57.561, player_2/loss=450.065, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 345.55it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=71.704, player_2/loss=492.856, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 345.20it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=54.662, player_2/loss=413.303, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 343.12it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=31.682, player_2/loss=431.335, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 344.55it/s, env_step=15360, len=7, n/ep=10, n/st=64, player_1/loss=17.779, player_2/loss=405.293, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 348.10it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_2/loss=432.921, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 337.58it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=30.450, player_2/loss=441.055, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 347.23it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=63.556, player_2/loss=419.033, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 344.83it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=84.628, player_2/loss=435.031, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 348.29it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=15.751, player_2/loss=316.104, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.11it/s, env_step=2048, len=7, n/ep=7, n/st=64, player_1/loss=30.471, player_2/loss=253.946, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 345.61it/s, env_step=3072, len=31, n/ep=2, n/st=64, player_1/loss=49.262, player_2/loss=148.651, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 350.83it/s, env_step=4096, len=28, n/ep=2, n/st=64, player_1/loss=59.296, player_2/loss=94.998, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 345.27it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=60.144, player_2/loss=89.609, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 348.18it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=78.127, player_2/loss=113.339, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 348.14it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=99.954, player_2/loss=149.855, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 347.81it/s, env_step=8192, len=30, n/ep=2, n/st=64, player_1/loss=88.275, player_2/loss=144.438, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 343.66it/s, env_step=9216, len=24, n/ep=2, n/st=64, player_1/loss=90.295, player_2/loss=109.656, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 348.59it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=89.241, player_2/loss=113.726, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 348.25it/s, env_step=11264, len=42, n/ep=1, n/st=64, player_1/loss=79.488, player_2/loss=118.880, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 349.11it/s, env_step=12288, len=32, n/ep=2, n/st=64, player_1/loss=68.539, player_2/loss=93.387, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 347.52it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=86.920, player_2/loss=83.512, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 344.70it/s, env_step=14336, len=20, n/ep=4, n/st=64, player_1/loss=113.527, player_2/loss=60.417, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 349.67it/s, env_step=15360, len=24, n/ep=3, n/st=64, player_1/loss=61.379, player_2/loss=82.691, rew=-8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 347.45it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=66.147, player_2/loss=111.254, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 349.64it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=128.648, player_2/loss=128.440, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 342.84it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=131.920, player_2/loss=181.290, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 347.59it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=123.946, player_2/loss=120.470, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 349.21it/s, env_step=1024, len=21, n/ep=2, n/st=64, player_1/loss=81.879, player_2/loss=124.914, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.03it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=57.688, player_2/loss=127.420, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.89it/s, env_step=3072, len=26, n/ep=2, n/st=64, player_1/loss=83.827, player_2/loss=127.213, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 346.42it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=95.385, player_2/loss=118.643, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 344.96it/s, env_step=5120, len=29, n/ep=2, n/st=64, player_1/loss=100.147, player_2/loss=127.506, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 346.97it/s, env_step=6144, len=25, n/ep=3, n/st=64, player_1/loss=119.734, player_2/loss=97.112, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 347.99it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=85.184, player_2/loss=80.067, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 344.48it/s, env_step=8192, len=25, n/ep=2, n/st=64, player_1/loss=96.498, player_2/loss=76.036, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 346.35it/s, env_step=9216, len=35, n/ep=2, n/st=64, player_1/loss=68.182, player_2/loss=82.860, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 347.91it/s, env_step=10240, len=29, n/ep=2, n/st=64, player_1/loss=42.873, player_2/loss=82.619, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 351.16it/s, env_step=11264, len=19, n/ep=4, n/st=64, player_1/loss=60.267, player_2/loss=88.963, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 340.02it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=74.128, player_2/loss=81.058, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 346.59it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=109.107, player_2/loss=116.622, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 345.13it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=77.110, player_2/loss=152.335, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 347.65it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=25.988, player_2/loss=142.882, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 343.36it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=12.452, player_2/loss=131.353, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 345.00it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=10.748, player_2/loss=139.593, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 345.15it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=42.829, player_2/loss=146.755, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 346.52it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=49.022, player_2/loss=116.633, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 341.40it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=49.695, player_2/loss=143.668, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.87it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=57.050, player_2/loss=125.621, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 346.14it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=60.963, player_2/loss=118.884, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 346.76it/s, env_step=4096, len=13, n/ep=6, n/st=64, player_1/loss=26.990, player_2/loss=106.580, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 345.95it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=104.129, player_2/loss=131.849, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 345.60it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=171.483, player_2/loss=136.130, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 345.67it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=128.185, player_2/loss=124.879, rew=-15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 344.67it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=106.097, player_2/loss=148.173, rew=10.71]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 346.68it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=139.435, player_2/loss=222.603, rew=-18.75]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 342.03it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=186.169, player_2/loss=237.624, rew=-19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 346.61it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=256.685, player_2/loss=138.033, rew=16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 346.23it/s, env_step=12288, len=10, n/ep=7, n/st=64, player_2/loss=113.270, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 348.08it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=233.517, player_2/loss=115.235, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 347.77it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=217.120, player_2/loss=130.229, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 343.17it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=236.659, player_2/loss=148.815, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 345.84it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=254.451, player_2/loss=165.136, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 347.14it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=330.000, player_2/loss=143.442, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 343.49it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=410.248, player_2/loss=63.725, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 341.93it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=345.443, player_2/loss=29.024, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 346.37it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=176.706, player_2/loss=125.642, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.33it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=169.114, player_2/loss=150.284, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 344.20it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=205.397, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 346.87it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=180.331, player_2/loss=341.775, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 343.10it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=162.158, player_2/loss=442.328, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 345.07it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=134.278, player_2/loss=483.044, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 344.05it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=126.881, player_2/loss=588.131, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 345.84it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_2/loss=485.394, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 343.34it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=82.964, player_2/loss=296.569, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 344.01it/s, env_step=10240, len=14, n/ep=6, n/st=64, player_1/loss=88.071, player_2/loss=356.860, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 343.76it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=93.683, player_2/loss=399.226, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 346.21it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=57.983, player_2/loss=427.664, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 346.72it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=13.695, player_2/loss=492.317, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 339.49it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=65.568, player_2/loss=626.530, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 344.09it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=70.471, player_2/loss=602.934, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 345.50it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=48.560, player_2/loss=631.258, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 344.85it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=52.160, player_2/loss=515.033, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 345.50it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=87.090, player_2/loss=580.433, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 346.74it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=85.749, player_2/loss=711.837, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 345.71it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=45.266, player_2/loss=503.089, rew=-5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 348.12it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=155.860, player_2/loss=304.552, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 341.38it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=242.181, player_2/loss=83.579, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 348.20it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=210.925, player_2/loss=50.175, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 347.72it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=191.914, player_2/loss=40.932, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 349.81it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=198.924, player_2/loss=101.706, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 346.33it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=181.750, player_2/loss=88.665, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 343.30it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=261.064, player_2/loss=30.122, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 347.14it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=260.662, player_2/loss=45.529, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 345.25it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_2/loss=41.879, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 346.31it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=273.697, player_2/loss=40.331, rew=15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 343.45it/s, env_step=12288, len=12, n/ep=6, n/st=64, player_1/loss=251.024, player_2/loss=47.518, rew=16.67]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 346.97it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=221.512, player_2/loss=54.500, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 345.98it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=219.625, player_2/loss=77.124, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 348.41it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=287.560, player_2/loss=46.850, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 344.00it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=251.452, player_2/loss=43.270, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 347.03it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=271.008, player_2/loss=43.331, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 347.94it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=253.367, player_2/loss=42.734, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 346.52it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=235.994, player_2/loss=54.017, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 342.41it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=186.414, player_2/loss=75.441, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.38it/s, env_step=2048, len=20, n/ep=4, n/st=64, player_1/loss=171.467, player_2/loss=93.137, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.00it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=99.367, player_2/loss=109.069, rew=12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 350.09it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=54.024, player_2/loss=163.916, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 347.17it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=50.673, player_2/loss=232.822, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 345.39it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=79.167, player_2/loss=258.486, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 346.78it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_2/loss=307.527, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 346.24it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=83.005, player_2/loss=245.353, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 347.86it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=73.806, player_2/loss=151.432, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 347.34it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=81.278, player_2/loss=147.129, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 343.74it/s, env_step=11264, len=19, n/ep=4, n/st=64, player_1/loss=78.236, player_2/loss=130.925, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 348.00it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=78.352, player_2/loss=155.841, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 347.38it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=56.358, player_2/loss=177.279, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 346.98it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=45.981, player_2/loss=158.205, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 347.63it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=71.234, player_2/loss=177.476, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 345.98it/s, env_step=16384, len=20, n/ep=4, n/st=64, player_1/loss=80.194, player_2/loss=166.864, rew=12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 346.50it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=107.823, player_2/loss=195.343, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 345.68it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=125.935, player_2/loss=279.227, rew=10.71]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 340.06it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=88.246, player_2/loss=371.003, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 347.45it/s, env_step=1024, len=27, n/ep=2, n/st=64, player_1/loss=50.336, player_2/loss=268.387, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 344.99it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=79.054, player_2/loss=183.734, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 344.66it/s, env_step=3072, len=25, n/ep=2, n/st=64, player_1/loss=91.390, player_2/loss=96.884, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 348.52it/s, env_step=4096, len=22, n/ep=4, n/st=64, player_1/loss=108.242, player_2/loss=83.281, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 343.12it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=120.064, player_2/loss=76.196, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 346.06it/s, env_step=6144, len=31, n/ep=2, n/st=64, player_1/loss=99.049, player_2/loss=100.605, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 348.07it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=90.787, player_2/loss=106.727, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 348.86it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=161.407, player_2/loss=51.261, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 344.36it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=180.436, player_2/loss=59.883, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 345.02it/s, env_step=10240, len=12, n/ep=3, n/st=64, player_1/loss=272.458, player_2/loss=77.009, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 347.40it/s, env_step=11264, len=17, n/ep=3, n/st=64, player_1/loss=233.632, player_2/loss=67.085, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 348.99it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=168.608, player_2/loss=99.596, rew=-8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 344.00it/s, env_step=13312, len=35, n/ep=2, n/st=64, player_1/loss=107.145, player_2/loss=67.970, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 346.56it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=61.326, player_2/loss=38.563, rew=3.57]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 346.70it/s, env_step=15360, len=29, n/ep=2, n/st=64, player_1/loss=100.919, player_2/loss=37.355, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 345.98it/s, env_step=16384, len=28, n/ep=3, n/st=64, player_1/loss=113.690, player_2/loss=30.275, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 345.68it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=138.711, player_2/loss=56.084, rew=8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 347.36it/s, env_step=18432, len=25, n/ep=3, n/st=64, player_1/loss=137.923, player_2/loss=75.890, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 346.66it/s, env_step=19456, len=21, n/ep=4, n/st=64, player_1/loss=156.685, player_2/loss=86.927, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 348.06it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=34.172, player_2/loss=138.993, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.83it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=51.088, player_2/loss=95.894, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 348.60it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=111.477, player_2/loss=116.416, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 347.66it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=103.084, player_2/loss=138.423, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 343.91it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=30.505, player_2/loss=98.868, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 347.22it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=38.076, player_2/loss=115.107, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 342.34it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=22.676, player_2/loss=121.944, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 344.19it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=7.834, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 345.39it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=26.741, player_2/loss=129.842, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 346.23it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=30.143, player_2/loss=128.541, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 348.13it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=23.613, player_2/loss=157.109, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 343.45it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=50.594, player_2/loss=154.894, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 347.43it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=40.096, player_2/loss=145.742, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 339.98it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=22.516, player_2/loss=147.812, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 345.58it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=21.730, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 346.35it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=12.794, player_2/loss=149.129, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 348.12it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=44.360, player_2/loss=136.093, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 341.93it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=67.909, player_2/loss=135.524, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 345.19it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=57.081, player_2/loss=138.868, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 346.45it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=59.057, player_2/loss=100.440, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.11it/s, env_step=2048, len=27, n/ep=2, n/st=64, player_1/loss=77.413, player_2/loss=107.166, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.40it/s, env_step=3072, len=24, n/ep=2, n/st=64, player_1/loss=89.137, player_2/loss=82.124, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 349.05it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=136.703, player_2/loss=74.707, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 346.86it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=173.294, player_2/loss=98.671, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 347.27it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=95.299, player_2/loss=140.751, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 344.81it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=140.460, player_2/loss=149.280, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 344.26it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=246.960, player_2/loss=106.635, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 343.12it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=294.579, player_2/loss=92.445, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 348.15it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=378.331, player_2/loss=82.103, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 346.25it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=450.791, player_2/loss=57.669, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 346.37it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=460.027, player_2/loss=33.856, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 343.63it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=390.212, player_2/loss=71.984, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 345.49it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=299.083, player_2/loss=95.211, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 345.56it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=328.361, player_2/loss=39.976, rew=16.67]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 345.56it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=328.314, player_2/loss=21.524, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 346.34it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=399.771, player_2/loss=18.193, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 343.92it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=393.358, player_2/loss=20.682, rew=10.71]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 345.98it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=374.345, player_2/loss=25.306, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 343.52it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=313.313, player_2/loss=39.751, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 347.50it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=198.463, player_2/loss=123.627, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 346.91it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=72.377, player_2/loss=219.075, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 341.09it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=47.792, player_2/loss=291.404, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 345.08it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=28.413, player_2/loss=301.929, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 346.13it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=49.080, player_2/loss=353.251, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 344.63it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=37.314, player_2/loss=311.465, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 345.11it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=11.378, player_2/loss=286.909, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 342.35it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=9.680, player_2/loss=258.083, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 347.56it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=24.548, player_2/loss=206.304, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 346.75it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=19.840, player_2/loss=230.697, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 347.50it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=3.990, player_2/loss=287.641, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 343.96it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=20.117, player_2/loss=304.386, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 346.32it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=31.754, player_2/loss=256.782, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 344.97it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=21.990, player_2/loss=237.306, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 347.17it/s, env_step=16384, len=17, n/ep=5, n/st=64, player_1/loss=33.714, player_2/loss=250.031, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 345.32it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=53.991, player_2/loss=284.561, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 341.80it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=27.920, player_2/loss=295.248, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 349.05it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=10.697, player_2/loss=280.277, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 349.72it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=89.459, player_2/loss=349.593, rew=-5.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.96it/s, env_step=2048, len=10, n/ep=7, n/st=64, player_1/loss=73.406, player_2/loss=356.633, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.74it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=62.716, player_2/loss=278.144, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 340.22it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=149.697, player_2/loss=141.918, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 344.52it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=145.324, player_2/loss=101.213, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 345.22it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=98.211, player_2/loss=141.818, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 343.87it/s, env_step=7168, len=21, n/ep=4, n/st=64, player_1/loss=96.790, player_2/loss=120.037, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 347.02it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=73.060, player_2/loss=91.011, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 342.30it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=190.977, player_2/loss=70.830, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 344.58it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=191.472, player_2/loss=70.458, rew=-12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 346.20it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=134.315, player_2/loss=101.621, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 346.22it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=86.988, player_2/loss=131.750, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 345.74it/s, env_step=13312, len=26, n/ep=3, n/st=64, player_1/loss=97.639, player_2/loss=132.023, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 345.34it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=75.321, player_2/loss=95.610, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 347.19it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=93.260, player_2/loss=107.072, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 347.60it/s, env_step=16384, len=24, n/ep=2, n/st=64, player_1/loss=100.269, player_2/loss=60.572, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 345.24it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=95.830, player_2/loss=56.990, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:03, 340.89it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=190.529, player_2/loss=149.615, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 345.76it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=339.945, player_2/loss=219.285, rew=6.25]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 346.03it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=161.961, player_2/loss=392.552, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 343.35it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=126.002, player_2/loss=404.085, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 343.10it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=76.447, player_2/loss=411.263, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 340.29it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=148.329, player_2/loss=368.570, rew=-10.71]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 346.72it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=111.422, player_2/loss=416.551, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 342.63it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=99.486, player_2/loss=410.034, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 344.23it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=130.043, player_2/loss=390.911, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 338.52it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=49.658, player_2/loss=392.865, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 343.62it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=69.666, player_2/loss=379.357, rew=13.89]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 344.64it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=56.882, player_2/loss=333.322, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 344.56it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=23.744, player_2/loss=335.252, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 345.02it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=86.951, player_2/loss=372.480, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 342.80it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=127.047, player_2/loss=375.463, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 345.29it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=98.824, player_2/loss=318.865, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 343.75it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=83.841, player_2/loss=330.952, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 342.73it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=95.279, player_2/loss=393.216, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 346.20it/s, env_step=17408, len=7, n/ep=10, n/st=64, player_1/loss=54.091, player_2/loss=375.096, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 339.29it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=47.307, player_2/loss=361.576, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 343.39it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=25.391, player_2/loss=387.104, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 351.77it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=94.030, player_2/loss=376.034, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.14it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=63.907, player_2/loss=336.169, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.13it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=74.125, player_2/loss=175.578, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 344.44it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=164.128, player_2/loss=71.790, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 346.27it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=190.049, player_2/loss=41.626, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 346.58it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=174.917, player_2/loss=45.319, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 341.05it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=217.475, player_2/loss=30.394, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 345.80it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=222.567, player_2/loss=13.735, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 346.75it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=183.250, player_2/loss=34.791, rew=15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 344.34it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=197.228, player_2/loss=48.284, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 343.51it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=209.924, player_2/loss=24.167, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 342.22it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=256.231, player_2/loss=9.025, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 344.05it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=314.084, player_2/loss=37.402, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 344.07it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=285.912, player_2/loss=35.002, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 346.56it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=212.917, player_2/loss=9.676, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 345.95it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=223.626, player_2/loss=9.951, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 344.92it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=225.343, player_2/loss=12.604, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 344.28it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=229.216, player_2/loss=12.960, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 345.82it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=207.955, player_2/loss=3.259, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 346.12it/s, env_step=1024, len=13, n/ep=4, n/st=64, player_1/loss=101.873, player_2/loss=62.468, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.67it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=154.385, player_2/loss=126.442, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 336.64it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=150.191, player_2/loss=227.437, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 333.26it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=115.580, player_2/loss=340.508, rew=16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 342.51it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=126.753, player_2/loss=409.940, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 338.87it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=95.357, player_2/loss=427.400, rew=5.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 337.70it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=116.199, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 343.26it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=84.887, player_2/loss=297.470, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 344.24it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=41.885, player_2/loss=318.866, rew=16.67]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 343.87it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_2/loss=305.011, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 339.99it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=119.747, player_2/loss=292.139, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 345.89it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=133.495, player_2/loss=298.089, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 342.70it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=129.167, player_2/loss=298.202, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 343.47it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=62.117, player_2/loss=317.688, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 343.96it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_2/loss=340.188, rew=5.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 340.79it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=99.725, player_2/loss=394.706, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 344.48it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=96.282, player_2/loss=333.480, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 341.74it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=62.890, player_2/loss=362.114, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 342.94it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=49.349, player_2/loss=321.627, rew=-5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 345.68it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=49.697, player_2/loss=235.788, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.47it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=57.666, player_2/loss=242.252, rew=5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.43it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=55.614, player_2/loss=212.591, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 345.07it/s, env_step=4096, len=16, n/ep=3, n/st=64, player_1/loss=83.057, player_2/loss=164.478, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 341.38it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=113.336, player_2/loss=106.467, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 346.40it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=177.711, player_2/loss=96.471, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 345.00it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=216.158, player_2/loss=96.345, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 346.09it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=161.307, player_2/loss=91.656, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 346.70it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=177.307, player_2/loss=88.929, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 342.53it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=223.372, player_2/loss=53.955, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 343.81it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=197.118, player_2/loss=65.403, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 344.90it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_2/loss=32.308, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 344.60it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=217.322, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 346.13it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=223.303, player_2/loss=13.416, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 341.33it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=229.094, player_2/loss=40.881, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 343.94it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=239.765, player_2/loss=69.381, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 344.06it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=248.477, player_2/loss=43.431, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 345.18it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=292.087, player_2/loss=9.498, rew=3.57]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 345.80it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=220.259, player_2/loss=56.437, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 342.14it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=266.419, player_2/loss=657.923, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.89it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=235.521, player_2/loss=591.047, rew=19.44]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 345.26it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=147.614, player_2/loss=656.650, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 344.69it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=119.685, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 344.46it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=133.934, player_2/loss=709.674, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 344.27it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=70.111, player_2/loss=654.393, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 345.53it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=66.749, player_2/loss=564.698, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 343.95it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=103.002, player_2/loss=508.950, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 342.72it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=117.624, player_2/loss=508.151, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 340.05it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=118.478, player_2/loss=431.129, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 345.71it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=94.510, player_2/loss=483.237, rew=13.89]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 340.77it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=62.010, player_2/loss=545.060, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 345.87it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=73.751, player_2/loss=519.666, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 346.42it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=48.131, player_2/loss=453.758, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 341.46it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=17.425, player_2/loss=537.986, rew=19.44]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 343.24it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=39.259, player_2/loss=714.775, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 344.70it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=54.797, player_2/loss=665.058, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 345.07it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=78.429, player_2/loss=667.570, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 340.83it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=71.531, player_2/loss=585.245, rew=15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 341.33it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=70.721, player_2/loss=342.745, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.11it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=96.141, player_2/loss=199.690, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 339.83it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=121.730, player_2/loss=74.143, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 341.37it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=109.402, player_2/loss=19.287, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 351.51it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=148.139, player_2/loss=9.514, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 337.72it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=119.730, player_2/loss=14.671, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 344.89it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=83.883, player_2/loss=9.798, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 347.08it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=91.162, player_2/loss=24.831, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 347.14it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=131.760, player_2/loss=30.716, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 343.04it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=192.392, player_2/loss=45.471, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 347.40it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=177.226, player_2/loss=32.672, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 345.31it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=200.366, player_2/loss=69.721, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 346.45it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=194.766, player_2/loss=62.091, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 343.28it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=155.379, player_2/loss=30.135, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 347.24it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=124.736, player_2/loss=38.674, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 348.27it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=126.750, player_2/loss=30.311, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 347.05it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=154.664, player_2/loss=17.701, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 342.43it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=154.772, player_2/loss=19.655, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 346.98it/s, env_step=19456, len=13, n/ep=4, n/st=64, player_1/loss=161.869, player_2/loss=30.005, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 346.20it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=150.830, player_2/loss=18.956, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.25it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=104.399, player_2/loss=16.794, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 346.58it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=64.814, player_2/loss=9.760, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 341.60it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=53.140, player_2/loss=9.612, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 346.08it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=42.431, player_2/loss=13.029, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 344.03it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=50.892, player_2/loss=12.783, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 345.46it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=62.852, player_2/loss=10.644, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:03, 341.29it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=45.342, player_2/loss=10.897, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 343.44it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=26.200, player_2/loss=7.018, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 345.87it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=19.174, player_2/loss=7.552, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 343.59it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=127.910, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 343.27it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=126.493, player_2/loss=360.019, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 341.90it/s, env_step=13312, len=10, n/ep=5, n/st=64, player_1/loss=63.638, player_2/loss=459.737, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 342.58it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=46.306, player_2/loss=405.229, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 347.57it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=55.873, player_2/loss=382.380, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 343.55it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=60.209, player_2/loss=378.644, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 344.00it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=30.906, player_2/loss=371.359, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:03, 340.29it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=11.548, player_2/loss=353.888, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 344.70it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=12.863, player_2/loss=352.970, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 346.05it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=14.954, player_2/loss=366.610, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.77it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=42.736, player_2/loss=304.513, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.82it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=72.580, player_2/loss=182.121, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 344.37it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=93.647, player_2/loss=96.643, rew=-5.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 344.67it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=159.157, player_2/loss=114.655, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 346.12it/s, env_step=6144, len=16, n/ep=5, n/st=64, player_1/loss=168.333, player_2/loss=111.531, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 350.79it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=134.875, player_2/loss=64.142, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 341.05it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=155.044, player_2/loss=38.574, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 348.19it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=197.734, player_2/loss=48.493, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 347.18it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=191.686, player_2/loss=50.885, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 347.57it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=99.101, player_2/loss=39.964, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 346.43it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=36.239, player_2/loss=31.178, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 345.46it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=67.478, player_2/loss=22.960, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 346.64it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=149.695, player_2/loss=9.110, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 345.74it/s, env_step=15360, len=16, n/ep=3, n/st=64, player_1/loss=166.891, player_2/loss=23.819, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 344.46it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=155.768, player_2/loss=26.694, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 343.28it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=150.232, player_2/loss=41.211, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 345.45it/s, env_step=18432, len=16, n/ep=3, n/st=64, player_1/loss=166.655, player_2/loss=61.878, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 345.46it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=177.552, player_2/loss=49.603, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 343.17it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=152.945, player_2/loss=188.631, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 345.09it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=132.575, player_2/loss=246.383, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 341.29it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=132.540, player_2/loss=316.353, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 343.44it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=138.769, player_2/loss=292.805, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 345.67it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=132.208, player_2/loss=289.809, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 344.75it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=133.786, player_2/loss=304.823, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 342.45it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=106.432, player_2/loss=369.853, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 345.88it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=90.643, player_2/loss=283.838, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 344.91it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=67.942, player_2/loss=192.340, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 342.21it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=85.355, player_2/loss=144.185, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 345.99it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=88.172, player_2/loss=49.461, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 343.33it/s, env_step=12288, len=28, n/ep=2, n/st=64, player_1/loss=92.728, player_2/loss=89.189, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 344.82it/s, env_step=13312, len=26, n/ep=2, n/st=64, player_1/loss=99.475, player_2/loss=123.978, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 346.28it/s, env_step=14336, len=20, n/ep=4, n/st=64, player_1/loss=100.388, player_2/loss=147.059, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 345.68it/s, env_step=15360, len=24, n/ep=3, n/st=64, player_1/loss=64.187, player_2/loss=196.836, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 343.47it/s, env_step=16384, len=22, n/ep=2, n/st=64, player_1/loss=80.963, player_2/loss=143.193, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 346.65it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=79.987, player_2/loss=235.196, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 344.43it/s, env_step=18432, len=9, n/ep=6, n/st=64, player_1/loss=89.535, player_2/loss=265.073, rew=-16.67]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 343.85it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=109.540, player_2/loss=330.009, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 345.68it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=187.972, player_2/loss=99.413, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 338.88it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=220.018, player_2/loss=66.226, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 346.66it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=219.868, player_2/loss=79.969, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 344.11it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=254.659, player_2/loss=77.857, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 345.22it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=246.738, player_2/loss=32.976, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 345.97it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=249.164, player_2/loss=20.797, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.90it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=269.163, player_2/loss=43.549, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 343.89it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=248.997, player_2/loss=68.763, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 348.94it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=240.010, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 347.22it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=287.494, player_2/loss=108.662, rew=10.71]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 340.45it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=268.619, player_2/loss=112.944, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 344.24it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=260.804, player_2/loss=118.065, rew=17.86]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 346.19it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=277.756, player_2/loss=44.589, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 345.13it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=274.890, player_2/loss=32.277, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 345.61it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=290.197, player_2/loss=8.697, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 335.22it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=315.903, player_2/loss=82.966, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 340.33it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=323.798, player_2/loss=101.306, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 345.25it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=299.051, player_2/loss=27.088, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 343.19it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=240.712, player_2/loss=33.822, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 339.95it/s, env_step=1024, len=8, n/ep=9, n/st=64, player_1/loss=217.312, player_2/loss=262.577, rew=13.89]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.82it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=197.002, player_2/loss=446.353, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.67it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=130.455, player_2/loss=601.742, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 342.97it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=86.605, player_2/loss=598.776, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 344.01it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=72.301, player_2/loss=526.975, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 342.12it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=53.711, player_2/loss=592.102, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 344.08it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=60.523, player_2/loss=664.055, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 345.87it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=67.068, player_2/loss=534.142, rew=19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 343.56it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=119.635, player_2/loss=486.635, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 342.67it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=122.544, player_2/loss=539.368, rew=13.89]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 338.66it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=61.652, player_2/loss=572.528, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 345.34it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=54.147, player_2/loss=553.242, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 344.02it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=139.690, player_2/loss=552.693, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 343.86it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=153.619, player_2/loss=577.610, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 339.36it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=62.660, player_2/loss=544.445, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 345.11it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=65.315, player_2/loss=543.755, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 344.24it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=66.893, player_2/loss=537.326, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 344.33it/s, env_step=18432, len=7, n/ep=7, n/st=64, player_1/loss=76.908, player_2/loss=490.815, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 342.31it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=117.006, player_2/loss=476.433, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 342.44it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=96.893, player_2/loss=296.368, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.47it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=78.417, player_2/loss=222.707, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 344.55it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=70.134, player_2/loss=167.523, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 346.29it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=105.275, player_2/loss=146.580, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 340.71it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=104.326, player_2/loss=100.952, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 345.88it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=108.471, player_2/loss=111.156, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 347.50it/s, env_step=7168, len=22, n/ep=2, n/st=64, player_1/loss=100.146, player_2/loss=123.990, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 345.00it/s, env_step=8192, len=26, n/ep=3, n/st=64, player_1/loss=62.930, player_2/loss=85.081, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 339.15it/s, env_step=9216, len=18, n/ep=2, n/st=64, player_1/loss=90.965, player_2/loss=65.249, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 353.61it/s, env_step=10240, len=29, n/ep=3, n/st=64, player_1/loss=179.533, player_2/loss=67.959, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 345.62it/s, env_step=11264, len=33, n/ep=2, n/st=64, player_1/loss=155.092, player_2/loss=34.943, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 347.22it/s, env_step=12288, len=29, n/ep=2, n/st=64, player_1/loss=97.697, player_2/loss=35.490, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 342.49it/s, env_step=13312, len=34, n/ep=2, n/st=64, player_1/loss=204.214, player_2/loss=58.154, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 347.21it/s, env_step=14336, len=30, n/ep=2, n/st=64, player_1/loss=136.148, player_2/loss=42.835, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 344.04it/s, env_step=15360, len=31, n/ep=2, n/st=64, player_1/loss=96.389, player_2/loss=30.025, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 346.03it/s, env_step=16384, len=32, n/ep=2, n/st=64, player_1/loss=111.474, player_2/loss=64.389, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 346.37it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=72.892, player_2/loss=113.029, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 342.53it/s, env_step=18432, len=28, n/ep=2, n/st=64, player_1/loss=77.963, player_2/loss=87.544, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 348.25it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=170.001, player_2/loss=36.487, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 347.38it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=66.614, player_2/loss=52.677, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 346.07it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=72.279, player_2/loss=51.297, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 341.03it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=84.079, player_2/loss=66.293, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 345.17it/s, env_step=4096, len=21, n/ep=2, n/st=64, player_1/loss=94.951, player_2/loss=59.043, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 346.80it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=71.053, player_2/loss=48.012, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 345.17it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=83.118, player_2/loss=47.486, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 345.75it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=54.325, player_2/loss=101.663, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 343.08it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=87.362, player_2/loss=135.989, rew=-17.86]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 344.24it/s, env_step=9216, len=20, n/ep=4, n/st=64, player_1/loss=153.243, player_2/loss=113.457, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 344.42it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=124.829, player_2/loss=121.712, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 344.40it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=92.363, player_2/loss=117.794, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 346.00it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=82.135, player_2/loss=96.888, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 341.17it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=71.977, player_2/loss=113.617, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 348.45it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=43.041, player_2/loss=126.412, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 345.63it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=25.954, player_2/loss=119.622, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 345.35it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=26.252, player_2/loss=132.489, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 341.19it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=19.614, player_2/loss=152.804, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 347.22it/s, env_step=18432, len=15, n/ep=5, n/st=64, player_1/loss=34.860, player_2/loss=138.339, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 346.54it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=45.623, player_2/loss=137.892, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 345.29it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=45.285, player_2/loss=162.134, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.33it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=49.173, player_2/loss=140.788, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.68it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=39.773, player_2/loss=83.624, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 346.84it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=25.797, player_2/loss=63.436, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 346.54it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=47.986, player_2/loss=58.886, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 346.56it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=65.477, player_2/loss=74.661, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 339.80it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=26.897, player_2/loss=65.814, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 346.75it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=29.724, player_2/loss=36.713, rew=-5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 346.12it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=107.587, player_2/loss=92.183, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:02, 345.35it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=169.201, player_2/loss=112.661, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 345.09it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=166.454, player_2/loss=86.981, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 343.50it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=113.248, player_2/loss=81.194, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 346.77it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=42.069, player_2/loss=73.529, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 346.60it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=24.753, player_2/loss=56.758, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 346.44it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=50.382, player_2/loss=59.401, rew=5.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:03, 341.35it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=106.563, player_2/loss=99.823, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 343.95it/s, env_step=17408, len=10, n/ep=7, n/st=64, player_1/loss=259.404, player_2/loss=206.197, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 345.66it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=355.080, player_2/loss=226.238, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:03, 341.21it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=275.361, player_2/loss=165.021, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 344.56it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=74.462, player_2/loss=277.913, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.13it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=30.052, player_2/loss=343.760, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 345.42it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=25.511, player_2/loss=367.502, rew=17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 344.43it/s, env_step=4096, len=9, n/ep=6, n/st=64, player_1/loss=25.460, player_2/loss=366.863, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 343.14it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=13.877, player_2/loss=328.685, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 345.12it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=13.205, player_2/loss=335.425, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 340.63it/s, env_step=7168, len=11, n/ep=7, n/st=64, player_1/loss=10.783, player_2/loss=379.825, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 343.54it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=7.552, player_2/loss=423.209, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 344.93it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=7.794, player_2/loss=375.851, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 343.04it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=19.418, player_2/loss=359.620, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 342.59it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=15.064, player_2/loss=427.280, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 343.79it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=38.945, player_2/loss=444.600, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 345.07it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=36.351, player_2/loss=426.367, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 344.11it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_2/loss=452.218, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 342.85it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=2.860, player_2/loss=401.827, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 341.70it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=4.667, player_2/loss=432.155, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 345.40it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=7.241, player_2/loss=386.971, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 343.06it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=8.066, player_2/loss=392.546, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 343.25it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=13.779, player_2/loss=437.275, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 341.45it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=13.694, player_2/loss=315.779, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.69it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_2/loss=272.133, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 345.72it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=107.725, player_2/loss=180.888, rew=-15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 344.55it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=225.971, player_2/loss=142.910, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 344.02it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=263.207, player_2/loss=123.301, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 345.87it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=298.187, player_2/loss=96.256, rew=15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 345.49it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=289.906, player_2/loss=54.682, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 344.32it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=237.347, player_2/loss=74.569, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 345.73it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=217.111, player_2/loss=108.843, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 343.10it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=255.520, player_2/loss=111.670, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 345.90it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=205.970, player_2/loss=110.006, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 345.64it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=210.499, player_2/loss=72.989, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 347.09it/s, env_step=13312, len=12, n/ep=4, n/st=64, player_1/loss=312.057, player_2/loss=45.701, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 348.25it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=403.264, player_2/loss=28.426, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 344.12it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=382.750, player_2/loss=38.357, rew=-5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 344.97it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=319.894, player_2/loss=65.234, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 346.21it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=351.143, player_2/loss=57.645, rew=-5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 339.65it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=376.201, player_2/loss=29.218, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 345.73it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=375.128, player_2/loss=25.097, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 343.16it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=158.374, player_2/loss=211.031, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.28it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=89.998, player_2/loss=171.039, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.91it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=19.584, player_2/loss=120.117, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 345.95it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=63.546, player_2/loss=98.244, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 347.68it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=60.160, player_2/loss=121.909, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 345.18it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=10.137, player_2/loss=143.652, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 343.53it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=14.992, player_2/loss=146.140, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 340.34it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=12.656, player_2/loss=143.279, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 345.07it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=17.540, player_2/loss=209.415, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 345.95it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=87.396, rew=-5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 344.90it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=118.309, player_2/loss=152.291, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 344.16it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=66.222, player_2/loss=138.498, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 344.93it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=68.813, player_2/loss=152.824, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 344.63it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=79.060, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 346.28it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=39.539, player_2/loss=141.860, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 347.32it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=40.832, player_2/loss=148.670, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 340.55it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=39.930, player_2/loss=108.368, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 346.44it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=64.597, player_2/loss=110.213, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 343.65it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_1/loss=51.154, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 344.59it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=27.079, player_2/loss=97.251, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.58it/s, env_step=2048, len=29, n/ep=2, n/st=64, player_1/loss=34.721, player_2/loss=93.340, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.29it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=107.852, player_2/loss=91.802, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 346.08it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=145.402, player_2/loss=114.946, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 345.47it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=87.892, player_2/loss=97.180, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 343.59it/s, env_step=6144, len=36, n/ep=1, n/st=64, player_1/loss=102.682, player_2/loss=52.135, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 343.01it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=292.418, player_2/loss=54.207, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 341.95it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=460.585, player_2/loss=56.877, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 345.80it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=397.973, player_2/loss=58.038, rew=10.71]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 344.83it/s, env_step=10240, len=9, n/ep=6, n/st=64, player_1/loss=395.425, player_2/loss=93.196, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 345.66it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=481.788, player_2/loss=101.493, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 338.68it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=446.699, player_2/loss=54.396, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 345.31it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=376.485, player_2/loss=69.483, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 345.83it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=394.090, player_2/loss=64.864, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 344.15it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=455.397, player_2/loss=24.172, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 343.65it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=453.159, player_2/loss=51.507, rew=5.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 341.89it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=400.464, player_2/loss=52.673, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:03, 336.97it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=446.038, player_2/loss=13.465, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 343.33it/s, env_step=19456, len=10, n/ep=7, n/st=64, player_1/loss=430.568, player_2/loss=55.140, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 344.46it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=297.631, player_2/loss=57.975, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.88it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=192.349, player_2/loss=157.286, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 344.70it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=69.630, player_2/loss=234.199, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 345.06it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=60.119, player_2/loss=251.477, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 344.97it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=46.015, player_2/loss=288.626, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 341.85it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=73.218, player_2/loss=240.439, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 343.81it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=63.196, player_2/loss=205.772, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 343.74it/s, env_step=8192, len=13, n/ep=6, n/st=64, player_1/loss=32.967, player_2/loss=279.261, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 343.05it/s, env_step=9216, len=13, n/ep=6, n/st=64, player_1/loss=33.895, player_2/loss=265.035, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 343.37it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=44.008, player_2/loss=347.234, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 341.21it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=35.408, player_2/loss=333.846, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 344.63it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=30.258, player_2/loss=337.611, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 343.52it/s, env_step=13312, len=13, n/ep=6, n/st=64, player_1/loss=38.631, player_2/loss=269.580, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 342.34it/s, env_step=14336, len=11, n/ep=4, n/st=64, player_1/loss=37.158, player_2/loss=239.584, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 338.71it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=51.768, player_2/loss=222.903, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 344.33it/s, env_step=16384, len=11, n/ep=4, n/st=64, player_1/loss=75.286, player_2/loss=228.369, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 344.22it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_2/loss=249.910, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 344.64it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=30.199, player_2/loss=288.955, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 341.58it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=27.410, player_2/loss=280.490, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 346.65it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=32.244, player_2/loss=193.965, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.30it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=44.029, player_2/loss=153.578, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 346.64it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=87.158, player_2/loss=115.246, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.78it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=123.136, player_2/loss=148.834, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 343.51it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=145.725, player_2/loss=137.075, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 344.19it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=132.077, player_2/loss=103.188, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 345.46it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=127.637, player_2/loss=84.412, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 346.61it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=170.147, player_2/loss=72.882, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 343.96it/s, env_step=9216, len=21, n/ep=3, n/st=64, player_1/loss=164.722, player_2/loss=88.728, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 344.18it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=153.304, player_2/loss=111.012, rew=-8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 344.90it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=164.654, player_2/loss=120.896, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 346.68it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=177.029, player_2/loss=90.568, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 340.65it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=193.730, player_2/loss=97.179, rew=-12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 346.99it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=147.707, player_2/loss=109.879, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 346.42it/s, env_step=15360, len=16, n/ep=3, n/st=64, player_1/loss=144.363, player_2/loss=107.537, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 345.61it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=174.672, player_2/loss=71.595, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 346.14it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=155.095, player_2/loss=68.413, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 342.82it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=116.232, player_2/loss=60.514, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 344.76it/s, env_step=19456, len=19, n/ep=3, n/st=64, player_1/loss=164.022, player_2/loss=58.959, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 343.33it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=111.466, player_2/loss=124.935, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 348.28it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=94.919, player_2/loss=143.156, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 340.16it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=52.650, player_2/loss=147.653, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 342.80it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_2/loss=140.332, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 344.48it/s, env_step=5120, len=19, n/ep=4, n/st=64, player_1/loss=83.656, player_2/loss=102.690, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 342.83it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=45.778, player_2/loss=102.017, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 346.56it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=56.974, player_2/loss=125.887, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.98it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=46.498, player_2/loss=123.492, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 346.57it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=24.544, player_2/loss=87.062, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 345.79it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=20.522, player_2/loss=64.416, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 344.34it/s, env_step=11264, len=16, n/ep=3, n/st=64, player_1/loss=22.482, player_2/loss=49.749, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 343.94it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=38.021, player_2/loss=50.256, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 343.68it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=63.329, player_2/loss=79.820, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 344.06it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=34.752, player_2/loss=104.894, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 346.71it/s, env_step=15360, len=17, n/ep=3, n/st=64, player_1/loss=33.797, player_2/loss=118.506, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 342.79it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=47.671, player_2/loss=115.308, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 343.97it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=25.616, player_2/loss=68.458, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 344.00it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=70.157, player_2/loss=57.541, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 345.35it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=80.934, player_2/loss=105.866, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 341.99it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=132.019, player_2/loss=120.056, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 342.80it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=166.072, player_2/loss=121.323, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 341.78it/s, env_step=3072, len=16, n/ep=5, n/st=64, player_1/loss=217.437, player_2/loss=109.336, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 344.84it/s, env_step=4096, len=19, n/ep=2, n/st=64, player_1/loss=229.397, player_2/loss=110.295, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 344.19it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=211.269, player_2/loss=102.397, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 343.16it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=193.204, player_2/loss=73.786, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 340.97it/s, env_step=7168, len=15, n/ep=5, n/st=64, player_1/loss=174.182, player_2/loss=68.474, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 346.91it/s, env_step=8192, len=28, n/ep=2, n/st=64, player_1/loss=151.840, player_2/loss=79.293, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 345.23it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=131.077, player_2/loss=79.887, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 344.95it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=173.486, player_2/loss=59.936, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 346.14it/s, env_step=11264, len=16, n/ep=5, n/st=64, player_1/loss=216.339, player_2/loss=64.423, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 339.10it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_2/loss=56.338, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 344.81it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=129.729, player_2/loss=46.932, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 344.11it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=167.908, player_2/loss=43.109, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 342.96it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=135.820, player_2/loss=68.405, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 338.57it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=128.668, player_2/loss=132.135, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 344.71it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=214.928, player_2/loss=106.681, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 343.71it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=209.528, player_2/loss=47.456, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 348.89it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=186.202, player_2/loss=27.342, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 348.28it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=288.122, player_2/loss=61.187, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.16it/s, env_step=2048, len=24, n/ep=3, n/st=64, player_1/loss=300.146, player_2/loss=87.895, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 344.68it/s, env_step=3072, len=25, n/ep=3, n/st=64, player_1/loss=150.128, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 343.99it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=60.926, player_2/loss=104.852, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 345.91it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=132.212, player_2/loss=49.595, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 339.94it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=181.590, player_2/loss=97.897, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 346.47it/s, env_step=7168, len=15, n/ep=3, n/st=64, player_1/loss=132.313, player_2/loss=288.072, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 343.49it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=103.192, player_2/loss=296.558, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 346.11it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=60.914, player_2/loss=212.996, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 338.44it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=25.102, player_2/loss=217.783, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 344.45it/s, env_step=11264, len=17, n/ep=3, n/st=64, player_1/loss=23.852, player_2/loss=233.277, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 344.83it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=39.130, player_2/loss=298.145, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 343.69it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=60.669, player_2/loss=199.046, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 345.04it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=82.447, player_2/loss=102.112, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 340.46it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=41.256, player_2/loss=112.182, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 343.95it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=38.649, player_2/loss=81.559, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 345.54it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=53.046, player_2/loss=92.228, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 343.94it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=72.124, player_2/loss=153.125, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 342.00it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=95.766, player_2/loss=153.576, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 343.85it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=26.326, player_2/loss=88.094, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.25it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=135.789, player_2/loss=127.669, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 341.25it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=253.690, player_2/loss=143.657, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 345.74it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=228.605, player_2/loss=108.206, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 339.58it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=249.936, player_2/loss=79.167, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 343.35it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=244.923, player_2/loss=54.982, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 344.30it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=278.944, player_2/loss=48.845, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 345.38it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=291.285, player_2/loss=66.647, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 339.35it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=284.031, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 341.84it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=225.888, player_2/loss=32.482, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 343.26it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=201.821, player_2/loss=19.634, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 344.35it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=263.842, player_2/loss=38.946, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 343.98it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=250.335, player_2/loss=40.157, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 341.23it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=149.992, player_2/loss=19.128, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 342.12it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=120.366, player_2/loss=87.953, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 345.32it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=166.875, player_2/loss=88.102, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 344.06it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=265.628, player_2/loss=24.966, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 339.23it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=280.810, player_2/loss=36.961, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 344.16it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=223.192, player_2/loss=26.303, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 345.59it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=142.802, player_2/loss=83.046, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 344.96it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=130.101, player_2/loss=179.628, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 344.21it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=108.080, player_2/loss=337.081, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 338.58it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=117.908, player_2/loss=401.422, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 346.44it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=141.878, player_2/loss=253.335, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 343.42it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=120.918, player_2/loss=167.806, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 343.36it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=68.721, player_2/loss=169.500, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 339.00it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=78.106, player_2/loss=251.684, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 345.37it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=85.751, player_2/loss=426.477, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 345.74it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=51.494, player_2/loss=368.328, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 343.08it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=30.243, player_2/loss=379.151, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 344.96it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=30.884, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 340.58it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=20.488, player_2/loss=304.233, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 343.96it/s, env_step=14336, len=16, n/ep=3, n/st=64, player_1/loss=19.309, player_2/loss=295.947, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 343.86it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=16.080, player_2/loss=395.074, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 342.45it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=6.771, player_2/loss=378.208, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 343.86it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=47.217, player_2/loss=365.414, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 341.81it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=55.384, player_2/loss=380.667, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 343.23it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=31.401, player_2/loss=390.745, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 346.02it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=54.237, player_2/loss=327.197, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.65it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=31.419, player_2/loss=210.811, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 344.43it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=17.510, player_2/loss=90.792, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 344.24it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=81.937, player_2/loss=120.790, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 344.70it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=110.009, player_2/loss=146.393, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 343.79it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=68.429, player_2/loss=87.740, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 344.35it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=46.038, player_2/loss=52.152, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 341.51it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=27.463, player_2/loss=55.124, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:02, 344.78it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=91.522, player_2/loss=84.345, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:03, 339.94it/s, env_step=10240, len=23, n/ep=2, n/st=64, player_1/loss=128.251, player_2/loss=91.909, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:02, 343.99it/s, env_step=11264, len=25, n/ep=3, n/st=64, player_1/loss=83.290, player_2/loss=51.828, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:02, 342.90it/s, env_step=12288, len=31, n/ep=2, n/st=64, player_1/loss=72.396, player_2/loss=80.619, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:03, 339.79it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=101.147, player_2/loss=130.554, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 344.42it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=146.150, player_2/loss=122.313, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:02, 342.78it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=183.670, rew=-12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:02, 343.80it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=215.242, player_2/loss=154.304, rew=15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:02, 342.19it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=226.667, player_2/loss=147.830, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:02, 344.40it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=218.664, player_2/loss=132.170, rew=5.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:02, 342.59it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=197.300, player_2/loss=115.262, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:02, 342.57it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=118.802, player_2/loss=132.099, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.58it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=131.045, player_2/loss=122.644, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 345.94it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=231.388, player_2/loss=118.342, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 342.54it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=214.417, player_2/loss=119.045, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 342.64it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=123.642, player_2/loss=131.462, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 343.17it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=63.916, player_2/loss=131.401, rew=12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 338.53it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=65.550, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 344.13it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=76.979, player_2/loss=171.759, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 344.96it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=77.614, player_2/loss=134.730, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 345.51it/s, env_step=10240, len=9, n/ep=5, n/st=64, player_1/loss=54.292, player_2/loss=161.783, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 337.27it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=51.208, player_2/loss=173.490, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 341.97it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=33.092, player_2/loss=182.734, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 341.25it/s, env_step=13312, len=9, n/ep=8, n/st=64, player_1/loss=44.076, player_2/loss=170.664, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 341.90it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=43.496, player_2/loss=168.391, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 342.91it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=44.544, player_2/loss=157.885, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 337.94it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=19.772, player_2/loss=150.504, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 342.50it/s, env_step=17408, len=13, n/ep=6, n/st=64, player_1/loss=25.665, player_2/loss=156.638, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 344.83it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=23.191, player_2/loss=180.615, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 343.42it/s, env_step=19456, len=10, n/ep=4, n/st=64, player_1/loss=16.206, player_2/loss=199.887, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 344.28it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=217.794, player_2/loss=97.092, rew=17.86]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 339.50it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=314.126, player_2/loss=128.005, rew=10.71]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 345.98it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=360.835, player_2/loss=140.308, rew=17.86]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 345.02it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=413.344, player_2/loss=94.176, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 344.88it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=411.828, player_2/loss=78.943, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 338.23it/s, env_step=6144, len=8, n/ep=5, n/st=64, player_1/loss=354.213, player_2/loss=176.603, rew=15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 342.37it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=390.092, player_2/loss=161.471, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 344.85it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=392.547, player_2/loss=78.241, rew=17.86]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 345.80it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=308.016, player_2/loss=94.680, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 343.73it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=353.372, player_2/loss=81.216, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 341.46it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=362.839, player_2/loss=99.675, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 342.67it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=366.496, player_2/loss=88.562, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 344.51it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=332.977, player_2/loss=35.587, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 345.93it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=334.856, player_2/loss=52.527, rew=10.71]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 344.19it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=406.034, player_2/loss=59.628, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.98it/s, env_step=16384, len=9, n/ep=6, n/st=64, player_1/loss=356.329, player_2/loss=103.413, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 343.89it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=272.293, player_2/loss=117.369, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 343.02it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=312.088, player_2/loss=72.557, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 343.40it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=324.850, player_2/loss=45.858, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 338.79it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=153.437, player_2/loss=177.476, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.76it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=106.298, player_2/loss=196.849, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.15it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=76.499, player_2/loss=166.477, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 344.45it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=35.223, player_2/loss=176.924, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 351.49it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=10.885, player_2/loss=205.328, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.17it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=18.302, player_2/loss=185.821, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 344.07it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=64.218, player_2/loss=178.721, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 343.38it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=40.201, player_2/loss=204.511, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 344.18it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_2/loss=195.141, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 339.77it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=54.386, player_2/loss=170.826, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 342.51it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=22.792, player_2/loss=126.364, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 343.83it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=17.470, player_2/loss=151.263, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 344.97it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=49.139, player_2/loss=192.038, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 343.53it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=49.465, player_2/loss=185.005, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 339.93it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=48.119, player_2/loss=155.968, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 342.17it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=47.663, player_2/loss=137.585, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 344.48it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=82.569, player_2/loss=165.266, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 344.80it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=85.761, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 339.49it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=43.800, player_2/loss=199.902, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 342.01it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=90.940, player_2/loss=176.256, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.84it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=52.577, player_2/loss=133.394, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 345.70it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=18.884, player_2/loss=108.739, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 343.53it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=29.076, player_2/loss=88.914, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.27it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=161.408, player_2/loss=138.485, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 345.54it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=261.322, player_2/loss=161.746, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 342.43it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=149.727, player_2/loss=124.997, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 344.11it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=54.855, player_2/loss=89.468, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:03, 338.23it/s, env_step=9216, len=20, n/ep=4, n/st=64, player_1/loss=92.101, player_2/loss=77.424, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 345.62it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=85.345, player_2/loss=77.065, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 344.22it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=65.904, player_2/loss=72.782, rew=-12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 344.44it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=67.424, player_2/loss=69.049, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 343.97it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=42.253, player_2/loss=70.487, rew=-15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:03, 338.33it/s, env_step=14336, len=11, n/ep=4, n/st=64, player_1/loss=28.825, player_2/loss=67.591, rew=-12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:02, 343.57it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=65.563, player_2/loss=77.559, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 344.65it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=106.114, player_2/loss=70.113, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 344.83it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=99.079, player_2/loss=77.664, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:03, 340.73it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=76.512, player_2/loss=59.290, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 345.57it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=41.465, player_2/loss=75.678, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 341.69it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=175.403, player_2/loss=250.583, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.77it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=150.974, player_2/loss=235.231, rew=13.89]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.32it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=127.425, player_2/loss=264.768, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.83it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=159.880, player_2/loss=291.719, rew=6.25]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.58it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=138.181, player_2/loss=282.351, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 349.61it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=145.968, player_2/loss=253.151, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 338.76it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=87.191, player_2/loss=261.959, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 338.44it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=91.946, player_2/loss=277.310, rew=6.25]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 343.83it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=99.855, player_2/loss=294.172, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 342.62it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=178.859, player_2/loss=278.718, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 339.84it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=194.550, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.06it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=116.856, player_2/loss=297.444, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 344.13it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=138.871, player_2/loss=290.596, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 335.32it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=172.268, player_2/loss=258.225, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 341.29it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=196.492, player_2/loss=250.629, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 342.05it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=153.903, player_2/loss=291.045, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 338.89it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=51.591, player_2/loss=301.782, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 341.06it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=60.894, player_2/loss=303.950, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 342.73it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=114.392, player_2/loss=293.839, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 341.45it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=69.720, player_2/loss=296.669, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.67it/s, env_step=2048, len=10, n/ep=7, n/st=64, player_1/loss=205.852, player_2/loss=261.514, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 340.54it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=434.924, player_2/loss=179.202, rew=15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 342.66it/s, env_step=4096, len=10, n/ep=7, n/st=64, player_1/loss=564.869, player_2/loss=81.513, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 344.82it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=583.318, player_2/loss=54.936, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 341.14it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=520.421, player_2/loss=54.826, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 338.53it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=512.722, player_2/loss=36.066, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 342.12it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=430.050, player_2/loss=48.886, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 342.87it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=478.457, player_2/loss=57.667, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 343.33it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=418.128, player_2/loss=48.816, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 344.10it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=317.009, player_2/loss=52.238, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 338.38it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=389.275, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 342.51it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=399.900, player_2/loss=13.701, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 343.29it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=410.526, player_2/loss=3.431, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 343.46it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=443.005, player_2/loss=53.170, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 343.79it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=372.907, player_2/loss=71.485, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 340.91it/s, env_step=17408, len=10, n/ep=7, n/st=64, player_1/loss=395.895, player_2/loss=39.660, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 345.35it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=463.731, player_2/loss=62.986, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 345.43it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=251.990, player_2/loss=62.830, rew=16.67]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 342.54it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=332.340, player_2/loss=117.405, rew=17.86]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 341.15it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=256.159, player_2/loss=302.201, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 335.46it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=137.766, player_2/loss=335.568, rew=10.71]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 343.63it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=106.201, player_2/loss=386.285, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 341.73it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=148.361, player_2/loss=356.274, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 342.45it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=120.935, player_2/loss=370.483, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 343.26it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=31.705, player_2/loss=363.592, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 342.73it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=56.790, player_2/loss=339.770, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 341.39it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=107.211, player_2/loss=297.957, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 343.37it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=92.342, player_2/loss=331.547, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 343.41it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=52.591, player_2/loss=314.825, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 340.67it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=51.409, player_2/loss=361.024, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 343.77it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=50.179, player_2/loss=325.047, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 344.28it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=42.483, player_2/loss=281.111, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 343.50it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=50.584, player_2/loss=346.625, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 340.33it/s, env_step=16384, len=9, n/ep=5, n/st=64, player_1/loss=76.865, player_2/loss=327.810, rew=5.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 341.92it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=93.941, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 343.04it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=105.110, player_2/loss=247.802, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 343.70it/s, env_step=19456, len=9, n/ep=8, n/st=64, player_1/loss=150.475, player_2/loss=313.634, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 344.23it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=79.123, player_2/loss=206.653, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.73it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=84.520, player_2/loss=156.398, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 345.94it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=109.620, player_2/loss=142.842, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 343.02it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=121.427, player_2/loss=164.305, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 344.44it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=124.581, player_2/loss=245.421, rew=-19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 344.01it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=107.799, player_2/loss=212.838, rew=12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 338.33it/s, env_step=7168, len=9, n/ep=6, n/st=64, player_1/loss=98.334, player_2/loss=162.486, rew=-16.67]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 342.35it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=109.986, player_2/loss=188.580, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 341.08it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=118.351, player_2/loss=153.058, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 338.17it/s, env_step=10240, len=9, n/ep=6, n/st=64, player_1/loss=155.920, player_2/loss=145.752, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 334.35it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=223.769, player_2/loss=125.602, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 335.23it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=299.947, player_2/loss=75.886, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 344.46it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=253.639, player_2/loss=63.668, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 343.30it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=190.473, player_2/loss=63.928, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 344.24it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=250.054, player_2/loss=42.030, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 338.92it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=222.847, player_2/loss=74.482, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 343.18it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=227.137, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 343.43it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=291.897, player_2/loss=33.875, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 343.26it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=310.181, player_2/loss=31.616, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 344.94it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=179.716, player_2/loss=41.593, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 341.62it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=164.824, player_2/loss=112.583, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.28it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=143.111, player_2/loss=119.887, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 345.68it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=129.076, player_2/loss=179.913, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 342.51it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=123.973, player_2/loss=221.347, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:02, 345.05it/s, env_step=6144, len=12, n/ep=4, n/st=64, player_1/loss=104.710, player_2/loss=141.841, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:03, 338.34it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=91.711, player_2/loss=137.656, rew=-15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 342.09it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=116.197, player_2/loss=84.455, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 342.49it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=106.227, player_2/loss=176.597, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 347.68it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=73.448, player_2/loss=278.597, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:02, 346.67it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=49.868, player_2/loss=348.845, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:03, 340.28it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=23.109, player_2/loss=406.647, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:03, 340.57it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=35.531, player_2/loss=412.850, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 342.45it/s, env_step=14336, len=9, n/ep=8, n/st=64, player_1/loss=60.912, player_2/loss=434.715, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:03, 337.12it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=21.855, player_2/loss=441.985, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 343.32it/s, env_step=16384, len=9, n/ep=6, n/st=64, player_1/loss=17.198, player_2/loss=482.893, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 343.82it/s, env_step=17408, len=9, n/ep=6, n/st=64, player_1/loss=21.072, player_2/loss=491.774, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 342.55it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_2/loss=486.991, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 342.38it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=41.181, player_2/loss=522.112, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:03, 341.22it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=262.866, player_2/loss=299.225, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 340.33it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=532.606, player_2/loss=211.588, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 342.12it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=560.871, player_2/loss=149.655, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 342.87it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=617.033, player_2/loss=153.403, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.39it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=526.293, player_2/loss=141.179, rew=3.57]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 342.74it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=498.695, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 341.75it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=554.564, player_2/loss=68.122, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 343.77it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=688.478, player_2/loss=122.434, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 341.47it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=866.290, player_2/loss=90.622, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.84it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_2/loss=72.507, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 344.50it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=642.864, player_2/loss=96.756, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 342.61it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=685.379, player_2/loss=76.114, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 343.91it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=655.288, player_2/loss=82.086, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 339.51it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=607.801, player_2/loss=78.703, rew=10.71]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 342.51it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=538.887, player_2/loss=66.855, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 342.51it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_2/loss=77.549, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 343.59it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=570.521, player_2/loss=70.746, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 340.14it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=610.897, player_2/loss=81.120, rew=3.57]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 339.41it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=557.967, player_2/loss=78.122, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 341.95it/s, env_step=1024, len=13, n/ep=4, n/st=64, player_1/loss=303.928, player_2/loss=55.725, rew=-12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.47it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=193.260, player_2/loss=111.134, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 344.87it/s, env_step=3072, len=32, n/ep=2, n/st=64, player_1/loss=59.811, player_2/loss=132.789, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.61it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=119.376, player_2/loss=124.541, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 342.54it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=157.645, player_2/loss=167.178, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 342.45it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=107.574, player_2/loss=238.495, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 343.26it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=77.947, player_2/loss=312.130, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 344.22it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=61.172, player_2/loss=340.086, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 335.95it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=50.185, player_2/loss=323.857, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 342.16it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=21.131, player_2/loss=337.969, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 350.39it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=12.171, player_2/loss=370.976, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 338.64it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=7.921, player_2/loss=377.206, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 337.66it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=8.356, player_2/loss=413.041, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 344.28it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=5.630, player_2/loss=444.785, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 340.80it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=16.751, player_2/loss=416.330, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 343.62it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=23.021, player_2/loss=340.487, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 343.15it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=17.360, player_2/loss=333.746, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.13it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=16.013, player_2/loss=381.064, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 342.87it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=8.552, player_2/loss=417.401, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 346.80it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=39.176, player_2/loss=258.565, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.51it/s, env_step=2048, len=24, n/ep=2, n/st=64, player_1/loss=47.721, player_2/loss=225.963, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 346.85it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=78.726, player_2/loss=159.958, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 339.85it/s, env_step=4096, len=26, n/ep=2, n/st=64, player_1/loss=85.264, player_2/loss=180.502, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 345.51it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=92.223, player_2/loss=205.293, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 342.23it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=131.449, player_2/loss=159.625, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 344.68it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=151.283, player_2/loss=90.457, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 346.20it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=93.125, player_2/loss=42.144, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 339.91it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=71.450, player_2/loss=27.884, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 344.58it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=80.945, player_2/loss=22.131, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 344.79it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=103.611, player_2/loss=20.538, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 345.55it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=103.704, player_2/loss=55.829, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 342.78it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=105.034, player_2/loss=72.168, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 341.04it/s, env_step=14336, len=20, n/ep=4, n/st=64, player_1/loss=134.569, player_2/loss=35.005, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 342.90it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=176.466, player_2/loss=20.447, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 346.09it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=177.609, player_2/loss=42.186, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 345.29it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=157.393, player_2/loss=49.512, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 339.07it/s, env_step=18432, len=20, n/ep=4, n/st=64, player_1/loss=158.988, player_2/loss=20.458, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 332.74it/s, env_step=19456, len=24, n/ep=2, n/st=64, player_1/loss=115.537, player_2/loss=11.377, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 342.94it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=92.266, player_2/loss=11.037, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 342.96it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=110.670, player_2/loss=82.991, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 341.02it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=84.916, player_2/loss=82.450, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 340.05it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=54.121, player_2/loss=60.988, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 342.11it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=86.530, player_2/loss=107.277, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 342.29it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=117.450, player_2/loss=161.898, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 341.37it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=110.659, player_2/loss=174.541, rew=-5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 342.29it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=141.055, player_2/loss=156.490, rew=-10.71]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 338.80it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=136.534, player_2/loss=198.313, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 341.94it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=95.803, player_2/loss=203.978, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 339.36it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=48.112, player_2/loss=220.346, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 343.07it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=53.694, player_2/loss=212.984, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 343.27it/s, env_step=13312, len=12, n/ep=6, n/st=64, player_1/loss=52.066, player_2/loss=235.791, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 342.00it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=36.086, player_2/loss=255.556, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 341.54it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=51.983, player_2/loss=255.463, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 340.12it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=77.096, player_2/loss=209.967, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.56it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=54.308, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 341.21it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=49.546, player_2/loss=175.121, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 340.80it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=57.978, player_2/loss=192.306, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 341.21it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=60.755, player_2/loss=195.792, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.98it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=94.020, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 345.51it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=114.642, player_2/loss=133.126, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 343.75it/s, env_step=4096, len=19, n/ep=4, n/st=64, player_1/loss=124.585, player_2/loss=84.221, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 343.34it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=215.673, player_2/loss=73.105, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 343.72it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=245.989, player_2/loss=86.536, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 340.20it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=173.975, player_2/loss=80.842, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 341.42it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=114.520, player_2/loss=109.332, rew=-12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 341.67it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=149.715, player_2/loss=117.586, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 342.75it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=172.577, player_2/loss=90.696, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 338.62it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=179.485, player_2/loss=54.987, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 341.71it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=184.818, player_2/loss=91.445, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 342.95it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=155.248, player_2/loss=79.479, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 344.52it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=135.488, player_2/loss=37.043, rew=8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 344.19it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=181.661, player_2/loss=35.835, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 339.47it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=202.602, player_2/loss=33.436, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 342.09it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=176.105, player_2/loss=27.623, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 342.39it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=253.431, player_2/loss=13.271, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 344.07it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=256.185, player_2/loss=44.745, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 342.32it/s, env_step=1024, len=23, n/ep=3, n/st=64, player_1/loss=160.241, player_2/loss=19.967, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 341.63it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_2/loss=22.065, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 344.37it/s, env_step=3072, len=25, n/ep=2, n/st=64, player_1/loss=73.831, player_2/loss=24.007, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 344.60it/s, env_step=4096, len=28, n/ep=2, n/st=64, player_1/loss=68.198, player_2/loss=24.908, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 343.27it/s, env_step=5120, len=32, n/ep=2, n/st=64, player_1/loss=31.931, player_2/loss=39.309, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:03, 340.20it/s, env_step=6144, len=31, n/ep=2, n/st=64, player_1/loss=44.397, player_2/loss=89.202, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:02, 342.79it/s, env_step=7168, len=35, n/ep=1, n/st=64, player_1/loss=73.720, player_2/loss=125.460, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 343.63it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=80.372, player_2/loss=127.566, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 343.99it/s, env_step=9216, len=29, n/ep=2, n/st=64, player_1/loss=104.576, player_2/loss=127.394, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 343.56it/s, env_step=10240, len=25, n/ep=2, n/st=64, player_1/loss=126.239, player_2/loss=102.675, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:03, 339.37it/s, env_step=11264, len=31, n/ep=2, n/st=64, player_1/loss=92.167, player_2/loss=71.046, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 343.00it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=85.178, player_2/loss=120.630, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:02, 343.95it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=81.139, player_2/loss=116.521, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:02, 346.95it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=75.868, player_2/loss=80.791, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:03, 340.12it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=73.850, player_2/loss=99.152, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:02, 341.77it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=58.140, player_2/loss=121.743, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:02, 344.04it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=100.258, player_2/loss=166.046, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 343.06it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=101.246, player_2/loss=138.678, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:03, 338.96it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=103.372, player_2/loss=173.782, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 342.49it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=49.982, player_2/loss=300.199, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.62it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=154.783, player_2/loss=190.623, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 338.01it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=258.600, player_2/loss=87.513, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 340.93it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=310.006, player_2/loss=19.314, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 338.02it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=294.286, player_2/loss=25.436, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 342.57it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=289.462, player_2/loss=27.133, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 342.35it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=243.757, player_2/loss=40.908, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 343.09it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=262.676, player_2/loss=44.296, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 341.88it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=218.299, player_2/loss=93.456, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 341.42it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=270.579, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 341.58it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=328.786, player_2/loss=35.029, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 342.65it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=345.982, player_2/loss=34.798, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 342.43it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=307.053, player_2/loss=11.955, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 338.65it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=341.954, player_2/loss=12.433, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 340.51it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=362.108, player_2/loss=51.096, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 343.21it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=248.877, player_2/loss=60.692, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 343.38it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=223.684, player_2/loss=22.635, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 341.64it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=223.088, player_2/loss=35.196, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 338.89it/s, env_step=19456, len=13, n/ep=4, n/st=64, player_1/loss=225.698, player_2/loss=36.011, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 341.69it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=137.127, player_2/loss=58.058, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 342.93it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=110.749, player_2/loss=146.034, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 343.69it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=81.767, player_2/loss=205.279, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 342.43it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=84.503, player_2/loss=184.145, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.04it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=41.686, player_2/loss=175.284, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 340.45it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=50.910, player_2/loss=174.282, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 342.47it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=55.546, player_2/loss=167.061, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.70it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=73.378, player_2/loss=214.580, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 344.12it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=102.541, player_2/loss=228.626, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 337.89it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=83.667, player_2/loss=205.658, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 345.35it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=45.489, player_2/loss=182.967, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 341.72it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=34.851, player_2/loss=152.983, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 342.05it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=77.869, player_2/loss=247.185, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 340.38it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=76.910, player_2/loss=284.009, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 340.42it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=35.573, player_2/loss=298.382, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 337.13it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=25.979, player_2/loss=364.287, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 341.44it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=22.629, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 340.75it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=27.268, player_2/loss=399.590, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 338.17it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=42.676, player_2/loss=353.839, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 340.94it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=35.153, player_2/loss=297.711, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 343.47it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=73.408, player_2/loss=240.962, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 342.91it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=120.277, player_2/loss=154.202, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 341.51it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=130.944, player_2/loss=120.443, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 340.20it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=123.285, player_2/loss=141.645, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 343.00it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=109.985, player_2/loss=141.246, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 343.36it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=123.020, player_2/loss=102.225, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.78it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=147.490, player_2/loss=127.875, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 338.13it/s, env_step=9216, len=16, n/ep=5, n/st=64, player_1/loss=126.515, player_2/loss=93.648, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 342.93it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=94.375, player_2/loss=50.730, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 342.78it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=87.881, player_2/loss=65.500, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 343.16it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=99.960, player_2/loss=74.095, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 341.88it/s, env_step=13312, len=21, n/ep=4, n/st=64, player_1/loss=96.935, player_2/loss=67.015, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 338.29it/s, env_step=14336, len=22, n/ep=2, n/st=64, player_1/loss=76.690, player_2/loss=51.683, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 342.57it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=73.870, player_2/loss=30.312, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 344.09it/s, env_step=16384, len=15, n/ep=5, n/st=64, player_1/loss=148.618, player_2/loss=33.105, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 340.55it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_2/loss=48.333, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 342.21it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=294.455, player_2/loss=37.812, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 337.35it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=323.192, player_2/loss=22.923, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 343.10it/s, env_step=1024, len=8, n/ep=9, n/st=64, player_1/loss=221.593, player_2/loss=34.734, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.87it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=160.398, player_2/loss=136.294, rew=17.86]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 341.67it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=81.822, player_2/loss=225.586, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 340.24it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=85.023, player_2/loss=303.767, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.11it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=75.679, player_2/loss=309.708, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 339.60it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=20.861, player_2/loss=322.916, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 340.17it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=45.008, player_2/loss=286.868, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 340.79it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=117.853, player_2/loss=246.550, rew=10.71]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 338.14it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=99.771, player_2/loss=274.495, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 341.14it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=72.619, player_2/loss=223.352, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 342.24it/s, env_step=11264, len=11, n/ep=4, n/st=64, player_1/loss=71.583, player_2/loss=187.401, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.98it/s, env_step=12288, len=8, n/ep=9, n/st=64, player_1/loss=87.436, player_2/loss=257.175, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 340.62it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=77.833, player_2/loss=276.341, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 337.44it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=46.757, player_2/loss=281.046, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 341.55it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=47.647, player_2/loss=294.462, rew=13.89]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 340.58it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=40.270, player_2/loss=271.444, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 346.31it/s, env_step=17408, len=10, n/ep=7, n/st=64, player_1/loss=48.514, player_2/loss=277.957, rew=3.57]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 339.26it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=94.243, player_2/loss=302.634, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 340.26it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=62.006, player_2/loss=283.636, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 342.70it/s, env_step=1024, len=10, n/ep=8, n/st=64, player_1/loss=74.124, player_2/loss=243.145, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.25it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=97.987, player_2/loss=213.057, rew=-17.86]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.80it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=129.455, player_2/loss=134.331, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 341.26it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=145.362, player_2/loss=104.535, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:03, 340.97it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=147.549, player_2/loss=149.874, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 340.10it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=175.859, player_2/loss=154.872, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 340.95it/s, env_step=7168, len=12, n/ep=4, n/st=64, player_1/loss=202.617, player_2/loss=146.829, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 338.36it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=172.210, player_2/loss=92.951, rew=-13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 342.96it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=177.368, player_2/loss=66.934, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 341.70it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=197.539, player_2/loss=107.406, rew=-17.86]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 341.51it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=163.991, player_2/loss=111.784, rew=5.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 342.11it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=189.244, player_2/loss=90.866, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 339.44it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_2/loss=88.786, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 340.50it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=161.778, player_2/loss=101.163, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 343.23it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_1/loss=131.292, player_2/loss=109.868, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 341.56it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=163.611, player_2/loss=81.852, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:03, 341.39it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=175.775, player_2/loss=55.992, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 336.41it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=201.420, player_2/loss=90.487, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 343.48it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=189.119, player_2/loss=132.314, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 343.57it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=118.217, player_2/loss=63.782, rew=-5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.86it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=124.481, player_2/loss=96.460, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.56it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=90.280, player_2/loss=155.006, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 342.47it/s, env_step=4096, len=9, n/ep=8, n/st=64, player_1/loss=66.023, player_2/loss=230.431, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 341.62it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=108.919, player_2/loss=230.261, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.10it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=74.900, player_2/loss=237.688, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 339.87it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=28.698, player_2/loss=264.884, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.68it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=12.815, player_2/loss=308.738, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 339.75it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=24.316, player_2/loss=258.667, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 340.10it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=50.618, player_2/loss=287.105, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 338.74it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=55.648, player_2/loss=343.230, rew=17.86]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.30it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=61.317, player_2/loss=338.517, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 338.81it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=41.297, player_2/loss=318.757, rew=10.71]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 342.27it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=18.047, player_2/loss=269.171, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 341.91it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=42.558, player_2/loss=286.765, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 341.16it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=62.079, player_2/loss=260.841, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 338.59it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=43.811, player_2/loss=275.633, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 347.05it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=40.570, player_2/loss=260.505, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 336.83it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=64.718, player_2/loss=252.290, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 342.07it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=104.748, player_2/loss=234.781, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 341.83it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=101.406, player_2/loss=230.353, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.37it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=187.121, player_2/loss=179.159, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 342.54it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=370.827, player_2/loss=104.916, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 341.92it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=389.695, player_2/loss=63.828, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 344.80it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=254.992, player_2/loss=119.305, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 343.11it/s, env_step=7168, len=15, n/ep=5, n/st=64, player_1/loss=130.684, player_2/loss=138.277, rew=-5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 339.24it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=124.283, player_2/loss=97.076, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 342.23it/s, env_step=9216, len=26, n/ep=3, n/st=64, player_1/loss=144.740, player_2/loss=113.505, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 342.51it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=121.858, player_2/loss=90.203, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 342.40it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=95.193, player_2/loss=77.234, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 335.37it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=117.948, player_2/loss=45.548, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 346.60it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=116.914, player_2/loss=24.855, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 340.88it/s, env_step=14336, len=18, n/ep=5, n/st=64, player_1/loss=103.790, player_2/loss=22.081, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 343.47it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=107.984, player_2/loss=69.514, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 340.80it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=109.691, player_2/loss=81.977, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 341.26it/s, env_step=17408, len=25, n/ep=2, n/st=64, player_1/loss=101.302, player_2/loss=48.718, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 343.04it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=96.697, player_2/loss=48.690, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 342.70it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=111.008, player_2/loss=36.109, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 341.23it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=76.306, player_2/loss=104.889, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.94it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=106.668, player_2/loss=117.519, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.23it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=95.696, player_2/loss=143.539, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 342.75it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=45.065, player_2/loss=208.399, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.62it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=31.348, player_2/loss=279.808, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 341.09it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=33.029, player_2/loss=325.376, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.21it/s, env_step=7168, len=15, n/ep=5, n/st=64, player_1/loss=29.728, player_2/loss=328.311, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 342.06it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=26.280, player_2/loss=233.884, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 340.15it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=26.394, player_2/loss=190.366, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 344.05it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=22.594, player_2/loss=208.112, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 340.54it/s, env_step=11264, len=15, n/ep=5, n/st=64, player_1/loss=24.234, player_2/loss=213.043, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 336.55it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=61.392, player_2/loss=272.809, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 341.94it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=62.327, player_2/loss=275.386, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 341.81it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=8.709, player_2/loss=234.165, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 344.06it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=5.238, player_2/loss=230.917, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 342.17it/s, env_step=16384, len=15, n/ep=5, n/st=64, player_1/loss=10.057, player_2/loss=281.827, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 338.28it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=9.153, player_2/loss=330.254, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 341.51it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=10.118, player_2/loss=246.558, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 344.38it/s, env_step=19456, len=13, n/ep=3, n/st=64, player_1/loss=23.348, player_2/loss=162.557, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 348.59it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=83.069, player_2/loss=124.568, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.25it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=108.427, player_2/loss=74.442, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 341.45it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=110.858, player_2/loss=48.330, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 330.17it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=78.299, player_2/loss=77.810, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.18it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=75.215, player_2/loss=78.881, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.89it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=97.584, player_2/loss=67.598, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 341.92it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=87.596, player_2/loss=51.818, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 342.60it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=76.723, player_2/loss=53.162, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 341.21it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=92.122, player_2/loss=62.985, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 341.03it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=142.008, player_2/loss=51.667, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.12it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=198.842, player_2/loss=63.424, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 340.44it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=193.923, player_2/loss=57.651, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 342.55it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=210.242, player_2/loss=49.858, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 339.84it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=197.962, player_2/loss=74.701, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 343.09it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=168.069, player_2/loss=68.801, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 337.93it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=184.935, player_2/loss=51.036, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 340.93it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=164.690, player_2/loss=123.012, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 341.31it/s, env_step=18432, len=17, n/ep=3, n/st=64, player_1/loss=94.066, player_2/loss=133.719, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 341.40it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=115.489, player_2/loss=84.974, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 341.15it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=100.642, player_2/loss=41.890, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 335.61it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=92.226, player_2/loss=41.308, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 341.02it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=92.231, player_2/loss=37.438, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 340.84it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=78.344, player_2/loss=13.846, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 341.26it/s, env_step=5120, len=17, n/ep=3, n/st=64, player_1/loss=62.231, player_2/loss=23.398, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 341.05it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=74.858, player_2/loss=28.892, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 336.99it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=66.338, player_2/loss=16.167, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 343.89it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=56.335, player_2/loss=9.758, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 341.94it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=47.686, player_2/loss=31.127, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 341.79it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=94.713, player_2/loss=79.535, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 336.39it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=151.596, player_2/loss=124.907, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 341.45it/s, env_step=12288, len=20, n/ep=4, n/st=64, player_1/loss=114.949, player_2/loss=92.130, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 341.08it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=59.353, player_2/loss=75.644, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 343.63it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=65.855, player_2/loss=61.838, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 342.03it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=110.075, player_2/loss=116.139, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 336.95it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=88.680, player_2/loss=126.619, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 339.21it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=33.559, player_2/loss=73.843, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 341.45it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=47.367, player_2/loss=60.684, rew=-12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 340.52it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=81.225, player_2/loss=96.775, rew=16.67]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 341.92it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=127.046, player_2/loss=106.911, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 344.77it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=111.033, player_2/loss=74.433, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.63it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=62.625, player_2/loss=37.019, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 340.13it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=41.817, player_2/loss=20.987, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.86it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=49.794, player_2/loss=18.411, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 336.54it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=45.935, player_2/loss=8.071, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 343.08it/s, env_step=7168, len=15, n/ep=5, n/st=64, player_1/loss=48.190, player_2/loss=11.423, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.85it/s, env_step=8192, len=16, n/ep=3, n/st=64, player_1/loss=58.010, player_2/loss=14.083, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 336.43it/s, env_step=9216, len=20, n/ep=4, n/st=64, player_1/loss=59.628, player_2/loss=41.071, rew=0.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 336.32it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=51.540, player_2/loss=61.206, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 336.32it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=55.238, player_2/loss=58.005, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 340.70it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=66.017, player_2/loss=21.529, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 340.12it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=50.204, player_2/loss=23.221, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 340.74it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=58.706, player_2/loss=36.213, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 341.96it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=54.905, player_2/loss=26.924, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.88it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=83.842, player_2/loss=26.077, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 340.92it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=101.751, player_2/loss=33.128, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 340.01it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=71.329, player_2/loss=20.249, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 340.57it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=153.599, player_2/loss=14.800, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 339.90it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=105.908, player_2/loss=69.254, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.20it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=48.188, player_2/loss=77.392, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.87it/s, env_step=3072, len=27, n/ep=2, n/st=64, player_1/loss=32.748, player_2/loss=60.502, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 342.13it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=71.928, player_2/loss=59.518, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:03, 341.18it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=93.304, player_2/loss=125.182, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 335.58it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=125.025, player_2/loss=219.877, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 340.89it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=135.090, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 340.34it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=82.863, player_2/loss=492.873, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 340.26it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=43.605, player_2/loss=424.387, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:03, 340.48it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=30.035, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 340.22it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=22.711, player_2/loss=525.215, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 338.07it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=21.318, player_2/loss=471.499, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 342.70it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=8.713, player_2/loss=341.110, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 342.14it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=63.226, player_2/loss=387.279, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:03, 339.63it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=79.682, player_2/loss=373.469, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 338.26it/s, env_step=16384, len=9, n/ep=6, n/st=64, player_1/loss=42.252, player_2/loss=325.871, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:03, 341.01it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=40.745, player_2/loss=403.352, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 341.31it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=34.783, player_2/loss=432.558, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:03, 341.26it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=122.410, player_2/loss=421.957, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 341.02it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=111.560, player_2/loss=210.252, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.59it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=142.594, player_2/loss=184.025, rew=3.57]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 345.97it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=192.392, player_2/loss=92.551, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 336.57it/s, env_step=4096, len=9, n/ep=6, n/st=64, player_1/loss=237.052, player_2/loss=81.325, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 341.51it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=264.792, player_2/loss=73.167, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 335.37it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=272.978, player_2/loss=96.971, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 339.60it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=261.533, player_2/loss=127.557, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 341.57it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=246.694, player_2/loss=93.776, rew=18.75]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 340.40it/s, env_step=9216, len=9, n/ep=6, n/st=64, player_1/loss=257.219, player_2/loss=65.023, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 342.05it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=247.654, player_2/loss=40.662, rew=17.86]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 338.01it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=193.763, player_2/loss=19.868, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 339.93it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=215.221, player_2/loss=44.264, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 339.17it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=281.185, player_2/loss=54.423, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 341.16it/s, env_step=14336, len=9, n/ep=8, n/st=64, player_1/loss=269.131, player_2/loss=26.872, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 340.31it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=221.285, player_2/loss=25.390, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 335.26it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=194.212, player_2/loss=28.797, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 339.76it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=215.360, player_2/loss=61.056, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 339.24it/s, env_step=18432, len=10, n/ep=7, n/st=64, player_1/loss=254.751, player_2/loss=60.105, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 340.64it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=252.477, player_2/loss=82.342, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 340.07it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=125.807, player_2/loss=118.493, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 336.94it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=89.245, player_2/loss=168.540, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 341.68it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=79.163, player_2/loss=263.929, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 339.50it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=66.645, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 341.21it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=38.063, player_2/loss=395.216, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 339.82it/s, env_step=6144, len=8, n/ep=7, n/st=64, player_1/loss=37.465, player_2/loss=404.511, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.36it/s, env_step=7168, len=10, n/ep=7, n/st=64, player_1/loss=45.762, player_2/loss=308.701, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 339.19it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=23.949, player_2/loss=321.116, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 341.78it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=19.165, player_2/loss=365.305, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 339.22it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=54.088, player_2/loss=374.151, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 341.71it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=48.187, player_2/loss=366.682, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 337.80it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=17.819, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 341.60it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=28.318, player_2/loss=305.885, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 339.65it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=35.602, player_2/loss=325.848, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 339.26it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=22.963, player_2/loss=443.232, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.89it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=12.425, player_2/loss=438.928, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.61it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=23.476, player_2/loss=429.175, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 339.50it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=22.685, player_2/loss=401.126, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 338.86it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=27.469, player_2/loss=388.544, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 338.72it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=15.706, player_2/loss=294.739, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.90it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=22.408, player_2/loss=276.386, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 341.97it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=92.543, player_2/loss=181.459, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 341.94it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=148.856, player_2/loss=70.080, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 342.83it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=113.242, player_2/loss=56.825, rew=5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 341.81it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=127.499, player_2/loss=70.519, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 336.49it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=197.751, player_2/loss=41.865, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 340.52it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=210.603, player_2/loss=62.191, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 340.68it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=162.129, player_2/loss=42.624, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 340.55it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=89.125, player_2/loss=28.052, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 336.07it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=91.141, player_2/loss=24.048, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 340.73it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=111.037, player_2/loss=56.141, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 340.52it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=113.922, player_2/loss=60.610, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 341.97it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=125.301, player_2/loss=22.382, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 342.79it/s, env_step=15360, len=16, n/ep=3, n/st=64, player_1/loss=137.161, player_2/loss=7.656, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 340.21it/s, env_step=16384, len=16, n/ep=3, n/st=64, player_1/loss=161.893, player_2/loss=32.984, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 339.95it/s, env_step=17408, len=17, n/ep=4, n/st=64, player_1/loss=171.306, player_2/loss=95.017, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 340.78it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=173.747, player_2/loss=104.271, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 339.36it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=160.639, player_2/loss=51.345, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 339.92it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=107.989, player_2/loss=112.656, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.89it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=104.198, player_2/loss=154.303, rew=-5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.31it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=118.942, player_2/loss=226.682, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 341.40it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=98.889, player_2/loss=330.307, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 341.03it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=73.919, player_2/loss=274.592, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 336.66it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=71.814, player_2/loss=176.476, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 340.68it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=53.830, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 343.32it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=47.581, player_2/loss=200.114, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 341.99it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=36.576, player_2/loss=259.980, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 340.76it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=25.733, player_2/loss=227.949, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 337.23it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=63.528, player_2/loss=275.936, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 340.92it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=66.019, player_2/loss=245.319, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 337.50it/s, env_step=13312, len=9, n/ep=8, n/st=64, player_1/loss=20.594, player_2/loss=287.812, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 341.50it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=37.831, player_2/loss=298.933, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 338.99it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=21.711, player_2/loss=293.936, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 336.37it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=8.615, player_2/loss=263.754, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 340.74it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=20.747, player_2/loss=258.725, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 340.66it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=29.551, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 340.38it/s, env_step=19456, len=11, n/ep=4, n/st=64, player_1/loss=27.659, player_2/loss=264.096, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 337.95it/s, env_step=1024, len=9, n/ep=6, n/st=64, player_1/loss=65.728, player_2/loss=311.071, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.83it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=91.873, player_2/loss=230.339, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.09it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=144.687, player_2/loss=157.960, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 340.35it/s, env_step=4096, len=19, n/ep=2, n/st=64, player_1/loss=144.527, player_2/loss=129.756, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 344.64it/s, env_step=5120, len=25, n/ep=3, n/st=64, player_1/loss=144.701, player_2/loss=86.809, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 343.31it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=143.496, player_2/loss=87.322, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 339.44it/s, env_step=7168, len=28, n/ep=2, n/st=64, player_1/loss=132.731, player_2/loss=85.862, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 343.80it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=329.034, player_2/loss=46.782, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 339.97it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=339.441, player_2/loss=42.862, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.66it/s, env_step=10240, len=19, n/ep=4, n/st=64, player_1/loss=182.698, player_2/loss=51.548, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 338.92it/s, env_step=11264, len=28, n/ep=2, n/st=64, player_1/loss=156.157, player_2/loss=100.462, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.60it/s, env_step=12288, len=25, n/ep=2, n/st=64, player_1/loss=142.701, player_2/loss=126.056, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 342.38it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=140.143, player_2/loss=88.852, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 341.37it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=111.728, player_2/loss=109.922, rew=-15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 336.10it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=108.670, player_2/loss=44.367, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 340.73it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=134.676, player_2/loss=83.993, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 341.32it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_1/loss=147.039, player_2/loss=121.907, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 339.99it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=199.535, player_2/loss=122.728, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 336.92it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=269.027, player_2/loss=118.868, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 339.46it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=226.410, player_2/loss=80.446, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 341.33it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=189.833, player_2/loss=68.201, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.68it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=188.933, player_2/loss=61.085, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 335.71it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=158.232, player_2/loss=83.989, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:03, 339.92it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=160.022, player_2/loss=170.361, rew=10.71]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 343.11it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=211.750, player_2/loss=351.770, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 337.12it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=139.417, player_2/loss=445.443, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 335.59it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=72.965, player_2/loss=396.058, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 338.56it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=68.908, player_2/loss=401.763, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:03, 339.48it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=33.974, player_2/loss=364.669, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 340.36it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=29.154, player_2/loss=358.984, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 334.22it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=59.862, player_2/loss=334.943, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 338.98it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=58.492, player_2/loss=320.438, rew=13.89]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 338.77it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=98.534, player_2/loss=297.664, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:03, 340.67it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=73.044, player_2/loss=353.368, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 332.97it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=41.652, player_2/loss=398.389, rew=13.89]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:03, 338.16it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=56.280, player_2/loss=384.286, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 339.04it/s, env_step=18432, len=7, n/ep=10, n/st=64, player_1/loss=27.525, player_2/loss=446.852, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:03, 337.71it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=10.142, player_2/loss=462.562, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 336.43it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=8.603, player_2/loss=322.807, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.45it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=49.005, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.03it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=84.493, player_2/loss=303.173, rew=-19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 340.98it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=66.576, player_2/loss=324.353, rew=-10.71]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 335.35it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=97.909, player_2/loss=298.390, rew=-18.75]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 340.69it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=119.717, player_2/loss=314.659, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 348.61it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=125.827, player_2/loss=300.662, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 338.17it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=127.002, player_2/loss=290.175, rew=-19.44]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 340.12it/s, env_step=9216, len=9, n/ep=6, n/st=64, player_1/loss=132.102, player_2/loss=266.635, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 339.19it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=89.234, player_2/loss=264.362, rew=-19.44]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 343.87it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=89.474, player_2/loss=272.530, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 336.77it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=120.916, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 341.43it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=93.285, player_2/loss=283.359, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 341.20it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=105.391, player_2/loss=246.298, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 341.10it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=110.817, player_2/loss=235.876, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 337.95it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=98.375, player_2/loss=266.104, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 341.65it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=84.922, player_2/loss=267.143, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 340.31it/s, env_step=18432, len=17, n/ep=3, n/st=64, player_1/loss=99.168, player_2/loss=245.004, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 340.59it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=129.437, player_2/loss=146.496, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 342.04it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=127.279, player_2/loss=98.462, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.08it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=111.463, player_2/loss=115.719, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 338.89it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=109.224, player_2/loss=175.191, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 341.73it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=110.336, player_2/loss=207.675, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 339.35it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=92.977, player_2/loss=138.180, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 334.84it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=81.072, player_2/loss=125.630, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 341.87it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=74.117, player_2/loss=161.006, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 341.46it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=70.880, player_2/loss=161.948, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 341.32it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=62.542, player_2/loss=192.226, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 338.50it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=53.739, player_2/loss=198.504, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 336.00it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=46.327, player_2/loss=178.399, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 340.47it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=36.096, player_2/loss=165.438, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 341.21it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=15.834, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 338.95it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=26.077, player_2/loss=197.290, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 342.05it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=42.940, player_2/loss=201.805, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 333.54it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=26.623, player_2/loss=231.323, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 339.86it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=8.909, player_2/loss=233.382, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 338.14it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=29.631, player_2/loss=230.031, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 336.93it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=49.725, player_2/loss=204.586, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 335.70it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=84.427, player_2/loss=261.930, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.49it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=104.233, player_2/loss=221.744, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.80it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=94.383, player_2/loss=201.377, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.83it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=80.601, player_2/loss=206.888, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.87it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=136.034, player_2/loss=150.087, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 335.17it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=160.056, player_2/loss=139.021, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 340.13it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=136.095, player_2/loss=175.559, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:02, 350.91it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=115.594, player_2/loss=170.749, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:03, 337.58it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=113.293, player_2/loss=115.366, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:03, 337.63it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=109.797, player_2/loss=61.726, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:03, 338.84it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=156.530, player_2/loss=45.809, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:03, 341.49it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=150.522, player_2/loss=88.743, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:03, 341.43it/s, env_step=13312, len=20, n/ep=4, n/st=64, player_1/loss=158.282, player_2/loss=94.226, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:03, 336.29it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=170.100, player_2/loss=83.821, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:02, 342.06it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=117.315, player_2/loss=64.839, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:03, 336.84it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=126.248, player_2/loss=50.035, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:03, 339.66it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=187.391, player_2/loss=39.572, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:03, 338.47it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=261.746, player_2/loss=57.400, rew=5.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:03, 338.82it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=248.911, player_2/loss=111.815, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:02, 342.12it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=139.218, player_2/loss=186.740, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 341.57it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=149.983, player_2/loss=230.863, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 339.19it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=122.089, player_2/loss=295.587, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 337.83it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=84.957, player_2/loss=393.591, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.05it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=70.698, player_2/loss=308.592, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 339.94it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=106.619, player_2/loss=206.189, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.27it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=144.933, player_2/loss=269.087, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.21it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=118.223, player_2/loss=227.501, rew=-5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 335.72it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_2/loss=131.591, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 340.21it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=108.219, player_2/loss=216.172, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 342.36it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=60.161, player_2/loss=259.479, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 341.82it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=49.586, player_2/loss=195.449, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 339.06it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=56.247, player_2/loss=221.860, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 337.74it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=24.689, player_2/loss=246.976, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 340.63it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=69.227, player_2/loss=264.838, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.18it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=76.654, player_2/loss=262.451, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 338.64it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=24.371, player_2/loss=262.296, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 340.80it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=21.731, player_2/loss=321.282, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 338.23it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=15.207, player_2/loss=303.396, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 342.76it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=33.895, player_2/loss=155.403, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 338.36it/s, env_step=2048, len=16, n/ep=5, n/st=64, player_1/loss=106.842, player_2/loss=122.062, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 339.57it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=108.077, player_2/loss=116.969, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 337.48it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=52.990, player_2/loss=144.539, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.54it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=43.953, player_2/loss=171.116, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 341.02it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=30.651, player_2/loss=170.927, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.86it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=51.047, player_2/loss=154.202, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.10it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=109.937, player_2/loss=110.450, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 339.51it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=143.127, player_2/loss=96.009, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 342.72it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=102.609, player_2/loss=129.519, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 340.83it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=67.594, player_2/loss=112.672, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 340.23it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=53.186, player_2/loss=72.811, rew=-12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 339.52it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=83.076, player_2/loss=111.931, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 336.62it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=105.230, player_2/loss=135.168, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 340.30it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=55.972, player_2/loss=127.221, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.13it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=55.306, player_2/loss=91.397, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 339.26it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=75.248, player_2/loss=73.559, rew=-5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 338.20it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=113.024, player_2/loss=143.402, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 336.49it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=160.847, player_2/loss=185.833, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 339.96it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=174.947, player_2/loss=111.397, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.83it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=141.026, player_2/loss=122.828, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.86it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=93.790, player_2/loss=89.090, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 341.01it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=98.236, player_2/loss=135.667, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.18it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=120.538, player_2/loss=143.780, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 341.37it/s, env_step=6144, len=23, n/ep=3, n/st=64, player_1/loss=97.465, player_2/loss=119.455, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 341.57it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=94.448, player_2/loss=110.704, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 342.40it/s, env_step=8192, len=22, n/ep=2, n/st=64, player_1/loss=101.767, player_2/loss=69.880, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 338.12it/s, env_step=9216, len=21, n/ep=3, n/st=64, player_1/loss=70.642, player_2/loss=76.897, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.87it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=71.212, player_2/loss=112.156, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 338.68it/s, env_step=11264, len=20, n/ep=4, n/st=64, player_1/loss=72.195, player_2/loss=123.701, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.45it/s, env_step=12288, len=20, n/ep=4, n/st=64, player_1/loss=90.239, player_2/loss=106.816, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 338.44it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=139.259, player_2/loss=134.079, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 324.32it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=105.813, player_2/loss=165.727, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 337.20it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=60.359, player_2/loss=218.863, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 338.67it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=35.922, player_2/loss=258.758, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 338.99it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=41.368, player_2/loss=325.825, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 337.50it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=32.547, player_2/loss=294.605, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 334.63it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=26.113, player_2/loss=261.328, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 337.77it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=79.231, player_2/loss=272.759, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.52it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=65.766, player_2/loss=250.638, rew=-16.67]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 342.16it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=131.175, player_2/loss=206.250, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 340.53it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=366.565, player_2/loss=140.689, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 338.98it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=501.934, player_2/loss=131.080, rew=10.71]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 342.23it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=420.457, player_2/loss=114.004, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 339.69it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=432.644, player_2/loss=63.179, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 340.32it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=418.738, player_2/loss=95.412, rew=10.71]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 339.33it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=407.888, player_2/loss=126.554, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 335.60it/s, env_step=10240, len=9, n/ep=6, n/st=64, player_1/loss=446.639, player_2/loss=80.047, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 349.62it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=565.826, player_2/loss=61.974, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 339.98it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=614.867, player_2/loss=28.815, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 338.93it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=536.526, player_2/loss=44.470, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 339.69it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=400.888, player_2/loss=44.001, rew=10.71]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 338.02it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=421.228, player_2/loss=70.896, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 342.13it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=493.771, player_2/loss=60.040, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 339.72it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=481.845, player_2/loss=34.128, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 339.61it/s, env_step=18432, len=9, n/ep=8, n/st=64, player_1/loss=447.333, player_2/loss=30.364, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 334.32it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=482.019, player_2/loss=59.448, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 338.15it/s, env_step=1024, len=8, n/ep=7, n/st=64, player_1/loss=356.367, player_2/loss=21.133, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.52it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=265.268, player_2/loss=46.760, rew=-18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.53it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=218.293, player_2/loss=68.881, rew=-18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 342.76it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=187.347, player_2/loss=103.590, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.30it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=92.864, player_2/loss=142.273, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 341.43it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=102.705, player_2/loss=222.824, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 341.33it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=95.570, player_2/loss=196.883, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 341.33it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=63.714, player_2/loss=196.740, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 335.42it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=55.161, player_2/loss=268.881, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.38it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=31.548, player_2/loss=280.861, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 340.18it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=15.420, player_2/loss=224.458, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 340.73it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=5.123, player_2/loss=149.966, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 339.68it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=11.362, player_2/loss=165.769, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 339.40it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=14.555, player_2/loss=169.795, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 340.35it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=7.013, player_2/loss=192.106, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 341.98it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=22.788, player_2/loss=222.610, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 339.76it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=26.363, player_2/loss=182.815, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 339.98it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=7.954, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 330.31it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=18.657, player_2/loss=250.693, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 339.86it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=8.517, player_2/loss=169.961, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 341.00it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=36.203, player_2/loss=169.806, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 339.13it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=155.245, player_2/loss=146.021, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 339.79it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=229.725, player_2/loss=83.177, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 336.82it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=213.866, player_2/loss=112.914, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 339.75it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=208.101, player_2/loss=106.066, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.64it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=218.643, player_2/loss=130.731, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 343.00it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=176.413, player_2/loss=133.827, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 343.44it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=223.180, player_2/loss=102.815, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 337.10it/s, env_step=10240, len=19, n/ep=4, n/st=64, player_1/loss=151.923, player_2/loss=37.153, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 340.50it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=159.273, player_2/loss=31.753, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 348.22it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=202.001, player_2/loss=64.121, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 339.47it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=190.274, player_2/loss=112.742, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 337.85it/s, env_step=14336, len=17, n/ep=3, n/st=64, player_1/loss=187.784, player_2/loss=102.991, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 340.59it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=151.548, player_2/loss=84.820, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 341.11it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=161.237, player_2/loss=49.554, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 338.72it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=196.152, player_2/loss=44.553, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 339.57it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=190.236, player_2/loss=19.445, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 336.56it/s, env_step=19456, len=19, n/ep=3, n/st=64, player_1/loss=212.566, player_2/loss=29.567, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 338.77it/s, env_step=1024, len=19, n/ep=4, n/st=64, player_1/loss=150.554, player_2/loss=28.294, rew=0.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.14it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=139.920, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 340.01it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=96.390, player_2/loss=110.282, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 338.53it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=60.968, player_2/loss=108.498, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 337.63it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=60.695, player_2/loss=173.653, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 340.71it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=28.475, player_2/loss=213.519, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 339.40it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=29.664, player_2/loss=237.593, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 339.73it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=37.574, player_2/loss=247.986, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 339.17it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=55.519, player_2/loss=274.313, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 339.49it/s, env_step=10240, len=10, n/ep=7, n/st=64, player_1/loss=46.879, player_2/loss=309.417, rew=10.71]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 342.02it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=33.383, player_2/loss=286.805, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 340.91it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=32.488, player_2/loss=244.514, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 339.06it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=33.214, player_2/loss=262.157, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 341.16it/s, env_step=14336, len=14, n/ep=3, n/st=64, player_1/loss=17.904, player_2/loss=279.872, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 339.60it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=26.471, player_2/loss=267.093, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 341.80it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=30.105, player_2/loss=237.437, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 340.45it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=82.658, player_2/loss=306.278, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 342.21it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=69.972, player_2/loss=260.740, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 339.79it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=25.748, player_2/loss=221.064, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 337.35it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=6.491, player_2/loss=267.098, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 341.73it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=6.439, player_2/loss=199.180, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 341.03it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=60.306, player_2/loss=127.876, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 340.14it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_2/loss=103.204, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.21it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=61.355, player_2/loss=114.262, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 336.59it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=97.514, player_2/loss=108.039, rew=-5.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 340.77it/s, env_step=7168, len=26, n/ep=3, n/st=64, player_1/loss=146.314, player_2/loss=78.394, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 342.17it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=134.358, player_2/loss=86.646, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:03, 341.61it/s, env_step=9216, len=26, n/ep=2, n/st=64, player_1/loss=124.031, player_2/loss=52.659, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:03, 335.91it/s, env_step=10240, len=26, n/ep=3, n/st=64, player_1/loss=180.392, player_2/loss=39.758, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:03, 341.50it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=192.233, player_2/loss=87.222, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:03, 339.48it/s, env_step=12288, len=23, n/ep=2, n/st=64, player_1/loss=130.469, player_2/loss=81.110, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 346.89it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=118.080, player_2/loss=65.773, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:03, 341.00it/s, env_step=14336, len=20, n/ep=2, n/st=64, player_1/loss=119.868, player_2/loss=123.615, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:03, 338.27it/s, env_step=15360, len=25, n/ep=3, n/st=64, player_1/loss=106.667, player_2/loss=121.664, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:03, 339.83it/s, env_step=16384, len=24, n/ep=2, n/st=64, player_1/loss=87.198, player_2/loss=97.121, rew=0.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:03, 340.65it/s, env_step=17408, len=27, n/ep=2, n/st=64, player_1/loss=121.949, player_2/loss=84.985, rew=0.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:03, 340.30it/s, env_step=18432, len=24, n/ep=2, n/st=64, player_1/loss=152.310, player_2/loss=92.260, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:03, 335.18it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=109.226, player_2/loss=99.169, rew=-12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:03, 339.74it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=66.986, player_2/loss=94.766, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.38it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=100.372, player_2/loss=145.506, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.69it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=77.287, player_2/loss=191.392, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.70it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=16.341, player_2/loss=198.676, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.48it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=37.135, player_2/loss=167.046, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 339.41it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=35.930, player_2/loss=152.628, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 339.95it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=21.192, player_2/loss=181.693, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 340.53it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=56.981, player_2/loss=150.001, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 338.83it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=50.244, player_2/loss=180.930, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 324.32it/s, env_step=10240, len=11, n/ep=4, n/st=64, player_1/loss=42.963, player_2/loss=178.223, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 339.02it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=39.496, player_2/loss=184.286, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.19it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=8.031, player_2/loss=198.540, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 338.41it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=6.107, player_2/loss=199.250, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.57it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=3.374, player_2/loss=195.397, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 333.82it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=20.645, player_2/loss=191.123, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 337.16it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=23.343, player_2/loss=198.308, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 340.51it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=6.325, player_2/loss=199.527, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 339.68it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=23.019, player_2/loss=224.765, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 339.51it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=25.838, player_2/loss=195.602, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 335.06it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=44.328, player_2/loss=200.072, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 341.60it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=84.817, player_2/loss=150.955, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 341.56it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=86.838, player_2/loss=81.931, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.52it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=47.153, player_2/loss=59.424, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 338.41it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=31.485, player_2/loss=50.451, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.28it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=40.606, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 342.28it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_2/loss=100.293, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 339.58it/s, env_step=8192, len=21, n/ep=2, n/st=64, player_1/loss=51.340, player_2/loss=119.222, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:03, 340.56it/s, env_step=9216, len=20, n/ep=4, n/st=64, player_1/loss=66.881, player_2/loss=100.583, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:03, 341.02it/s, env_step=10240, len=24, n/ep=3, n/st=64, player_1/loss=38.999, player_2/loss=104.363, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:03, 333.40it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=66.098, player_2/loss=167.555, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:03, 339.46it/s, env_step=12288, len=21, n/ep=3, n/st=64, player_1/loss=86.007, player_2/loss=118.542, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:02, 341.72it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=65.210, player_2/loss=108.400, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:02, 348.87it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=48.075, player_2/loss=99.972, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:03, 334.85it/s, env_step=15360, len=27, n/ep=3, n/st=64, player_1/loss=41.382, player_2/loss=88.839, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:03, 337.98it/s, env_step=16384, len=26, n/ep=3, n/st=64, player_1/loss=30.289, player_2/loss=61.833, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:03, 339.70it/s, env_step=17408, len=22, n/ep=4, n/st=64, player_1/loss=25.567, player_2/loss=88.784, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:03, 339.48it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=79.989, player_2/loss=116.832, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:03, 339.51it/s, env_step=19456, len=24, n/ep=2, n/st=64, player_1/loss=83.693, player_2/loss=105.996, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:03, 335.83it/s, env_step=1024, len=25, n/ep=3, n/st=64, player_1/loss=87.164, player_2/loss=117.337, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 341.27it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=92.250, player_2/loss=94.676, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 329.90it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=67.055, player_2/loss=83.785, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 340.49it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=48.204, player_2/loss=74.555, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.42it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=27.461, player_2/loss=85.003, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.34it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=25.783, player_2/loss=73.115, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 340.30it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=37.091, player_2/loss=54.027, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.54it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=44.934, player_2/loss=57.014, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 337.41it/s, env_step=9216, len=23, n/ep=2, n/st=64, player_1/loss=56.279, player_2/loss=65.056, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 341.47it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=55.578, player_2/loss=109.420, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 338.09it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=76.455, player_2/loss=110.138, rew=5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 338.87it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=67.513, player_2/loss=108.996, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 338.15it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=33.627, player_2/loss=138.638, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.56it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=34.373, player_2/loss=152.378, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 337.09it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=19.413, player_2/loss=141.937, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 335.07it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=20.530, player_2/loss=146.497, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 340.72it/s, env_step=17408, len=19, n/ep=4, n/st=64, player_1/loss=31.905, player_2/loss=192.748, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 339.73it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=22.091, player_2/loss=192.240, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 340.14it/s, env_step=19456, len=12, n/ep=4, n/st=64, player_1/loss=11.742, player_2/loss=162.362, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 339.73it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=121.839, player_2/loss=147.765, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 335.59it/s, env_step=2048, len=24, n/ep=3, n/st=64, player_1/loss=167.183, player_2/loss=149.464, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 341.20it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=146.694, player_2/loss=134.467, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 340.56it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=103.938, player_2/loss=117.772, rew=12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 340.20it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=167.392, player_2/loss=128.263, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 340.61it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=222.093, player_2/loss=134.371, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 336.95it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=386.334, player_2/loss=116.911, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 341.45it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=414.626, player_2/loss=59.453, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 341.34it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=363.013, player_2/loss=48.108, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.20it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=385.015, player_2/loss=37.912, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 338.32it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=349.607, player_2/loss=55.948, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 335.94it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=335.418, player_2/loss=99.352, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 340.05it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=320.834, player_2/loss=140.782, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 340.20it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=358.059, player_2/loss=90.395, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 346.95it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=389.543, player_2/loss=11.893, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 335.10it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=384.262, player_2/loss=6.326, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 341.08it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=359.108, player_2/loss=6.061, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 338.95it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=312.245, player_2/loss=58.237, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 340.89it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=309.081, player_2/loss=62.716, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 339.44it/s, env_step=1024, len=13, n/ep=4, n/st=64, player_1/loss=351.062, player_2/loss=88.799, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 337.04it/s, env_step=2048, len=12, n/ep=6, n/st=64, player_1/loss=189.735, player_2/loss=232.754, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 340.95it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=106.715, player_2/loss=272.609, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 340.91it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=103.249, player_2/loss=156.637, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.91it/s, env_step=5120, len=20, n/ep=2, n/st=64, player_1/loss=49.611, player_2/loss=93.166, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 333.18it/s, env_step=6144, len=25, n/ep=3, n/st=64, player_1/loss=38.384, player_2/loss=95.306, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.03it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=54.456, player_2/loss=105.301, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 339.24it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=80.901, player_2/loss=121.143, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 340.35it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=56.377, player_2/loss=237.709, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 339.85it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=31.109, player_2/loss=297.099, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 336.95it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=24.531, player_2/loss=236.876, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 339.38it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=13.425, player_2/loss=232.445, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 338.53it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=12.207, player_2/loss=218.068, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 341.24it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=16.217, player_2/loss=193.263, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 340.51it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=16.359, player_2/loss=224.945, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.22it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=9.220, player_2/loss=247.497, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 341.00it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=11.026, player_2/loss=255.146, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 338.72it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=14.516, player_2/loss=234.431, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 339.33it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=23.467, player_2/loss=326.374, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 339.09it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=20.425, player_2/loss=344.753, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.54it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=30.607, player_2/loss=269.368, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.79it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=36.208, player_2/loss=179.909, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.26it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=64.527, player_2/loss=139.109, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.55it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=80.645, player_2/loss=124.004, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.99it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=60.508, player_2/loss=98.105, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:03, 335.31it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=35.758, player_2/loss=93.822, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:03, 340.67it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=39.020, player_2/loss=79.123, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:03, 340.76it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=80.233, player_2/loss=105.357, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:03, 340.08it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=58.746, player_2/loss=89.041, rew=-15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:03, 338.80it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=29.331, player_2/loss=72.298, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 334.77it/s, env_step=12288, len=22, n/ep=3, n/st=64, player_1/loss=76.007, player_2/loss=70.137, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:03, 339.88it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=113.445, player_2/loss=95.067, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:03, 340.55it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=183.786, player_2/loss=91.437, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:03, 338.80it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=182.645, player_2/loss=70.614, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:03, 339.11it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=101.690, player_2/loss=36.377, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 346.91it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=156.858, player_2/loss=43.014, rew=12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 341.72it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=238.884, player_2/loss=58.739, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:03, 337.15it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=262.934, player_2/loss=72.919, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:03, 339.14it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=157.428, player_2/loss=43.522, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 335.79it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=115.781, player_2/loss=99.270, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.63it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=42.523, player_2/loss=141.180, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.50it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=118.254, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 338.29it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=61.715, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.39it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=20.834, player_2/loss=263.716, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 336.28it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=14.804, player_2/loss=223.923, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.06it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=6.092, player_2/loss=198.997, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 338.21it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=6.105, player_2/loss=204.252, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 337.84it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=17.796, player_2/loss=228.710, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.74it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=39.653, player_2/loss=251.536, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 334.95it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=8.872, player_2/loss=257.770, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 337.96it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=10.027, player_2/loss=267.619, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 337.77it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=14.710, player_2/loss=271.113, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 337.36it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=3.548, player_2/loss=270.177, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 337.58it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=2.668, player_2/loss=255.996, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 333.35it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=3.638, player_2/loss=275.006, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 337.46it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=6.185, player_2/loss=274.459, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 339.07it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=7.744, player_2/loss=257.338, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 340.97it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=88.734, player_2/loss=241.111, rew=-6.25]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 334.50it/s, env_step=2048, len=26, n/ep=2, n/st=64, player_1/loss=78.329, player_2/loss=149.984, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.62it/s, env_step=3072, len=16, n/ep=3, n/st=64, player_1/loss=171.223, player_2/loss=145.004, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.57it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=210.743, player_2/loss=126.224, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 342.20it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=194.744, player_2/loss=52.502, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 336.87it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=151.572, player_2/loss=51.113, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 336.52it/s, env_step=7168, len=17, n/ep=5, n/st=64, player_1/loss=200.117, player_2/loss=78.275, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 339.06it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=252.306, player_2/loss=113.240, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 338.76it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=281.987, player_2/loss=71.208, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:03, 339.13it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=168.030, player_2/loss=58.703, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 337.91it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=211.107, player_2/loss=77.142, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 336.03it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=207.971, player_2/loss=83.258, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 341.28it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=261.803, player_2/loss=75.069, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 339.30it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=253.828, player_2/loss=85.099, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:03, 339.04it/s, env_step=15360, len=24, n/ep=2, n/st=64, player_1/loss=295.424, player_2/loss=68.872, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 337.79it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=398.766, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:03, 333.61it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=298.149, player_2/loss=58.660, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 349.79it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=223.714, player_2/loss=62.539, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:03, 340.98it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=293.612, player_2/loss=62.257, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 339.95it/s, env_step=1024, len=23, n/ep=3, n/st=64, player_1/loss=173.747, player_2/loss=122.907, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 337.73it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=133.618, player_2/loss=119.287, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 338.67it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=122.942, player_2/loss=137.887, rew=-10.71]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 340.80it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=105.978, player_2/loss=153.861, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.40it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=143.410, player_2/loss=156.992, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 339.33it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=129.642, player_2/loss=180.042, rew=-16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.57it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=95.170, player_2/loss=287.329, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 335.67it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=88.185, player_2/loss=373.122, rew=-5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 338.31it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=53.799, player_2/loss=330.275, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.86it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=58.098, player_2/loss=314.612, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 338.19it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=31.185, player_2/loss=355.542, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 337.76it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=31.860, player_2/loss=385.419, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 335.27it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=70.280, player_2/loss=325.177, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 341.16it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=54.486, player_2/loss=277.129, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 336.72it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=51.571, player_2/loss=359.825, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.07it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=27.281, player_2/loss=393.546, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 339.13it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=46.799, player_2/loss=406.483, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 336.32it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=62.677, player_2/loss=327.891, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 338.04it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=79.541, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 341.36it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=81.612, player_2/loss=238.105, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 337.92it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=80.943, player_2/loss=233.440, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.07it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=48.394, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 338.68it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=39.641, player_2/loss=192.664, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 340.22it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=42.367, player_2/loss=178.709, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 341.01it/s, env_step=6144, len=13, n/ep=4, n/st=64, player_1/loss=41.497, player_2/loss=107.634, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.85it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=135.965, player_2/loss=91.246, rew=-5.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 334.97it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=231.260, player_2/loss=147.246, rew=-5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 338.95it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=343.745, player_2/loss=205.694, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 339.24it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=376.095, player_2/loss=182.829, rew=17.86]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 339.79it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=294.096, player_2/loss=183.771, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 334.76it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=360.553, player_2/loss=181.002, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 336.58it/s, env_step=13312, len=9, n/ep=8, n/st=64, player_1/loss=381.302, player_2/loss=137.194, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 339.06it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=325.234, player_2/loss=99.406, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 337.53it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=320.434, player_2/loss=40.896, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.51it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=326.901, player_2/loss=31.546, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.61it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=361.180, player_2/loss=41.150, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 340.48it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=370.761, player_2/loss=45.606, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 348.03it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=329.578, player_2/loss=55.311, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 337.75it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=167.939, player_2/loss=57.876, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 335.49it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=126.534, player_2/loss=96.896, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 339.74it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=94.842, player_2/loss=210.812, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 336.58it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=52.278, player_2/loss=263.845, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 330.69it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=12.815, player_2/loss=318.597, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 340.15it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=19.540, player_2/loss=280.804, rew=15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 336.02it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=52.382, player_2/loss=286.819, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.17it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=57.294, player_2/loss=329.215, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 339.74it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=18.083, player_2/loss=276.026, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 339.28it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=13.509, player_2/loss=270.762, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 339.69it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=23.741, player_2/loss=286.305, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 335.93it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=41.343, player_2/loss=228.338, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 339.45it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=30.875, player_2/loss=233.772, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 338.80it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=37.225, player_2/loss=275.651, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 339.51it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=36.687, player_2/loss=268.369, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.50it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=5.244, player_2/loss=275.368, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 337.22it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=5.707, player_2/loss=219.248, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 337.71it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=39.329, player_2/loss=239.330, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 338.89it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=65.537, player_2/loss=248.916, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 340.62it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=33.224, player_2/loss=199.387, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.72it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=24.545, player_2/loss=163.280, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 334.13it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=39.017, player_2/loss=134.016, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 340.92it/s, env_step=4096, len=23, n/ep=2, n/st=64, player_1/loss=92.126, player_2/loss=126.508, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.70it/s, env_step=5120, len=16, n/ep=3, n/st=64, player_1/loss=121.833, player_2/loss=123.110, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 339.63it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=72.483, player_2/loss=146.699, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 339.13it/s, env_step=7168, len=19, n/ep=4, n/st=64, player_1/loss=37.050, player_2/loss=149.542, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 333.69it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=148.422, player_2/loss=199.239, rew=-17.86]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 339.97it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=192.685, player_2/loss=200.732, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 339.56it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=116.932, player_2/loss=85.737, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 341.10it/s, env_step=11264, len=23, n/ep=3, n/st=64, player_1/loss=84.755, player_2/loss=84.198, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.82it/s, env_step=12288, len=25, n/ep=2, n/st=64, player_1/loss=116.331, player_2/loss=103.367, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 333.95it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=235.339, player_2/loss=82.114, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 341.33it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=358.435, player_2/loss=72.201, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 338.77it/s, env_step=15360, len=13, n/ep=6, n/st=64, player_1/loss=352.349, player_2/loss=76.839, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 338.24it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=364.582, player_2/loss=84.779, rew=16.67]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 339.18it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=379.035, player_2/loss=82.750, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 337.73it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=386.175, player_2/loss=80.512, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 338.25it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=377.092, player_2/loss=52.103, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 347.56it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=220.849, player_2/loss=154.758, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.65it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=210.716, player_2/loss=99.245, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 335.26it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=219.982, player_2/loss=61.486, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.43it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=196.654, player_2/loss=77.119, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 339.58it/s, env_step=5120, len=17, n/ep=3, n/st=64, player_1/loss=150.114, player_2/loss=91.527, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:03, 338.82it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=94.540, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:03, 337.21it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=40.942, player_2/loss=237.545, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:03, 333.39it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=20.092, player_2/loss=261.992, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:03, 336.49it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=18.794, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:03, 338.98it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=14.955, player_2/loss=258.591, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:03, 338.05it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=19.008, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:03, 338.99it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=42.408, player_2/loss=239.416, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:03, 336.20it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=68.025, player_2/loss=211.692, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:03, 338.66it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=39.425, player_2/loss=153.005, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:03, 340.47it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=12.552, player_2/loss=201.201, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:03, 336.73it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=24.346, player_2/loss=272.519, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:03, 336.26it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=22.864, player_2/loss=282.629, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:03, 335.07it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=15.600, player_2/loss=244.344, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:03, 338.55it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=14.288, player_2/loss=274.123, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:03, 338.84it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=76.077, player_2/loss=233.824, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.35it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=156.551, player_2/loss=195.035, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 339.89it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=215.196, player_2/loss=148.517, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 333.36it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=139.240, player_2/loss=162.346, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 339.32it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=52.196, player_2/loss=184.970, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 339.85it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=38.138, player_2/loss=177.014, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 339.88it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=38.645, player_2/loss=155.687, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 339.03it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=12.452, player_2/loss=99.533, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 337.93it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=43.776, player_2/loss=67.959, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 339.54it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=55.355, player_2/loss=60.769, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 340.60it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=65.234, player_2/loss=66.760, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 341.20it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=104.918, player_2/loss=90.470, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 340.07it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=103.689, player_2/loss=106.406, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 335.29it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=131.677, player_2/loss=102.284, rew=8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 339.24it/s, env_step=15360, len=25, n/ep=3, n/st=64, player_1/loss=224.664, player_2/loss=68.525, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 340.08it/s, env_step=16384, len=19, n/ep=4, n/st=64, player_1/loss=269.986, player_2/loss=68.659, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 339.81it/s, env_step=17408, len=24, n/ep=2, n/st=64, player_1/loss=253.336, player_2/loss=94.302, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 336.55it/s, env_step=18432, len=25, n/ep=2, n/st=64, player_1/loss=209.959, player_2/loss=86.049, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 337.81it/s, env_step=19456, len=23, n/ep=2, n/st=64, player_1/loss=205.028, player_2/loss=76.473, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 340.85it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=131.766, player_2/loss=100.298, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 348.69it/s, env_step=2048, len=28, n/ep=3, n/st=64, player_1/loss=176.921, player_2/loss=106.325, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.13it/s, env_step=3072, len=28, n/ep=2, n/st=64, player_1/loss=160.972, player_2/loss=79.048, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 334.55it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=81.188, player_2/loss=68.248, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 337.95it/s, env_step=5120, len=23, n/ep=2, n/st=64, player_1/loss=95.049, player_2/loss=75.484, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 341.62it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=111.172, player_2/loss=106.486, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 337.62it/s, env_step=7168, len=9, n/ep=6, n/st=64, player_1/loss=88.507, player_2/loss=165.741, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 339.40it/s, env_step=8192, len=13, n/ep=6, n/st=64, player_1/loss=53.647, player_2/loss=164.973, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 335.43it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=49.332, player_2/loss=164.438, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 339.08it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=28.541, player_2/loss=165.412, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 338.11it/s, env_step=11264, len=9, n/ep=6, n/st=64, player_1/loss=18.811, player_2/loss=180.836, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 338.42it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=34.990, player_2/loss=205.373, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 337.29it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=64.584, player_2/loss=228.639, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 332.65it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=56.029, player_2/loss=248.146, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 337.83it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=21.243, player_2/loss=216.459, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 336.83it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=53.298, player_2/loss=241.639, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 338.11it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=51.460, player_2/loss=194.655, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 337.43it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=23.323, player_2/loss=197.109, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 334.04it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=26.048, player_2/loss=194.899, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 339.06it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=362.063, player_2/loss=274.059, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 341.61it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=376.693, player_2/loss=257.213, rew=18.75]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.05it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=336.592, player_2/loss=198.472, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 335.62it/s, env_step=4096, len=8, n/ep=6, n/st=64, player_1/loss=326.682, player_2/loss=136.421, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 338.25it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=321.688, player_2/loss=134.846, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 338.76it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=323.288, player_2/loss=159.438, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.52it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=319.412, player_2/loss=168.476, rew=18.75]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 339.86it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=388.912, player_2/loss=122.242, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 334.40it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=459.323, player_2/loss=144.811, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.75it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=406.738, player_2/loss=90.862, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 340.24it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=404.432, player_2/loss=64.466, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 339.78it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=360.494, player_2/loss=77.798, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 340.75it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=295.324, player_2/loss=81.964, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 334.26it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=355.156, player_2/loss=58.393, rew=18.75]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 341.19it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=422.242, player_2/loss=58.687, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 341.12it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=383.000, player_2/loss=83.320, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.55it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=365.443, player_2/loss=70.823, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 338.37it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=400.777, player_2/loss=48.699, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 334.81it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=425.892, player_2/loss=49.417, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 336.95it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=261.351, player_2/loss=60.793, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.37it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=264.895, player_2/loss=49.110, rew=-10.71]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 349.60it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=228.298, player_2/loss=306.854, rew=19.44]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 338.44it/s, env_step=4096, len=8, n/ep=9, n/st=64, player_1/loss=171.293, player_2/loss=561.369, rew=-2.78]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 335.06it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=111.642, player_2/loss=627.083, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 335.65it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=85.315, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 334.82it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=94.502, player_2/loss=690.602, rew=13.89]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 338.84it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=52.748, player_2/loss=695.017, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 335.14it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=92.617, player_2/loss=563.168, rew=18.75]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 337.43it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=156.934, player_2/loss=559.561, rew=19.44]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 335.26it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=106.379, player_2/loss=528.328, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 339.26it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=105.246, player_2/loss=499.326, rew=13.89]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 337.97it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=109.201, player_2/loss=437.200, rew=18.75]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 334.92it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=65.153, player_2/loss=573.791, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 338.10it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=77.827, player_2/loss=700.440, rew=19.44]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 338.54it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=70.664, player_2/loss=657.109, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 336.73it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=51.645, player_2/loss=676.468, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 336.18it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=107.107, player_2/loss=523.486, rew=13.89]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 333.96it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=97.063, player_2/loss=479.077, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 336.46it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=256.485, player_2/loss=152.229, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 339.33it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=230.939, player_2/loss=114.498, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 339.11it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=203.793, player_2/loss=113.286, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 335.94it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=242.005, player_2/loss=119.320, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 338.97it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=239.585, player_2/loss=75.724, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 341.17it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=227.764, player_2/loss=111.735, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 337.82it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=224.355, player_2/loss=83.723, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.91it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=225.191, player_2/loss=64.772, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 334.96it/s, env_step=9216, len=10, n/ep=5, n/st=64, player_1/loss=199.018, player_2/loss=51.533, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 339.36it/s, env_step=10240, len=10, n/ep=7, n/st=64, player_1/loss=162.855, player_2/loss=63.503, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 334.87it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=210.823, player_2/loss=49.390, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 338.07it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=266.354, player_2/loss=38.589, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 339.62it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=255.947, player_2/loss=57.778, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 334.65it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=241.472, player_2/loss=61.901, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 339.51it/s, env_step=15360, len=15, n/ep=5, n/st=64, player_1/loss=226.603, player_2/loss=74.488, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 337.50it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=241.070, player_2/loss=55.876, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 338.10it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_2/loss=47.234, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 335.70it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=190.327, player_2/loss=47.557, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 341.24it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=208.481, player_2/loss=76.580, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 338.57it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=235.279, player_2/loss=154.458, rew=16.67]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 338.27it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=180.303, player_2/loss=210.337, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.02it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=72.572, player_2/loss=221.804, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 345.70it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=83.252, player_2/loss=173.531, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 336.12it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=92.165, player_2/loss=180.153, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 340.94it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=106.133, player_2/loss=192.320, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.11it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=89.875, player_2/loss=193.688, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 337.26it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=68.706, player_2/loss=199.304, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 333.97it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=91.599, player_2/loss=197.001, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 337.31it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=57.564, player_2/loss=209.841, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 339.45it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=84.046, player_2/loss=179.951, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 337.90it/s, env_step=12288, len=10, n/ep=7, n/st=64, player_1/loss=99.238, player_2/loss=157.372, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 334.66it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=63.041, player_2/loss=173.053, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 337.83it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=47.437, player_2/loss=223.204, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 339.52it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=12.952, player_2/loss=245.361, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.75it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=18.052, player_2/loss=236.054, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 338.40it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=39.977, player_2/loss=222.217, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 336.33it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=62.611, player_2/loss=179.834, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 336.19it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=37.589, player_2/loss=188.059, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 338.45it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=54.803, player_2/loss=303.651, rew=15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.28it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=187.717, player_2/loss=311.655, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 340.99it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=428.821, player_2/loss=277.191, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 334.49it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=608.928, player_2/loss=201.482, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 337.71it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=469.583, player_2/loss=191.142, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 336.48it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=483.937, player_2/loss=169.492, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 338.31it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=598.462, player_2/loss=137.109, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 333.54it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=654.074, player_2/loss=66.174, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 339.11it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=501.312, player_2/loss=51.815, rew=18.75]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 339.19it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=441.838, player_2/loss=76.525, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 337.37it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=486.832, player_2/loss=112.538, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 338.55it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=537.539, player_2/loss=111.726, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 334.22it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=676.599, player_2/loss=52.887, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 339.73it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=668.549, player_2/loss=17.405, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 337.44it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=605.656, player_2/loss=13.676, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 339.18it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=570.805, player_2/loss=25.627, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 337.45it/s, env_step=17408, len=10, n/ep=5, n/st=64, player_1/loss=564.711, player_2/loss=101.986, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 334.53it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=539.384, player_2/loss=117.324, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 338.61it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=483.975, player_2/loss=48.139, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 338.34it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=340.324, player_2/loss=71.153, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.64it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=197.113, player_2/loss=126.583, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 334.87it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=96.036, player_2/loss=148.747, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.75it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=80.541, player_2/loss=130.904, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 343.65it/s, env_step=5120, len=21, n/ep=3, n/st=64, player_1/loss=65.051, player_2/loss=149.711, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 341.93it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=57.618, player_2/loss=190.708, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.37it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=49.601, player_2/loss=196.575, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.28it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=41.448, player_2/loss=190.765, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 337.17it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=22.998, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.53it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=40.211, player_2/loss=232.671, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 340.66it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=45.941, player_2/loss=225.932, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 338.68it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=50.841, player_2/loss=248.326, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 335.35it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=74.796, player_2/loss=257.452, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.33it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=56.972, player_2/loss=249.200, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 339.37it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=18.795, player_2/loss=306.869, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 338.56it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=11.344, player_2/loss=344.390, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 332.49it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=32.206, player_2/loss=311.155, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 339.80it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=32.870, player_2/loss=226.142, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 340.17it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=9.818, player_2/loss=238.198, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 336.93it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=9.870, player_2/loss=230.340, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.69it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=16.620, player_2/loss=196.120, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.27it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=24.008, player_2/loss=153.173, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.42it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=37.230, player_2/loss=120.006, rew=-15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.87it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=49.845, player_2/loss=138.819, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.92it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=26.359, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 336.12it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=48.130, player_2/loss=85.573, rew=16.67]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 332.84it/s, env_step=8192, len=10, n/ep=5, n/st=64, player_1/loss=128.570, player_2/loss=66.993, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:03, 337.32it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=167.752, player_2/loss=55.689, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:03, 338.99it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=171.835, player_2/loss=86.673, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:03, 338.61it/s, env_step=11264, len=10, n/ep=5, n/st=64, player_1/loss=169.063, player_2/loss=67.751, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:03, 334.69it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=128.886, player_2/loss=59.398, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:03, 341.25it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=164.018, player_2/loss=47.583, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:03, 336.25it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=170.446, player_2/loss=40.614, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:03, 336.43it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=187.363, player_2/loss=52.884, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:03, 337.67it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=184.357, player_2/loss=52.984, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:03, 333.93it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=173.488, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:03, 338.20it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=175.375, player_2/loss=67.624, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:03, 338.74it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=171.544, player_2/loss=65.004, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:03, 339.33it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=138.731, player_2/loss=141.019, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 336.46it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=91.996, player_2/loss=298.243, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.32it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=53.540, player_2/loss=408.278, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 337.41it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=104.237, player_2/loss=345.067, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 335.37it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=83.595, player_2/loss=407.812, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 340.37it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=32.680, player_2/loss=425.274, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.03it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=27.314, player_2/loss=375.832, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 337.34it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=111.954, player_2/loss=362.453, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 334.90it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=60.346, player_2/loss=320.927, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 335.91it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=12.637, player_2/loss=364.431, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 338.41it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=34.613, player_2/loss=353.835, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 335.48it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=40.649, player_2/loss=413.079, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 338.85it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=24.151, player_2/loss=437.967, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 331.92it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=25.521, player_2/loss=413.904, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 335.79it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=61.973, player_2/loss=380.834, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 332.87it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=74.417, player_2/loss=357.248, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.06it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=25.453, player_2/loss=397.688, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 337.77it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=9.097, player_2/loss=349.614, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 337.83it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=30.632, player_2/loss=388.526, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 334.63it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=134.951, player_2/loss=238.916, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 340.64it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=201.219, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 339.18it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=243.081, player_2/loss=89.322, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 338.66it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=203.950, player_2/loss=82.577, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.35it/s, env_step=5120, len=19, n/ep=2, n/st=64, player_1/loss=139.223, player_2/loss=135.886, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 334.82it/s, env_step=6144, len=29, n/ep=3, n/st=64, player_1/loss=114.777, player_2/loss=157.495, rew=33.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 337.38it/s, env_step=7168, len=20, n/ep=2, n/st=64, player_1/loss=178.971, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 332.80it/s, env_step=8192, len=27, n/ep=2, n/st=64, player_1/loss=179.438, player_2/loss=122.180, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 330.88it/s, env_step=9216, len=30, n/ep=2, n/st=64, player_1/loss=122.779, player_2/loss=111.895, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.29it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_2/loss=85.827, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 337.19it/s, env_step=11264, len=26, n/ep=2, n/st=64, player_1/loss=195.225, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 340.27it/s, env_step=12288, len=31, n/ep=2, n/st=64, player_1/loss=216.130, player_2/loss=95.707, rew=0.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 336.86it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=153.601, player_2/loss=81.021, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 339.83it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=140.858, player_2/loss=69.261, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 334.65it/s, env_step=15360, len=23, n/ep=2, n/st=64, player_1/loss=147.611, player_2/loss=57.283, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.60it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=194.923, player_2/loss=47.654, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 338.04it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=211.672, player_2/loss=55.859, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 340.20it/s, env_step=18432, len=20, n/ep=4, n/st=64, player_1/loss=187.140, player_2/loss=58.508, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 337.64it/s, env_step=19456, len=17, n/ep=5, n/st=64, player_1/loss=189.485, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 335.82it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=138.735, player_2/loss=160.926, rew=5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 339.94it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=107.902, player_2/loss=244.744, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 339.79it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=125.816, player_2/loss=232.231, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 337.68it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=128.438, player_2/loss=141.808, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.11it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=106.582, player_2/loss=117.804, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 334.54it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=91.975, player_2/loss=103.751, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 344.98it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=84.236, player_2/loss=150.360, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.08it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=117.937, player_2/loss=220.953, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 337.23it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=97.277, player_2/loss=236.568, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 333.06it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=48.834, player_2/loss=208.373, rew=16.67]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 338.29it/s, env_step=11264, len=9, n/ep=6, n/st=64, player_1/loss=62.426, player_2/loss=194.573, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 335.15it/s, env_step=12288, len=10, n/ep=7, n/st=64, player_1/loss=101.823, player_2/loss=176.135, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 336.11it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=86.475, player_2/loss=184.675, rew=16.67]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 339.01it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=74.753, player_2/loss=201.200, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 335.43it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=62.271, player_2/loss=192.591, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 338.48it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=66.585, player_2/loss=164.176, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 337.27it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=49.725, player_2/loss=157.192, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 337.98it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=19.919, player_2/loss=159.570, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 335.40it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=63.648, player_2/loss=168.540, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 340.24it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=83.146, player_2/loss=116.290, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 336.34it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=118.984, player_2/loss=80.736, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 340.17it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=138.299, player_2/loss=86.723, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 338.89it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=153.551, player_2/loss=64.584, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 335.83it/s, env_step=5120, len=27, n/ep=2, n/st=64, player_1/loss=152.864, player_2/loss=23.806, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 338.81it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=138.906, player_2/loss=53.496, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 337.02it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=118.671, player_2/loss=56.650, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 339.44it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=109.149, player_2/loss=37.658, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 329.16it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=134.772, player_2/loss=68.913, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 331.57it/s, env_step=10240, len=26, n/ep=2, n/st=64, player_1/loss=187.496, player_2/loss=67.473, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 336.94it/s, env_step=11264, len=20, n/ep=4, n/st=64, player_1/loss=165.049, player_2/loss=77.687, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 336.79it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=129.311, player_2/loss=93.803, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 333.48it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=134.750, player_2/loss=99.089, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 334.18it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=167.883, player_2/loss=135.325, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 336.30it/s, env_step=15360, len=16, n/ep=5, n/st=64, player_1/loss=139.487, player_2/loss=106.365, rew=-5.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 342.86it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=135.338, player_2/loss=98.554, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 340.52it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=150.953, player_2/loss=159.385, rew=-12.50]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 337.28it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=133.275, player_2/loss=207.898, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 333.28it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=163.954, player_2/loss=207.875, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 320.87it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=196.672, player_2/loss=207.581, rew=-17.86]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.24it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=186.272, player_2/loss=233.365, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 335.41it/s, env_step=3072, len=9, n/ep=6, n/st=64, player_1/loss=197.666, player_2/loss=243.028, rew=16.67]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 335.49it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=128.291, player_2/loss=255.520, rew=10.71]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 331.24it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=96.094, player_2/loss=280.145, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 336.78it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=142.933, player_2/loss=206.725, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 339.54it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=113.065, player_2/loss=201.427, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 348.00it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=76.609, player_2/loss=278.822, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 335.43it/s, env_step=9216, len=10, n/ep=5, n/st=64, player_1/loss=90.914, player_2/loss=286.898, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 336.94it/s, env_step=10240, len=10, n/ep=4, n/st=64, player_1/loss=46.072, player_2/loss=266.278, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 336.16it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=43.856, player_2/loss=297.467, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 337.42it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=57.508, player_2/loss=256.780, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 337.44it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=42.658, player_2/loss=233.458, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 336.33it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=38.005, player_2/loss=213.623, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 336.74it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=41.846, player_2/loss=223.572, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 336.95it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=45.316, player_2/loss=202.614, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 338.60it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=47.824, player_2/loss=189.264, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 338.89it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_1/loss=31.993, player_2/loss=207.476, rew=16.67]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 334.99it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=18.985, player_2/loss=248.810, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 337.43it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=21.047, player_2/loss=233.018, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.33it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=39.694, player_2/loss=218.093, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.24it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=101.753, player_2/loss=189.529, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 336.69it/s, env_step=4096, len=12, n/ep=3, n/st=64, player_1/loss=170.725, player_2/loss=182.284, rew=-8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 336.68it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=175.444, player_2/loss=169.400, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 337.19it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=126.319, player_2/loss=117.799, rew=-5.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 336.53it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=118.555, player_2/loss=98.625, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 338.99it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=127.180, player_2/loss=117.249, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 336.25it/s, env_step=9216, len=16, n/ep=5, n/st=64, player_1/loss=118.926, player_2/loss=112.727, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 338.05it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=115.894, player_2/loss=108.193, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 339.37it/s, env_step=11264, len=21, n/ep=3, n/st=64, player_1/loss=141.585, player_2/loss=94.046, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 338.08it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=153.508, player_2/loss=82.849, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 338.57it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=138.055, player_2/loss=55.763, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 334.36it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=115.774, player_2/loss=35.836, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 337.33it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=110.890, player_2/loss=44.209, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 335.04it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=117.601, player_2/loss=48.206, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 337.20it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=123.377, player_2/loss=76.266, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 334.59it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=154.871, player_2/loss=63.078, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 337.91it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=147.551, player_2/loss=67.378, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 338.42it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=88.764, player_2/loss=85.945, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.48it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=109.684, player_2/loss=89.758, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.97it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=105.349, player_2/loss=93.022, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 334.64it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=88.655, player_2/loss=62.221, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.04it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=110.022, player_2/loss=86.726, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 339.97it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=118.628, player_2/loss=118.293, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.47it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=133.536, player_2/loss=139.030, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 338.01it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=129.619, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 344.38it/s, env_step=9216, len=27, n/ep=2, n/st=64, player_1/loss=109.704, player_2/loss=128.554, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 335.80it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=69.267, player_2/loss=128.446, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.69it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=50.257, player_2/loss=122.596, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 338.32it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=69.963, player_2/loss=158.294, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 333.56it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=79.300, player_2/loss=197.290, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 337.65it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=35.933, player_2/loss=187.167, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 335.94it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=12.708, player_2/loss=180.706, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 334.83it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=13.253, player_2/loss=187.086, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 337.69it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=62.578, player_2/loss=180.519, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 332.28it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=23.564, player_2/loss=215.320, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.18it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_2/loss=205.297, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 337.25it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=27.801, player_2/loss=242.573, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.44it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=25.678, player_2/loss=163.687, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 331.61it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=31.362, player_2/loss=93.537, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 341.35it/s, env_step=4096, len=26, n/ep=3, n/st=64, player_1/loss=49.495, player_2/loss=75.190, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 341.54it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=72.989, player_2/loss=74.346, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 339.77it/s, env_step=6144, len=26, n/ep=2, n/st=64, player_1/loss=75.750, player_2/loss=82.444, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 335.43it/s, env_step=7168, len=28, n/ep=2, n/st=64, player_1/loss=90.119, player_2/loss=59.300, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 339.88it/s, env_step=8192, len=31, n/ep=2, n/st=64, player_1/loss=94.823, player_2/loss=73.311, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 339.37it/s, env_step=9216, len=31, n/ep=2, n/st=64, player_1/loss=62.711, player_2/loss=68.228, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.12it/s, env_step=10240, len=31, n/ep=2, n/st=64, player_1/loss=78.940, player_2/loss=51.161, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.75it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=101.902, player_2/loss=35.824, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #12: 1025it [00:03, 335.20it/s, env_step=12288, len=31, n/ep=2, n/st=64, player_1/loss=113.853, player_2/loss=54.804, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #13: 1025it [00:03, 338.45it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=100.721, player_2/loss=76.537, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #14: 1025it [00:03, 338.83it/s, env_step=14336, len=29, n/ep=2, n/st=64, player_1/loss=111.111, player_2/loss=70.467, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #15: 1025it [00:03, 341.28it/s, env_step=15360, len=31, n/ep=2, n/st=64, player_1/loss=99.463, player_2/loss=46.625, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #16: 1025it [00:03, 332.72it/s, env_step=16384, len=28, n/ep=2, n/st=64, player_1/loss=103.000, player_2/loss=36.865, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #17: 1025it [00:03, 340.25it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=124.993, player_2/loss=33.673, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #18: 1025it [00:03, 338.96it/s, env_step=18432, len=25, n/ep=2, n/st=64, player_1/loss=284.880, player_2/loss=270.389, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #19: 1025it [00:03, 337.07it/s, env_step=19456, len=28, n/ep=3, n/st=64, player_1/loss=264.868, player_2/loss=277.111, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #1: 1025it [00:03, 337.98it/s, env_step=1024, len=25, n/ep=3, n/st=64, player_1/loss=23.714, player_2/loss=44.355, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 333.50it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=44.015, player_2/loss=36.012, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.55it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=45.243, player_2/loss=64.207, rew=12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.65it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=94.790, player_2/loss=74.944, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.54it/s, env_step=5120, len=21, n/ep=4, n/st=64, player_1/loss=108.842, player_2/loss=66.994, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 334.72it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=47.744, player_2/loss=50.590, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.74it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=27.815, player_2/loss=89.824, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.80it/s, env_step=8192, len=23, n/ep=3, n/st=64, player_1/loss=74.698, player_2/loss=98.894, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 335.41it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=73.760, player_2/loss=76.492, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 348.23it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=21.704, player_2/loss=63.665, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 335.40it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=24.279, player_2/loss=74.712, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 338.83it/s, env_step=12288, len=12, n/ep=6, n/st=64, player_1/loss=21.435, player_2/loss=107.861, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 337.75it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_2/loss=151.222, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 337.06it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=47.628, player_2/loss=127.504, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 338.78it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=40.560, player_2/loss=118.875, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 332.63it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=35.058, player_2/loss=115.434, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 336.66it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=42.447, player_2/loss=127.475, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 337.82it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=16.377, player_2/loss=133.378, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.34it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=20.421, player_2/loss=152.658, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 335.13it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=8.384, player_2/loss=132.048, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.48it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=9.834, player_2/loss=92.065, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.66it/s, env_step=3072, len=10, n/ep=6, n/st=64, player_1/loss=74.904, player_2/loss=89.120, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.31it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=81.918, player_2/loss=77.240, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 341.27it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=36.702, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 331.90it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=77.020, player_2/loss=93.622, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.84it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=67.088, player_2/loss=104.433, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.19it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=27.470, player_2/loss=53.480, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 340.07it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=58.109, player_2/loss=63.637, rew=-15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.00it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=136.874, player_2/loss=93.722, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 333.89it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=163.276, player_2/loss=79.261, rew=-16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.04it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=151.421, player_2/loss=81.615, rew=-15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 336.49it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=138.674, player_2/loss=99.062, rew=5.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 336.62it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=91.703, player_2/loss=85.697, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 335.90it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=77.805, player_2/loss=64.286, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 335.91it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=98.021, player_2/loss=72.319, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 336.45it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=154.366, player_2/loss=113.599, rew=-16.67]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.39it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=126.535, player_2/loss=105.655, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.13it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=215.231, player_2/loss=186.113, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 335.52it/s, env_step=1024, len=25, n/ep=2, n/st=64, player_1/loss=150.023, player_2/loss=66.428, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.98it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=168.398, player_2/loss=92.646, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.34it/s, env_step=3072, len=23, n/ep=3, n/st=64, player_1/loss=120.818, player_2/loss=95.333, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.82it/s, env_step=4096, len=26, n/ep=3, n/st=64, player_1/loss=72.959, player_2/loss=103.065, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 339.34it/s, env_step=5120, len=22, n/ep=2, n/st=64, player_1/loss=121.124, player_2/loss=86.181, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 333.56it/s, env_step=6144, len=34, n/ep=2, n/st=64, player_1/loss=139.084, player_2/loss=112.654, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.76it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=98.348, player_2/loss=125.778, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.11it/s, env_step=8192, len=31, n/ep=3, n/st=64, player_1/loss=86.810, player_2/loss=141.578, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 339.94it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=73.742, player_2/loss=129.617, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 339.19it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=86.389, player_2/loss=128.184, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 345.30it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=90.718, player_2/loss=250.395, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 335.39it/s, env_step=12288, len=32, n/ep=2, n/st=64, player_1/loss=106.079, player_2/loss=253.189, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 336.22it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=89.957, player_2/loss=111.119, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 337.19it/s, env_step=14336, len=21, n/ep=4, n/st=64, player_1/loss=51.974, player_2/loss=100.408, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 334.53it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=47.587, player_2/loss=101.044, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 324.72it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=90.198, player_2/loss=143.931, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 333.92it/s, env_step=17408, len=7, n/ep=7, n/st=64, player_1/loss=80.416, player_2/loss=189.621, rew=17.86]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.52it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=61.262, player_2/loss=145.384, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 321.72it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=44.018, player_2/loss=129.572, rew=13.89]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 331.00it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=139.255, player_2/loss=183.814, rew=17.86]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 337.75it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=202.042, player_2/loss=143.645, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 336.87it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=302.282, player_2/loss=83.051, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 322.41it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=313.676, player_2/loss=84.676, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 332.40it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=327.688, player_2/loss=87.380, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 333.29it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=291.301, player_2/loss=75.502, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 329.59it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=303.597, player_2/loss=64.804, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 328.59it/s, env_step=8192, len=10, n/ep=7, n/st=64, player_1/loss=293.962, player_2/loss=79.463, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 339.00it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=278.842, player_2/loss=55.085, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.94it/s, env_step=10240, len=10, n/ep=5, n/st=64, player_1/loss=271.607, player_2/loss=68.064, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 334.92it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=302.684, player_2/loss=48.839, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 338.29it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=271.975, player_2/loss=35.543, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 334.83it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=221.551, player_2/loss=15.439, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 337.78it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=242.032, player_2/loss=17.181, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 334.96it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=237.625, player_2/loss=14.807, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.51it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=278.118, player_2/loss=20.152, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 339.42it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=286.764, player_2/loss=22.747, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 337.45it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=210.983, player_2/loss=17.343, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 336.10it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=226.033, player_2/loss=32.842, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 334.48it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=283.992, player_2/loss=9.531, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.68it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=241.296, player_2/loss=27.816, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.07it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=142.909, player_2/loss=117.506, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 337.66it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=129.594, player_2/loss=186.031, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 338.73it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=133.274, player_2/loss=331.033, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 332.47it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=101.039, player_2/loss=282.238, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 338.54it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=105.581, player_2/loss=149.304, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 337.79it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=123.100, player_2/loss=149.477, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 337.51it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=90.751, player_2/loss=126.191, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 339.67it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=82.162, player_2/loss=125.339, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 335.43it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=100.557, player_2/loss=149.509, rew=8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 345.79it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=75.123, player_2/loss=115.064, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 336.89it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=57.864, player_2/loss=157.902, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 337.95it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=34.961, player_2/loss=200.385, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 333.55it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=24.479, player_2/loss=345.311, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 337.53it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=37.289, player_2/loss=338.949, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 337.01it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=31.865, player_2/loss=245.426, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 338.83it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=11.868, player_2/loss=195.569, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 336.86it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=56.986, player_2/loss=253.178, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 337.09it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=66.821, player_2/loss=171.551, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.61it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=41.228, player_2/loss=113.852, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.35it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=86.562, player_2/loss=97.154, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.25it/s, env_step=4096, len=13, n/ep=2, n/st=64, player_1/loss=122.120, player_2/loss=113.260, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.22it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=115.680, player_2/loss=119.802, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 334.47it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=89.509, player_2/loss=118.905, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:03, 336.95it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=133.062, player_2/loss=95.139, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:03, 337.16it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=143.635, player_2/loss=124.980, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:03, 336.76it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=90.996, player_2/loss=152.708, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:03, 337.49it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=79.161, player_2/loss=96.286, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:03, 338.96it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=154.424, player_2/loss=88.509, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 338.28it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=213.835, player_2/loss=52.521, rew=-8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:03, 333.41it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=160.320, player_2/loss=24.307, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:03, 336.42it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=102.184, player_2/loss=36.546, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:03, 333.45it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=87.848, player_2/loss=56.179, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:03, 337.13it/s, env_step=16384, len=16, n/ep=5, n/st=64, player_1/loss=94.913, player_2/loss=101.608, rew=-5.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:03, 337.07it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=153.979, player_2/loss=122.983, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:03, 336.78it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=191.837, player_2/loss=115.779, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:03, 337.99it/s, env_step=19456, len=25, n/ep=2, n/st=64, player_1/loss=201.705, player_2/loss=100.066, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:03, 336.64it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=96.243, player_2/loss=143.262, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.23it/s, env_step=2048, len=17, n/ep=3, n/st=64, player_1/loss=94.172, player_2/loss=148.238, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.05it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=61.337, player_2/loss=158.575, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.76it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=33.311, player_2/loss=192.941, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 333.66it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=34.288, player_2/loss=252.196, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 336.36it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=42.376, player_2/loss=294.422, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.90it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=42.254, player_2/loss=318.869, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 335.16it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=14.902, player_2/loss=270.305, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 337.94it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_2/loss=270.017, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 332.55it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=14.111, player_2/loss=304.343, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 335.12it/s, env_step=11264, len=7, n/ep=7, n/st=64, player_1/loss=9.291, player_2/loss=291.658, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 335.67it/s, env_step=12288, len=10, n/ep=7, n/st=64, player_1/loss=39.151, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 348.92it/s, env_step=13312, len=7, n/ep=7, n/st=64, player_1/loss=8.791, player_2/loss=244.154, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 332.78it/s, env_step=14336, len=9, n/ep=8, n/st=64, player_1/loss=3.224, player_2/loss=275.197, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 335.18it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=4.046, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 335.58it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=11.351, player_2/loss=261.399, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 335.60it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=8.589, player_2/loss=256.178, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 335.78it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=21.889, player_2/loss=275.027, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 331.92it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=25.191, player_2/loss=285.548, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 337.41it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=13.680, player_2/loss=250.336, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.15it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=23.307, player_2/loss=189.296, rew=-12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.19it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=31.448, player_2/loss=127.026, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 333.44it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=38.579, player_2/loss=95.777, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.07it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=56.000, player_2/loss=77.729, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.05it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=66.794, player_2/loss=90.953, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 338.64it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=72.760, player_2/loss=106.674, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.46it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=130.206, player_2/loss=117.103, rew=-19.44]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #9: 1025it [00:03, 335.98it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=284.303, player_2/loss=102.043, rew=16.67]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #10: 1025it [00:03, 337.56it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=302.920, player_2/loss=123.898, rew=-5.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #11: 1025it [00:03, 337.59it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=293.015, player_2/loss=173.900, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #12: 1025it [00:03, 336.95it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=341.458, player_2/loss=127.973, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #13: 1025it [00:03, 338.49it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=345.349, player_2/loss=81.285, rew=-5.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #14: 1025it [00:03, 333.85it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=273.493, player_2/loss=60.752, rew=5.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #15: 1025it [00:03, 338.30it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=323.294, player_2/loss=60.493, rew=5.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #16: 1025it [00:03, 338.87it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=350.251, player_2/loss=56.287, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #17: 1025it [00:03, 338.58it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=279.848, player_2/loss=37.532, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #18: 1025it [00:03, 338.32it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=294.315, player_2/loss=85.134, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #19: 1025it [00:03, 335.51it/s, env_step=19456, len=13, n/ep=4, n/st=64, player_1/loss=392.251, player_2/loss=125.660, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #8


Epoch #1: 1025it [00:03, 337.19it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=148.675, player_2/loss=123.861, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 333.59it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=56.928, player_2/loss=155.199, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.90it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=8.941, player_2/loss=218.591, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 332.52it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=9.118, player_2/loss=225.524, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 339.58it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=39.448, player_2/loss=216.330, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 340.49it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=40.094, player_2/loss=212.091, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 339.85it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=11.269, player_2/loss=188.254, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 339.57it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=9.178, player_2/loss=232.498, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 335.88it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_2/loss=179.251, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.58it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=40.050, player_2/loss=161.635, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.55it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=14.991, player_2/loss=184.023, rew=-5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.10it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=39.157, player_2/loss=204.724, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 338.19it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=73.330, player_2/loss=215.467, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 345.63it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=43.113, player_2/loss=178.903, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 337.16it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=27.101, player_2/loss=217.000, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 336.32it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=36.343, player_2/loss=267.744, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 336.65it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=30.848, player_2/loss=230.298, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.28it/s, env_step=18432, len=10, n/ep=5, n/st=64, player_1/loss=27.204, player_2/loss=173.401, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 333.77it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=35.929, player_2/loss=186.354, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 335.23it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=145.440, player_2/loss=184.463, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.65it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=176.154, player_2/loss=151.158, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.43it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=221.715, player_2/loss=125.291, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 338.75it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=307.095, player_2/loss=105.036, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 334.84it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=366.229, player_2/loss=82.982, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 336.17it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=420.283, player_2/loss=91.969, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 338.29it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=421.478, player_2/loss=72.695, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 338.66it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=460.967, player_2/loss=48.327, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 334.02it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=396.506, player_2/loss=23.853, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 338.84it/s, env_step=10240, len=14, n/ep=5, n/st=64, player_1/loss=284.023, player_2/loss=20.581, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 339.39it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=354.526, player_2/loss=37.333, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 338.86it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=449.233, player_2/loss=41.739, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 336.74it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=385.169, player_2/loss=30.228, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 331.16it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=317.317, player_2/loss=58.840, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 337.71it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=404.574, player_2/loss=86.553, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 336.97it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=470.905, player_2/loss=61.706, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 338.17it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_2/loss=38.168, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 336.42it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=243.763, player_2/loss=28.454, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 336.25it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=294.531, player_2/loss=34.477, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 340.12it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=151.096, player_2/loss=106.347, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 337.01it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=108.642, player_2/loss=171.770, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.68it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=89.033, player_2/loss=244.975, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 338.65it/s, env_step=4096, len=23, n/ep=2, n/st=64, player_1/loss=93.571, player_2/loss=253.583, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 335.76it/s, env_step=5120, len=16, n/ep=5, n/st=64, player_1/loss=96.853, player_2/loss=188.689, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 336.57it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=105.280, player_2/loss=153.317, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 337.52it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=66.694, player_2/loss=164.893, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 338.40it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=53.626, player_2/loss=210.645, rew=12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 333.90it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=37.385, player_2/loss=225.378, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 337.91it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=22.877, player_2/loss=231.258, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 337.53it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=57.385, player_2/loss=236.653, rew=5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 337.35it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=72.277, player_2/loss=199.749, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 336.72it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=63.776, player_2/loss=177.824, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 333.92it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=102.896, player_2/loss=196.760, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 347.65it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=70.831, player_2/loss=208.119, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 338.16it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=53.309, player_2/loss=161.062, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.26it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=65.109, player_2/loss=161.629, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 336.32it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=55.921, player_2/loss=196.276, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 335.38it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=44.684, player_2/loss=219.309, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 334.93it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=62.019, player_2/loss=208.869, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 328.16it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=62.699, player_2/loss=156.649, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 339.44it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=73.022, player_2/loss=125.648, rew=12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 339.59it/s, env_step=4096, len=20, n/ep=4, n/st=64, player_1/loss=103.317, player_2/loss=76.433, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 334.79it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=140.689, player_2/loss=40.382, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 337.86it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=112.988, player_2/loss=76.621, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 335.67it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=111.638, player_2/loss=99.613, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 338.77it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=111.532, player_2/loss=82.264, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 335.46it/s, env_step=9216, len=23, n/ep=3, n/st=64, player_1/loss=81.718, player_2/loss=64.954, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 337.85it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_1/loss=105.866, player_2/loss=56.692, rew=8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 339.55it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=112.327, player_2/loss=58.483, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 337.09it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=101.188, player_2/loss=45.710, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 336.94it/s, env_step=13312, len=26, n/ep=2, n/st=64, player_1/loss=77.113, player_2/loss=51.729, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 334.80it/s, env_step=14336, len=27, n/ep=2, n/st=64, player_1/loss=72.249, player_2/loss=67.308, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 338.30it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=78.868, player_2/loss=64.725, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 337.29it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=143.066, player_2/loss=31.857, rew=12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 339.04it/s, env_step=17408, len=23, n/ep=3, n/st=64, player_1/loss=174.938, player_2/loss=108.037, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 332.40it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=112.824, player_2/loss=126.911, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 338.21it/s, env_step=19456, len=30, n/ep=2, n/st=64, player_1/loss=75.078, player_2/loss=73.774, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 336.77it/s, env_step=1024, len=28, n/ep=2, n/st=64, player_1/loss=146.441, player_2/loss=47.891, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.18it/s, env_step=2048, len=32, n/ep=2, n/st=64, player_1/loss=121.543, player_2/loss=55.140, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.07it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=90.741, player_2/loss=84.006, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 331.80it/s, env_step=4096, len=25, n/ep=2, n/st=64, player_1/loss=107.466, player_2/loss=112.942, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.16it/s, env_step=5120, len=25, n/ep=3, n/st=64, player_1/loss=103.617, player_2/loss=101.632, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 339.04it/s, env_step=6144, len=23, n/ep=3, n/st=64, player_1/loss=93.837, player_2/loss=75.769, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.63it/s, env_step=7168, len=25, n/ep=3, n/st=64, player_1/loss=71.075, player_2/loss=75.009, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.00it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=65.840, player_2/loss=85.811, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 334.98it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=79.693, player_2/loss=113.258, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 334.34it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=82.973, player_2/loss=150.466, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.63it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=44.432, player_2/loss=152.544, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 336.12it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=48.942, player_2/loss=152.744, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 334.34it/s, env_step=13312, len=26, n/ep=3, n/st=64, player_1/loss=69.290, player_2/loss=174.590, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 335.40it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=108.799, player_2/loss=156.158, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 344.36it/s, env_step=15360, len=16, n/ep=3, n/st=64, player_1/loss=116.299, player_2/loss=112.695, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 340.74it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=78.685, player_2/loss=96.653, rew=-8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 335.95it/s, env_step=17408, len=25, n/ep=3, n/st=64, player_1/loss=75.727, player_2/loss=84.220, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 332.42it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=75.254, player_2/loss=56.257, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.98it/s, env_step=19456, len=26, n/ep=2, n/st=64, player_1/loss=78.811, player_2/loss=66.436, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 336.89it/s, env_step=1024, len=27, n/ep=2, n/st=64, player_1/loss=59.923, player_2/loss=62.248, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.63it/s, env_step=2048, len=29, n/ep=2, n/st=64, player_1/loss=90.470, player_2/loss=83.151, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.32it/s, env_step=3072, len=24, n/ep=2, n/st=64, player_1/loss=91.938, player_2/loss=102.755, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 324.45it/s, env_step=4096, len=34, n/ep=2, n/st=64, player_1/loss=117.395, player_2/loss=86.625, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.17it/s, env_step=5120, len=24, n/ep=2, n/st=64, player_1/loss=109.099, player_2/loss=69.804, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.09it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=108.312, player_2/loss=83.997, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 338.69it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=167.151, player_2/loss=136.221, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.86it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=223.636, player_2/loss=170.334, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 332.84it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=254.618, player_2/loss=122.995, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.22it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=328.742, player_2/loss=71.467, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.71it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=360.084, player_2/loss=68.646, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 334.51it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=370.789, player_2/loss=63.781, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 331.87it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=391.132, player_2/loss=116.834, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 335.95it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=386.102, player_2/loss=108.635, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 339.62it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=363.307, player_2/loss=31.612, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 335.36it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=269.686, player_2/loss=58.404, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 336.42it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=251.467, player_2/loss=82.368, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 331.84it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=298.909, player_2/loss=51.373, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.68it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=310.233, player_2/loss=98.827, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 336.92it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=298.654, player_2/loss=105.911, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 335.56it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=270.348, player_2/loss=94.711, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.26it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=221.661, player_2/loss=94.383, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 331.30it/s, env_step=4096, len=10, n/ep=7, n/st=64, player_1/loss=168.921, player_2/loss=225.101, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 338.27it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=108.412, player_2/loss=465.160, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 333.93it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=62.894, player_2/loss=543.330, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 334.60it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=58.138, player_2/loss=438.315, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 329.08it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=40.836, player_2/loss=447.776, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 334.56it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=22.123, player_2/loss=460.660, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 335.76it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=45.015, player_2/loss=424.256, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 337.47it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=52.386, player_2/loss=409.826, rew=-10.71]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 336.05it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=53.517, player_2/loss=443.237, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 333.22it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=25.547, player_2/loss=417.196, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 336.81it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=39.888, player_2/loss=325.026, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 336.25it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=67.192, player_2/loss=316.501, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 345.62it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=58.988, player_2/loss=342.091, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 335.06it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=26.282, player_2/loss=389.768, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 334.65it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=47.874, player_2/loss=368.242, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 335.71it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=69.669, player_2/loss=331.729, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 336.89it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=88.689, player_2/loss=319.400, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 334.08it/s, env_step=2048, len=31, n/ep=3, n/st=64, player_1/loss=116.557, player_2/loss=208.911, rew=50.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 338.90it/s, env_step=3072, len=24, n/ep=3, n/st=64, player_1/loss=142.007, player_2/loss=142.708, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 340.24it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=84.000, player_2/loss=74.885, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 338.01it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=95.603, player_2/loss=68.139, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 338.20it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=94.038, player_2/loss=78.508, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 332.52it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=128.786, player_2/loss=103.161, rew=-5.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 339.47it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=141.294, player_2/loss=106.319, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 337.14it/s, env_step=9216, len=16, n/ep=3, n/st=64, player_1/loss=119.301, player_2/loss=119.651, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 338.19it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=110.709, player_2/loss=105.973, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 333.77it/s, env_step=11264, len=30, n/ep=2, n/st=64, player_1/loss=123.582, player_2/loss=55.639, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 337.58it/s, env_step=12288, len=19, n/ep=2, n/st=64, player_1/loss=81.887, player_2/loss=75.827, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 341.88it/s, env_step=13312, len=42, n/ep=2, n/st=64, player_1/loss=143.768, player_2/loss=135.467, rew=100.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 336.30it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=165.917, player_2/loss=120.124, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 337.61it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=122.101, player_2/loss=97.260, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 333.41it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=91.528, player_2/loss=90.996, rew=-8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 336.99it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=69.300, player_2/loss=61.146, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 338.07it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=84.044, player_2/loss=78.558, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 336.15it/s, env_step=19456, len=20, n/ep=4, n/st=64, player_1/loss=85.555, player_2/loss=67.696, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 336.02it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=96.694, player_2/loss=82.246, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 332.80it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=105.155, player_2/loss=87.227, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.66it/s, env_step=3072, len=26, n/ep=2, n/st=64, player_1/loss=74.342, player_2/loss=83.874, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.15it/s, env_step=4096, len=23, n/ep=2, n/st=64, player_1/loss=54.753, player_2/loss=160.440, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 335.84it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=95.251, player_2/loss=152.289, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 330.87it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=94.650, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 338.21it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=108.442, player_2/loss=56.615, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 340.32it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=78.456, player_2/loss=70.793, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 336.07it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=69.760, player_2/loss=95.531, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 339.31it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=93.390, player_2/loss=108.278, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 332.44it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=132.374, player_2/loss=183.904, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 334.64it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=139.844, player_2/loss=283.779, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 336.19it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=111.980, player_2/loss=336.996, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 334.56it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=98.193, player_2/loss=343.302, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 335.75it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=46.105, player_2/loss=367.362, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 332.21it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=36.822, player_2/loss=357.753, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 349.71it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=13.013, player_2/loss=320.389, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 334.23it/s, env_step=18432, len=7, n/ep=5, n/st=64, player_1/loss=26.195, player_2/loss=287.418, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.73it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=24.283, player_2/loss=268.619, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 331.67it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=5.585, player_2/loss=319.021, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.99it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=8.757, player_2/loss=322.566, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.50it/s, env_step=3072, len=7, n/ep=10, n/st=64, player_1/loss=21.086, player_2/loss=299.729, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.39it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=33.115, player_2/loss=298.961, rew=-19.44]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.38it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=54.327, player_2/loss=264.750, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 331.60it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=38.994, player_2/loss=238.606, rew=-17.86]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 336.24it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=53.569, player_2/loss=217.372, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.65it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=112.497, player_2/loss=208.558, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 336.06it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=69.696, player_2/loss=158.607, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.10it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=74.519, player_2/loss=110.229, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #11: 1025it [00:03, 334.17it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=91.145, player_2/loss=121.000, rew=-16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #12: 1025it [00:03, 339.35it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=71.792, player_2/loss=129.379, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #13: 1025it [00:03, 338.63it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=144.498, player_2/loss=90.857, rew=-5.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #14: 1025it [00:03, 337.93it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=151.806, player_2/loss=86.614, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #15: 1025it [00:03, 337.05it/s, env_step=15360, len=21, n/ep=4, n/st=64, player_1/loss=119.935, player_2/loss=117.462, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #16: 1025it [00:03, 333.02it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=148.838, player_2/loss=140.435, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #17: 1025it [00:03, 336.76it/s, env_step=17408, len=23, n/ep=2, n/st=64, player_1/loss=153.876, player_2/loss=112.954, rew=0.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #18: 1025it [00:03, 335.31it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=165.702, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #19: 1025it [00:03, 337.10it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=200.940, player_2/loss=141.476, rew=16.67]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #10


Epoch #1: 1025it [00:03, 334.40it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=42.028, player_2/loss=267.749, rew=18.75]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.33it/s, env_step=2048, len=7, n/ep=10, n/st=64, player_1/loss=64.704, player_2/loss=282.684, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 334.14it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=88.679, player_2/loss=277.778, rew=18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.42it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=105.249, player_2/loss=283.702, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 334.66it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=132.343, player_2/loss=263.597, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 326.75it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=117.254, player_2/loss=261.860, rew=2.78]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.82it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=76.748, player_2/loss=263.261, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.39it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=108.404, player_2/loss=262.078, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 336.48it/s, env_step=9216, len=7, n/ep=10, n/st=64, player_1/loss=91.075, player_2/loss=283.713, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 334.02it/s, env_step=10240, len=10, n/ep=8, n/st=64, player_1/loss=68.196, player_2/loss=267.675, rew=6.25]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 333.28it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=91.642, player_2/loss=264.914, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 336.20it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=102.753, player_2/loss=260.245, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 336.29it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=96.963, player_2/loss=229.789, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 335.21it/s, env_step=14336, len=7, n/ep=10, n/st=64, player_1/loss=82.188, player_2/loss=236.664, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 337.84it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=17.084, player_2/loss=238.828, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 331.84it/s, env_step=16384, len=8, n/ep=9, n/st=64, player_1/loss=42.033, player_2/loss=247.040, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 335.95it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=39.680, player_2/loss=251.726, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 346.94it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=20.910, player_2/loss=280.990, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 332.57it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=22.080, player_2/loss=283.797, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 336.28it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=281.100, player_2/loss=168.539, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 333.83it/s, env_step=2048, len=12, n/ep=4, n/st=64, player_1/loss=260.078, player_2/loss=128.551, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 336.73it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=239.231, player_2/loss=96.946, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 339.25it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=178.855, player_2/loss=108.741, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 338.87it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=98.443, player_2/loss=103.094, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 333.18it/s, env_step=6144, len=25, n/ep=3, n/st=64, player_1/loss=89.008, player_2/loss=74.429, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.29it/s, env_step=7168, len=37, n/ep=1, n/st=64, player_1/loss=111.724, player_2/loss=64.240, rew=-25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 337.43it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=154.343, player_2/loss=106.777, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 338.84it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=178.115, player_2/loss=158.648, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 336.11it/s, env_step=10240, len=26, n/ep=3, n/st=64, player_1/loss=195.805, player_2/loss=136.633, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 334.60it/s, env_step=11264, len=19, n/ep=4, n/st=64, player_1/loss=95.045, player_2/loss=121.044, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 337.64it/s, env_step=12288, len=25, n/ep=2, n/st=64, player_1/loss=119.155, player_2/loss=328.009, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 337.89it/s, env_step=13312, len=26, n/ep=3, n/st=64, player_1/loss=148.603, player_2/loss=338.318, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 336.75it/s, env_step=14336, len=12, n/ep=4, n/st=64, player_1/loss=131.810, player_2/loss=81.796, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 338.71it/s, env_step=15360, len=10, n/ep=5, n/st=64, player_1/loss=226.959, player_2/loss=103.385, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.88it/s, env_step=16384, len=24, n/ep=2, n/st=64, player_1/loss=218.027, player_2/loss=125.150, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 338.59it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=156.626, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 337.01it/s, env_step=18432, len=22, n/ep=3, n/st=64, player_1/loss=161.574, player_2/loss=123.898, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 337.89it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=158.213, player_2/loss=145.343, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 333.10it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=199.588, player_2/loss=107.608, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.30it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=155.645, player_2/loss=112.377, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.02it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=103.628, player_2/loss=96.196, rew=0.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.42it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=66.194, player_2/loss=54.998, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 335.63it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=59.432, player_2/loss=86.284, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 334.66it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=57.927, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.69it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=28.856, player_2/loss=171.667, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 335.99it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=17.771, player_2/loss=147.774, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 336.05it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=13.881, player_2/loss=120.925, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.79it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=13.149, player_2/loss=104.646, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 331.69it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=10.251, player_2/loss=93.828, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 336.62it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=34.719, player_2/loss=105.488, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 337.79it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=35.497, player_2/loss=70.665, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 335.56it/s, env_step=14336, len=15, n/ep=5, n/st=64, player_1/loss=18.143, player_2/loss=80.475, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 338.23it/s, env_step=15360, len=18, n/ep=4, n/st=64, player_1/loss=10.998, player_2/loss=94.575, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 333.13it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=6.118, player_2/loss=86.165, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 337.38it/s, env_step=17408, len=13, n/ep=4, n/st=64, player_1/loss=12.330, player_2/loss=92.675, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 335.53it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=12.334, player_2/loss=84.205, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 344.00it/s, env_step=19456, len=15, n/ep=5, n/st=64, player_1/loss=9.652, player_2/loss=81.606, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 334.52it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=19.066, player_2/loss=133.647, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 332.50it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=32.743, player_2/loss=86.971, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.82it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=61.144, player_2/loss=62.157, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.37it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=42.512, player_2/loss=65.119, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 335.57it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=15.482, player_2/loss=39.852, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.59it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=99.319, player_2/loss=93.657, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:03, 334.21it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=206.632, player_2/loss=138.114, rew=15.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:03, 336.53it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=194.649, player_2/loss=177.150, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:03, 336.77it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=165.124, player_2/loss=138.732, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:03, 337.72it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=144.206, player_2/loss=161.321, rew=15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:03, 333.29it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=153.150, player_2/loss=169.653, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 336.08it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=147.856, player_2/loss=145.636, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:03, 335.94it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=139.885, player_2/loss=134.416, rew=-5.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:03, 337.69it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=133.855, player_2/loss=135.125, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:03, 335.57it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=187.450, player_2/loss=99.765, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:03, 330.87it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=181.152, player_2/loss=80.846, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:03, 337.21it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=165.094, player_2/loss=78.472, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:03, 337.09it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=244.481, player_2/loss=49.152, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:03, 336.35it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=340.355, player_2/loss=32.930, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:03, 336.16it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=184.091, player_2/loss=66.896, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 332.58it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=154.605, player_2/loss=82.783, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 335.37it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=120.417, player_2/loss=142.740, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 339.31it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=95.129, player_2/loss=147.095, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 336.50it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_2/loss=134.838, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 338.59it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=46.124, player_2/loss=188.228, rew=8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 335.51it/s, env_step=7168, len=20, n/ep=4, n/st=64, player_1/loss=69.756, player_2/loss=185.180, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 336.72it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=50.144, player_2/loss=175.667, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 337.91it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=33.466, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 337.86it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=23.718, player_2/loss=126.372, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 333.32it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=22.161, player_2/loss=99.625, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 336.53it/s, env_step=12288, len=15, n/ep=5, n/st=64, player_1/loss=12.759, player_2/loss=108.096, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 336.34it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=12.705, player_2/loss=151.809, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 335.56it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=44.591, player_2/loss=148.865, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 336.55it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=62.814, player_2/loss=131.717, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 333.12it/s, env_step=16384, len=15, n/ep=5, n/st=64, player_1/loss=54.091, player_2/loss=157.578, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 335.29it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=60.853, player_2/loss=156.086, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 335.24it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=93.797, player_2/loss=190.433, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 339.22it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=108.970, player_2/loss=214.099, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 342.81it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=39.941, player_2/loss=161.244, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.29it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=26.337, player_2/loss=149.503, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.32it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=39.866, player_2/loss=110.892, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.29it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=46.208, player_2/loss=112.614, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:03, 335.94it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=34.591, player_2/loss=94.810, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 333.45it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=43.241, player_2/loss=104.124, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 336.82it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=84.321, player_2/loss=131.090, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 332.19it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=95.929, player_2/loss=154.488, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 339.42it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=100.519, player_2/loss=170.875, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:03, 337.03it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=69.340, player_2/loss=143.563, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 335.75it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=115.506, player_2/loss=127.674, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 335.45it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=172.117, player_2/loss=124.910, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 339.73it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=152.971, player_2/loss=160.344, rew=-5.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 337.78it/s, env_step=14336, len=14, n/ep=5, n/st=64, player_1/loss=121.069, player_2/loss=206.142, rew=-15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:03, 334.11it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=96.640, player_2/loss=181.057, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 338.67it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=65.605, player_2/loss=151.804, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:03, 339.01it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=52.655, player_2/loss=148.370, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 336.70it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=71.492, player_2/loss=137.212, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:03, 337.37it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=73.573, player_2/loss=111.726, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 331.02it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=76.400, player_2/loss=139.339, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.04it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=51.562, player_2/loss=119.028, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 336.04it/s, env_step=3072, len=13, n/ep=4, n/st=64, player_1/loss=22.373, player_2/loss=127.832, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 336.29it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=16.819, player_2/loss=128.372, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 336.29it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=18.270, player_2/loss=136.160, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 330.33it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=42.916, player_2/loss=135.040, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 336.38it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=61.672, player_2/loss=101.851, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 337.25it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=57.803, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 335.54it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=16.312, player_2/loss=144.783, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 335.60it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=8.107, player_2/loss=150.910, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 333.18it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=34.692, player_2/loss=153.933, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 334.47it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=7.300, player_2/loss=149.829, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 337.92it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=26.406, player_2/loss=146.695, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 334.32it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=27.624, player_2/loss=150.704, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 333.98it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=25.409, player_2/loss=160.416, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 334.93it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=31.826, player_2/loss=171.012, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 335.00it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=8.657, player_2/loss=152.539, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 334.96it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=9.977, player_2/loss=121.627, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 337.69it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=12.491, player_2/loss=123.784, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 337.98it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=4.150, player_2/loss=101.584, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.35it/s, env_step=2048, len=11, n/ep=4, n/st=64, player_1/loss=9.851, player_2/loss=87.675, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.18it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=14.900, player_2/loss=73.595, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.18it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=53.897, player_2/loss=72.198, rew=-16.67]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.58it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=48.796, player_2/loss=64.731, rew=-16.67]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 333.38it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=6.850, player_2/loss=52.328, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 334.12it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=15.986, player_2/loss=58.466, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 338.14it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=46.256, player_2/loss=75.627, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 335.31it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=108.498, player_2/loss=96.617, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 335.44it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=180.235, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 332.85it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=184.179, player_2/loss=126.553, rew=5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #12: 1025it [00:03, 337.35it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=199.490, player_2/loss=111.073, rew=15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #13: 1025it [00:03, 337.75it/s, env_step=13312, len=20, n/ep=4, n/st=64, player_1/loss=184.389, player_2/loss=155.148, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #14: 1025it [00:03, 338.48it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=215.563, player_2/loss=166.160, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #15: 1025it [00:03, 333.19it/s, env_step=15360, len=12, n/ep=6, n/st=64, player_2/loss=99.135, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #16: 1025it [00:03, 338.69it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=240.019, player_2/loss=77.995, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #17: 1025it [00:03, 339.26it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=298.794, player_2/loss=48.947, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #18: 1025it [00:03, 336.30it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=269.340, player_2/loss=81.969, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #19: 1025it [00:03, 339.32it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=273.848, player_2/loss=91.500, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #11


Epoch #1: 1025it [00:03, 332.89it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=287.648, player_2/loss=45.457, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 335.73it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=223.755, player_2/loss=92.081, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 335.40it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=167.649, player_2/loss=93.014, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 334.23it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=144.719, player_2/loss=71.958, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 333.18it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=175.988, player_2/loss=144.362, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 332.28it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=146.532, player_2/loss=276.009, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 335.18it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=81.165, player_2/loss=425.351, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.50it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=46.470, player_2/loss=417.106, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 326.54it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=79.378, player_2/loss=495.862, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 332.85it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=76.036, player_2/loss=465.536, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 333.20it/s, env_step=11264, len=9, n/ep=8, n/st=64, player_1/loss=37.889, player_2/loss=466.177, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.26it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=15.358, player_2/loss=370.987, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 337.58it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=19.188, player_2/loss=402.979, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 333.46it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=23.891, player_2/loss=437.394, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 337.66it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=9.556, player_2/loss=479.307, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 336.11it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=8.063, player_2/loss=482.062, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 337.76it/s, env_step=17408, len=10, n/ep=7, n/st=64, player_1/loss=9.324, player_2/loss=472.636, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 336.53it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=9.039, player_2/loss=478.891, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 330.60it/s, env_step=19456, len=10, n/ep=7, n/st=64, player_1/loss=20.268, player_2/loss=449.184, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 339.49it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=13.104, player_2/loss=223.358, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 344.87it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=37.564, player_2/loss=192.775, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 335.42it/s, env_step=3072, len=18, n/ep=3, n/st=64, player_1/loss=126.560, player_2/loss=181.654, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 333.44it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=255.029, player_2/loss=123.219, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 333.00it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=329.234, player_2/loss=45.168, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 337.51it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=272.066, player_2/loss=101.904, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 337.06it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=243.815, player_2/loss=100.838, rew=-5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 338.54it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=270.445, player_2/loss=61.594, rew=12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 334.59it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=306.810, player_2/loss=22.406, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 334.67it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=255.458, player_2/loss=97.750, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 335.71it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=225.157, player_2/loss=131.359, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 334.52it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=254.090, player_2/loss=68.241, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 336.54it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=277.413, player_2/loss=77.661, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 331.07it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=226.800, player_2/loss=98.902, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 338.34it/s, env_step=15360, len=13, n/ep=6, n/st=64, player_1/loss=263.573, player_2/loss=81.959, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 333.94it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=295.502, player_2/loss=57.981, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 338.94it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=234.097, player_2/loss=69.244, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 335.92it/s, env_step=18432, len=15, n/ep=5, n/st=64, player_1/loss=273.307, player_2/loss=30.373, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 332.88it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=231.344, player_2/loss=46.530, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 335.56it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=228.614, player_2/loss=106.865, rew=15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.60it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=211.851, player_2/loss=85.111, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 337.84it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=163.579, player_2/loss=154.529, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 336.23it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=96.040, player_2/loss=180.926, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 331.82it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=107.420, player_2/loss=123.128, rew=-15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 336.86it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=109.548, player_2/loss=75.923, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 335.81it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=46.543, player_2/loss=73.324, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 337.97it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=73.257, player_2/loss=86.700, rew=-12.50]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 334.88it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=103.258, player_2/loss=122.260, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 331.48it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=116.197, player_2/loss=208.214, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 336.34it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=121.135, player_2/loss=185.013, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 335.42it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=75.216, player_2/loss=162.293, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 337.32it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=42.945, player_2/loss=149.652, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 337.60it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=69.427, player_2/loss=129.771, rew=-12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 334.67it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=113.352, player_2/loss=175.456, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 334.04it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=98.609, player_2/loss=226.519, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 335.82it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=58.395, player_2/loss=271.403, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 334.88it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=64.734, player_2/loss=287.732, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 335.87it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_2/loss=326.698, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 333.30it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=39.930, player_2/loss=264.409, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.27it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=85.480, player_2/loss=210.992, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 348.72it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=138.833, player_2/loss=212.058, rew=-18.75]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 339.89it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=140.811, player_2/loss=228.682, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 335.21it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=172.878, player_2/loss=195.121, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 334.92it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=230.829, player_2/loss=126.499, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 337.88it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=244.395, player_2/loss=106.842, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 335.45it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=228.846, player_2/loss=121.255, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 336.75it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=214.088, player_2/loss=121.333, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 333.16it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=262.913, player_2/loss=100.132, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 334.47it/s, env_step=11264, len=10, n/ep=7, n/st=64, player_1/loss=316.642, player_2/loss=57.408, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 336.00it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=321.123, player_2/loss=49.406, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 335.32it/s, env_step=13312, len=10, n/ep=5, n/st=64, player_1/loss=273.773, player_2/loss=43.788, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 337.83it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=291.597, player_2/loss=63.150, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 331.59it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=391.867, player_2/loss=43.602, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 337.99it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=424.798, player_2/loss=33.806, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 336.78it/s, env_step=17408, len=12, n/ep=4, n/st=64, player_1/loss=378.504, player_2/loss=29.731, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 336.50it/s, env_step=18432, len=12, n/ep=6, n/st=64, player_1/loss=345.282, player_2/loss=22.979, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 335.74it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=304.769, player_2/loss=12.047, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 333.76it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=178.723, player_2/loss=162.048, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.96it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=203.572, player_2/loss=150.126, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 338.62it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=191.419, player_2/loss=114.464, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 337.93it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=131.546, player_2/loss=70.824, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 338.40it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=110.314, player_2/loss=73.696, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 331.16it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=78.558, player_2/loss=186.665, rew=15.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 336.14it/s, env_step=7168, len=15, n/ep=5, n/st=64, player_1/loss=43.964, player_2/loss=288.471, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 334.32it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=20.055, player_2/loss=301.237, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 340.01it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=36.464, player_2/loss=350.995, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 334.73it/s, env_step=10240, len=17, n/ep=5, n/st=64, player_1/loss=51.378, player_2/loss=292.336, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 335.28it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=29.750, player_2/loss=292.715, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 336.54it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=41.890, player_2/loss=251.626, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 336.43it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=26.728, player_2/loss=254.999, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 336.28it/s, env_step=14336, len=32, n/ep=3, n/st=64, player_1/loss=16.553, player_2/loss=236.074, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 337.57it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_2/loss=156.694, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 333.42it/s, env_step=16384, len=22, n/ep=3, n/st=64, player_1/loss=57.577, player_2/loss=116.080, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 336.27it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=59.000, player_2/loss=156.198, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 335.77it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=52.806, player_2/loss=177.949, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 336.89it/s, env_step=19456, len=12, n/ep=4, n/st=64, player_1/loss=52.134, player_2/loss=312.155, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 330.00it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=26.234, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.82it/s, env_step=2048, len=13, n/ep=3, n/st=64, player_1/loss=34.461, player_2/loss=140.469, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.01it/s, env_step=3072, len=25, n/ep=2, n/st=64, player_1/loss=51.399, player_2/loss=65.156, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 349.49it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=45.617, player_2/loss=58.306, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 338.41it/s, env_step=5120, len=25, n/ep=3, n/st=64, player_1/loss=46.160, player_2/loss=61.394, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 336.14it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=47.940, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 335.98it/s, env_step=7168, len=16, n/ep=3, n/st=64, player_1/loss=59.452, player_2/loss=94.380, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 339.31it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=60.843, player_2/loss=124.844, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 339.53it/s, env_step=9216, len=25, n/ep=3, n/st=64, player_1/loss=94.758, player_2/loss=102.188, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 337.26it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=76.793, player_2/loss=77.075, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 335.09it/s, env_step=11264, len=22, n/ep=2, n/st=64, player_1/loss=70.491, player_2/loss=38.580, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 337.90it/s, env_step=12288, len=24, n/ep=2, n/st=64, player_1/loss=39.229, player_2/loss=24.656, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 336.33it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=77.137, player_2/loss=34.968, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 336.41it/s, env_step=14336, len=25, n/ep=3, n/st=64, player_1/loss=73.972, player_2/loss=38.361, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 338.11it/s, env_step=15360, len=19, n/ep=4, n/st=64, player_1/loss=103.502, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 336.06it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=160.313, player_2/loss=86.806, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 336.62it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=112.922, player_2/loss=69.687, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 336.52it/s, env_step=18432, len=20, n/ep=4, n/st=64, player_1/loss=154.961, player_2/loss=72.834, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 334.79it/s, env_step=19456, len=22, n/ep=3, n/st=64, player_1/loss=263.027, player_2/loss=78.016, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 332.88it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=99.231, player_2/loss=91.848, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 331.21it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=169.823, player_2/loss=103.718, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 335.41it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=155.729, player_2/loss=115.492, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 334.15it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=67.683, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 335.23it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=33.501, player_2/loss=162.383, rew=15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.36it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=27.229, player_2/loss=198.244, rew=16.67]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 329.59it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=29.395, player_2/loss=201.497, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 335.32it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=39.010, player_2/loss=212.207, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 335.49it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=30.102, player_2/loss=210.903, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 337.16it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=63.798, player_2/loss=186.480, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 336.03it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=74.294, player_2/loss=171.034, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 330.69it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=121.872, player_2/loss=176.716, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 335.74it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=111.943, player_2/loss=186.799, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 335.82it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=10.062, player_2/loss=215.841, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 335.08it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=7.018, player_2/loss=221.576, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 335.72it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=17.615, player_2/loss=175.089, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 334.10it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=31.553, player_2/loss=200.709, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 334.49it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=56.240, player_2/loss=222.106, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 333.36it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=44.246, player_2/loss=188.035, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 335.19it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=48.832, player_2/loss=210.176, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 333.45it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=92.342, player_2/loss=140.807, rew=-5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 335.97it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=141.861, player_2/loss=112.021, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 335.21it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=171.266, player_2/loss=113.365, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 349.44it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=169.001, player_2/loss=100.371, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 336.66it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=168.447, player_2/loss=77.289, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 332.44it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=171.094, player_2/loss=63.098, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 335.55it/s, env_step=8192, len=22, n/ep=2, n/st=64, player_1/loss=145.134, player_2/loss=69.646, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 334.61it/s, env_step=9216, len=23, n/ep=3, n/st=64, player_1/loss=128.477, player_2/loss=129.676, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 336.56it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=223.243, player_2/loss=193.942, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 335.44it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=367.654, player_2/loss=195.500, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 323.15it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=377.287, player_2/loss=101.190, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 332.16it/s, env_step=13312, len=11, n/ep=7, n/st=64, player_1/loss=339.794, player_2/loss=77.126, rew=10.71]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 334.71it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=324.876, player_2/loss=74.242, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 336.76it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=332.734, player_2/loss=53.854, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 332.09it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=327.817, player_2/loss=38.147, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 336.33it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=320.729, player_2/loss=23.143, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 334.32it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=342.311, player_2/loss=10.489, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 335.17it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=371.075, player_2/loss=8.766, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 338.49it/s, env_step=1024, len=18, n/ep=3, n/st=64, player_1/loss=131.878, player_2/loss=92.258, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 332.78it/s, env_step=2048, len=15, n/ep=3, n/st=64, player_1/loss=95.151, player_2/loss=159.202, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 334.97it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_2/loss=244.018, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 338.18it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=26.242, player_2/loss=223.247, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 337.40it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=19.695, player_2/loss=166.480, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 330.49it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=10.266, player_2/loss=178.069, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 336.56it/s, env_step=7168, len=15, n/ep=5, n/st=64, player_1/loss=6.069, player_2/loss=187.269, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 336.79it/s, env_step=8192, len=12, n/ep=4, n/st=64, player_1/loss=33.598, player_2/loss=190.652, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 337.43it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=33.342, player_2/loss=172.530, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 332.13it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=13.695, player_2/loss=207.060, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 337.14it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=10.626, player_2/loss=219.479, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 335.98it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=31.322, player_2/loss=179.418, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 338.91it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=40.989, player_2/loss=157.279, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 335.88it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=29.554, player_2/loss=170.417, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 334.90it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=15.079, player_2/loss=186.298, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 335.24it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=25.983, player_2/loss=160.950, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.06it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=17.233, player_2/loss=127.842, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 325.16it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=71.749, player_2/loss=130.959, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 337.30it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=78.016, player_2/loss=154.740, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 333.29it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=49.713, player_2/loss=134.609, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.09it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=63.182, player_2/loss=157.070, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.35it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=73.495, player_2/loss=142.341, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.88it/s, env_step=4096, len=20, n/ep=4, n/st=64, player_1/loss=93.880, player_2/loss=141.966, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 342.54it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=102.382, player_2/loss=144.948, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:03, 338.31it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=66.477, player_2/loss=137.675, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:03, 337.36it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=75.881, player_2/loss=129.350, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:03, 337.06it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=65.348, player_2/loss=115.951, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:03, 337.19it/s, env_step=9216, len=28, n/ep=2, n/st=64, player_1/loss=33.880, player_2/loss=87.613, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:03, 335.06it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=67.538, player_2/loss=60.056, rew=-8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:03, 337.15it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=94.948, player_2/loss=57.438, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:03, 338.56it/s, env_step=12288, len=22, n/ep=3, n/st=64, player_1/loss=57.013, player_2/loss=63.417, rew=-8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:03, 337.69it/s, env_step=13312, len=23, n/ep=3, n/st=64, player_1/loss=76.735, player_2/loss=65.411, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:03, 338.37it/s, env_step=14336, len=16, n/ep=5, n/st=64, player_1/loss=110.306, player_2/loss=62.453, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:03, 332.21it/s, env_step=15360, len=20, n/ep=3, n/st=64, player_1/loss=77.693, player_2/loss=64.719, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:03, 336.55it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=88.085, player_2/loss=75.944, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:03, 337.03it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=92.285, player_2/loss=61.507, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:03, 338.25it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=69.593, player_2/loss=33.622, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:03, 338.30it/s, env_step=19456, len=19, n/ep=3, n/st=64, player_1/loss=79.906, player_2/loss=30.452, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:03, 333.32it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=37.264, player_2/loss=35.565, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.97it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=120.606, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.24it/s, env_step=3072, len=12, n/ep=6, n/st=64, player_1/loss=206.613, player_2/loss=175.222, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 334.78it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=225.410, player_2/loss=244.353, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 339.02it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=162.245, player_2/loss=218.601, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 332.82it/s, env_step=6144, len=12, n/ep=6, n/st=64, player_1/loss=122.841, player_2/loss=199.232, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 335.72it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=80.849, player_2/loss=165.527, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 335.65it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=41.574, player_2/loss=137.733, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 334.51it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=28.167, player_2/loss=146.336, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 337.39it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=21.452, player_2/loss=145.726, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 332.12it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=31.838, player_2/loss=121.881, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.53it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=62.144, player_2/loss=121.583, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 336.68it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=52.450, player_2/loss=159.549, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 336.33it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=18.315, player_2/loss=193.636, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 333.97it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=29.829, player_2/loss=158.069, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 336.40it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=37.980, player_2/loss=116.153, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 335.08it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=44.002, player_2/loss=143.422, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.11it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=17.977, player_2/loss=186.075, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.65it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=9.932, player_2/loss=159.761, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 331.62it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=32.423, player_2/loss=132.190, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 335.74it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=40.170, player_2/loss=145.253, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 336.93it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=73.734, player_2/loss=153.467, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 336.83it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=86.227, player_2/loss=132.545, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 333.01it/s, env_step=5120, len=23, n/ep=3, n/st=64, player_1/loss=85.540, player_2/loss=95.123, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 349.06it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=105.253, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.96it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=114.497, player_2/loss=97.219, rew=-12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 336.37it/s, env_step=8192, len=22, n/ep=2, n/st=64, player_1/loss=102.277, player_2/loss=83.639, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 331.55it/s, env_step=9216, len=24, n/ep=2, n/st=64, player_1/loss=108.896, player_2/loss=64.602, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 335.79it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=92.755, player_2/loss=76.206, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 333.75it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=80.085, player_2/loss=54.378, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 339.49it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=88.330, player_2/loss=39.485, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 336.86it/s, env_step=13312, len=30, n/ep=2, n/st=64, player_1/loss=81.574, player_2/loss=64.604, rew=-25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 334.09it/s, env_step=14336, len=27, n/ep=3, n/st=64, player_1/loss=66.985, player_2/loss=80.731, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 336.30it/s, env_step=15360, len=29, n/ep=2, n/st=64, player_2/loss=59.719, rew=0.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 338.03it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=89.086, player_2/loss=52.873, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 335.54it/s, env_step=17408, len=20, n/ep=2, n/st=64, player_1/loss=81.397, player_2/loss=67.516, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 335.39it/s, env_step=18432, len=22, n/ep=2, n/st=64, player_1/loss=79.292, player_2/loss=55.394, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 331.08it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=68.779, player_2/loss=29.189, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 338.24it/s, env_step=1024, len=16, n/ep=3, n/st=64, player_1/loss=84.237, player_2/loss=110.479, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.02it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=81.621, player_2/loss=180.462, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 333.83it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=67.125, player_2/loss=175.606, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 335.95it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=35.007, player_2/loss=106.474, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 331.20it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=23.562, player_2/loss=62.516, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 339.36it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=52.803, player_2/loss=52.426, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 334.78it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=83.490, player_2/loss=46.707, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.30it/s, env_step=8192, len=16, n/ep=5, n/st=64, player_1/loss=67.311, player_2/loss=77.918, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 336.20it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=37.068, player_2/loss=58.183, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 335.83it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=14.152, player_2/loss=72.441, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.65it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=25.595, player_2/loss=81.573, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.73it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=23.355, player_2/loss=86.545, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 334.49it/s, env_step=13312, len=12, n/ep=6, n/st=64, player_1/loss=21.920, player_2/loss=95.351, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 327.88it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=16.714, player_2/loss=68.882, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 333.92it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=33.936, player_2/loss=111.728, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 335.80it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=10.255, player_2/loss=119.049, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 335.22it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=41.011, player_2/loss=110.200, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 332.77it/s, env_step=18432, len=12, n/ep=4, n/st=64, player_1/loss=40.382, player_2/loss=111.461, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 329.68it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=8.831, player_2/loss=98.598, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 336.39it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=34.996, player_2/loss=59.687, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.49it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=22.819, player_2/loss=44.355, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 335.75it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=11.002, player_2/loss=41.590, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 337.12it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=70.995, player_2/loss=59.802, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 336.76it/s, env_step=5120, len=20, n/ep=4, n/st=64, player_1/loss=98.804, player_2/loss=66.453, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 335.63it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=111.096, player_2/loss=143.582, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 349.79it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=113.246, player_2/loss=155.473, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 332.03it/s, env_step=8192, len=23, n/ep=2, n/st=64, player_1/loss=123.755, player_2/loss=101.036, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 331.38it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=164.253, player_2/loss=133.309, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 337.13it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=201.616, player_2/loss=172.577, rew=-16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 337.81it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=176.319, player_2/loss=119.384, rew=-15.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 336.72it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=207.985, player_2/loss=122.045, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 337.05it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=276.755, player_2/loss=154.265, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 335.40it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=378.236, player_2/loss=140.212, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 336.85it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=456.806, player_2/loss=159.855, rew=-8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 335.23it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=234.038, player_2/loss=169.930, rew=-15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 337.54it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=89.737, player_2/loss=153.531, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 333.59it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=90.104, player_2/loss=139.782, rew=-25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 338.56it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_2/loss=164.110, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 336.12it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=225.032, player_2/loss=106.152, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 335.32it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=185.217, player_2/loss=121.495, rew=-12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 329.68it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=125.351, player_2/loss=104.135, rew=-5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 335.52it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=69.800, player_2/loss=141.663, rew=17.86]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 337.26it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=42.375, player_2/loss=190.576, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 336.93it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=97.059, player_2/loss=224.060, rew=10.71]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 334.33it/s, env_step=7168, len=8, n/ep=7, n/st=64, player_1/loss=107.780, player_2/loss=205.605, rew=17.86]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 333.40it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=116.273, player_2/loss=233.809, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 336.36it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=111.776, player_2/loss=223.140, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 336.50it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=237.125, player_2/loss=250.627, rew=-17.86]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 339.25it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=325.042, player_2/loss=209.731, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 337.08it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=182.835, player_2/loss=103.101, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 332.23it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=137.681, player_2/loss=155.015, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 339.63it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_2/loss=207.363, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 336.42it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=127.554, player_2/loss=196.220, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 335.25it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=85.070, player_2/loss=173.136, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 336.88it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=83.490, player_2/loss=186.919, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 330.18it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=78.218, player_2/loss=220.796, rew=10.71]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 334.93it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=111.136, player_2/loss=224.552, rew=16.67]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 335.99it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=121.434, player_2/loss=180.838, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.05it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=113.681, player_2/loss=142.019, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 334.32it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=143.507, player_2/loss=94.372, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 333.63it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=179.148, player_2/loss=63.668, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 337.02it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=218.426, player_2/loss=49.423, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 337.22it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=225.664, player_2/loss=52.012, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 336.64it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=193.845, player_2/loss=52.013, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 344.90it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=201.219, player_2/loss=35.396, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 336.65it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=194.083, player_2/loss=50.714, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 338.91it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=155.610, player_2/loss=38.721, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 334.89it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=141.745, player_2/loss=50.716, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 336.24it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=170.762, player_2/loss=68.499, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 333.88it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=179.224, player_2/loss=38.443, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 337.95it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=143.012, player_2/loss=46.732, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 337.05it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=126.793, player_2/loss=68.969, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 328.87it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=165.712, player_2/loss=43.785, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 339.94it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=189.211, player_2/loss=51.336, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 334.54it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=168.596, player_2/loss=43.932, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 338.13it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_2/loss=32.888, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 335.97it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=128.699, player_2/loss=235.437, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 339.84it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=84.222, player_2/loss=173.275, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 338.96it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=55.103, player_2/loss=192.782, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 332.00it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=74.012, player_2/loss=249.938, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 338.88it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=44.945, player_2/loss=242.979, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 336.10it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=23.910, player_2/loss=260.958, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.00it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=26.258, player_2/loss=266.507, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 337.27it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=38.422, player_2/loss=236.512, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 333.24it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=28.401, player_2/loss=272.169, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 335.26it/s, env_step=10240, len=15, n/ep=4, n/st=64, player_1/loss=37.129, player_2/loss=252.862, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 333.80it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=42.155, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 336.97it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=12.000, player_2/loss=239.795, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 333.69it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=3.866, player_2/loss=217.039, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 333.20it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=7.671, player_2/loss=213.433, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 338.02it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=5.665, player_2/loss=214.557, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.62it/s, env_step=16384, len=18, n/ep=3, n/st=64, player_1/loss=4.265, player_2/loss=221.706, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.17it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=8.255, player_2/loss=223.577, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 339.16it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=21.126, player_2/loss=236.265, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 333.37it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=37.279, player_2/loss=236.205, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 339.15it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=44.402, player_2/loss=127.406, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.66it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=73.617, player_2/loss=147.177, rew=8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.43it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=71.416, player_2/loss=130.847, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.15it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=75.498, player_2/loss=76.188, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 334.04it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=22.899, player_2/loss=67.783, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 339.86it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=15.909, player_2/loss=66.066, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.48it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=17.335, player_2/loss=88.811, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.93it/s, env_step=8192, len=15, n/ep=5, n/st=64, player_1/loss=40.846, player_2/loss=122.405, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 351.80it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=69.843, player_2/loss=129.980, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 331.79it/s, env_step=10240, len=16, n/ep=3, n/st=64, player_1/loss=53.723, player_2/loss=81.063, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.79it/s, env_step=11264, len=26, n/ep=2, n/st=64, player_1/loss=34.159, player_2/loss=54.640, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 335.58it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=32.008, player_2/loss=95.113, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 335.72it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=133.954, player_2/loss=146.027, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #14: 1025it [00:03, 331.96it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=291.194, player_2/loss=132.369, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #15: 1025it [00:03, 336.38it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=390.857, player_2/loss=83.527, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #16: 1025it [00:03, 335.94it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=418.543, player_2/loss=57.598, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #17: 1025it [00:03, 332.25it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=459.864, player_2/loss=37.748, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #18: 1025it [00:03, 336.88it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=500.690, player_2/loss=46.523, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #19: 1025it [00:03, 337.67it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=435.331, player_2/loss=73.109, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #13


Epoch #1: 1025it [00:03, 336.27it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=307.427, player_2/loss=29.746, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.25it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=336.012, player_2/loss=20.556, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 334.79it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=252.561, player_2/loss=110.461, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 335.14it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=138.875, player_2/loss=173.953, rew=5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 337.48it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=122.814, player_2/loss=231.711, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 337.89it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=80.360, player_2/loss=268.378, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 338.69it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=71.156, player_2/loss=259.004, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 333.86it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=40.621, player_2/loss=251.799, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 338.42it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_2/loss=245.987, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 338.82it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=21.143, player_2/loss=243.419, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 336.97it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=28.462, player_2/loss=239.055, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 335.97it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=24.965, player_2/loss=244.590, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 334.47it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=17.298, player_2/loss=253.018, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 336.44it/s, env_step=14336, len=13, n/ep=4, n/st=64, player_1/loss=9.646, player_2/loss=248.709, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 337.94it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=5.610, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 336.14it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=44.040, player_2/loss=295.311, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 337.40it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=47.774, player_2/loss=268.183, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 334.07it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=23.181, player_2/loss=445.024, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 335.14it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=24.646, player_2/loss=617.473, rew=18.75]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 337.32it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=41.531, player_2/loss=466.662, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.00it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=87.627, player_2/loss=431.784, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.44it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=128.867, player_2/loss=304.250, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 333.98it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=93.665, player_2/loss=154.339, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.63it/s, env_step=5120, len=29, n/ep=2, n/st=64, player_1/loss=99.854, player_2/loss=148.284, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:03, 340.87it/s, env_step=6144, len=24, n/ep=2, n/st=64, player_1/loss=93.411, player_2/loss=171.002, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:03, 336.99it/s, env_step=7168, len=23, n/ep=3, n/st=64, player_1/loss=111.215, player_2/loss=136.901, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:03, 337.42it/s, env_step=8192, len=25, n/ep=2, n/st=64, player_1/loss=215.594, player_2/loss=106.782, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:03, 333.16it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=231.691, player_2/loss=119.754, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 347.32it/s, env_step=10240, len=27, n/ep=2, n/st=64, player_1/loss=215.094, player_2/loss=134.144, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:03, 337.68it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=195.079, player_2/loss=154.640, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:03, 336.52it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=165.626, player_2/loss=121.682, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:03, 333.23it/s, env_step=13312, len=22, n/ep=3, n/st=64, player_1/loss=270.839, player_2/loss=135.596, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:03, 337.12it/s, env_step=14336, len=17, n/ep=3, n/st=64, player_1/loss=287.445, player_2/loss=138.093, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:03, 338.32it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=147.360, player_2/loss=126.418, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:03, 340.49it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=110.632, player_2/loss=148.112, rew=-15.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:03, 339.21it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=120.860, player_2/loss=174.138, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:03, 333.64it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=135.337, player_2/loss=132.962, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:03, 337.47it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=253.886, player_2/loss=142.620, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:03, 336.10it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=45.641, player_2/loss=157.453, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.51it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=32.272, player_2/loss=177.366, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.48it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=24.482, player_2/loss=171.315, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 330.98it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=43.355, player_2/loss=184.190, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.85it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=35.772, player_2/loss=145.807, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.13it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=20.163, player_2/loss=127.775, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 336.12it/s, env_step=7168, len=9, n/ep=6, n/st=64, player_1/loss=21.092, player_2/loss=165.543, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.76it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=46.599, player_2/loss=183.320, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 331.06it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=46.076, player_2/loss=177.026, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 333.16it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=26.010, player_2/loss=151.093, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 336.70it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=24.417, player_2/loss=160.168, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.96it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=10.058, player_2/loss=177.236, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 334.70it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=9.008, player_2/loss=170.858, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 333.10it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=10.612, player_2/loss=192.235, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 333.04it/s, env_step=15360, len=9, n/ep=7, n/st=64, player_1/loss=7.633, player_2/loss=229.754, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 334.17it/s, env_step=16384, len=10, n/ep=7, n/st=64, player_1/loss=21.157, player_2/loss=204.712, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 337.11it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=31.661, player_2/loss=200.456, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 336.14it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=15.418, player_2/loss=196.611, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 332.38it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=6.134, player_2/loss=182.366, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 338.01it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=56.796, player_2/loss=156.763, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 339.64it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=171.704, player_2/loss=167.720, rew=15.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 338.17it/s, env_step=3072, len=10, n/ep=5, n/st=64, player_1/loss=338.381, player_2/loss=171.497, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 337.67it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=441.994, player_2/loss=123.466, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 332.08it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=496.176, player_2/loss=104.939, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 337.75it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=452.727, player_2/loss=121.169, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 334.95it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=402.012, player_2/loss=109.214, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 338.53it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=517.585, player_2/loss=69.136, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 332.56it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=455.547, player_2/loss=28.086, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 332.74it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=434.409, player_2/loss=69.985, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 348.37it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=349.294, player_2/loss=81.283, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 334.94it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=212.211, player_2/loss=147.248, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 335.33it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=129.783, player_2/loss=204.230, rew=-16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 337.93it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=135.707, player_2/loss=188.383, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 335.72it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=223.143, player_2/loss=172.025, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.65it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=353.728, player_2/loss=103.855, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 337.45it/s, env_step=17408, len=10, n/ep=6, n/st=64, player_1/loss=307.276, player_2/loss=115.283, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 338.24it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=301.957, player_2/loss=111.722, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 337.11it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=372.488, player_2/loss=38.824, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 330.40it/s, env_step=1024, len=25, n/ep=3, n/st=64, player_1/loss=293.859, player_2/loss=71.136, rew=-25.00]


Epoch #1: test_reward: 100.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 336.25it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=213.402, player_2/loss=101.933, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.77it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=145.552, player_2/loss=155.754, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 337.58it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=146.805, player_2/loss=313.135, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 338.20it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=160.824, player_2/loss=340.461, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 333.51it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=150.932, player_2/loss=197.158, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.98it/s, env_step=7168, len=17, n/ep=3, n/st=64, player_1/loss=111.877, player_2/loss=170.841, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 338.79it/s, env_step=8192, len=21, n/ep=4, n/st=64, player_1/loss=55.790, player_2/loss=121.942, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 338.22it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=117.677, player_2/loss=122.198, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.29it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=137.812, player_2/loss=140.158, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 336.02it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=99.116, player_2/loss=160.557, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 336.82it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=46.259, player_2/loss=151.120, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 337.75it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=49.533, player_2/loss=106.872, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 335.79it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=42.553, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 338.19it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=28.583, player_2/loss=255.004, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 335.54it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=20.267, player_2/loss=283.335, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 337.17it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=43.528, player_2/loss=251.877, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 339.45it/s, env_step=18432, len=13, n/ep=6, n/st=64, player_1/loss=53.217, player_2/loss=233.511, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 329.73it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=16.518, player_2/loss=246.697, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 100.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 339.38it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=17.223, player_2/loss=216.296, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.69it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=24.797, player_2/loss=166.940, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 334.63it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=61.644, player_2/loss=165.784, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 338.10it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=122.101, player_2/loss=152.806, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 339.68it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=139.583, player_2/loss=148.068, rew=-5.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 337.89it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=156.889, player_2/loss=116.329, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 335.15it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=188.764, player_2/loss=99.713, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 335.09it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=165.989, player_2/loss=120.348, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 335.44it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=137.623, player_2/loss=107.371, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 335.56it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=109.039, player_2/loss=76.031, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 337.99it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=113.137, player_2/loss=81.298, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 343.86it/s, env_step=12288, len=15, n/ep=3, n/st=64, player_1/loss=167.390, player_2/loss=56.774, rew=8.33]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 337.23it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=147.763, player_2/loss=19.232, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 337.09it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=149.272, player_2/loss=27.800, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 335.67it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=174.572, player_2/loss=43.987, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 335.22it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=130.576, player_2/loss=36.378, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 334.69it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=168.338, player_2/loss=55.134, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 337.79it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=204.252, player_2/loss=46.965, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 337.97it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=152.886, player_2/loss=33.743, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 333.56it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=146.869, player_2/loss=151.876, rew=18.75]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 332.28it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=120.247, player_2/loss=179.088, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 335.41it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=109.647, player_2/loss=225.750, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 335.32it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=86.156, player_2/loss=261.543, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.71it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_1/loss=65.798, player_2/loss=248.072, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.63it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=92.329, player_2/loss=265.171, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 332.98it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=106.784, player_2/loss=331.650, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.08it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=79.497, player_2/loss=300.286, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 333.28it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=41.901, player_2/loss=278.462, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 334.13it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=19.108, player_2/loss=249.867, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 335.41it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=31.720, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 330.90it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=94.591, player_2/loss=267.544, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 335.88it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=73.635, player_2/loss=281.612, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 337.18it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=76.640, player_2/loss=308.301, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 334.21it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=30.606, player_2/loss=333.391, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 335.18it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=10.831, player_2/loss=253.520, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 332.78it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=7.925, player_2/loss=223.245, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 336.58it/s, env_step=18432, len=9, n/ep=9, n/st=64, player_1/loss=52.379, player_2/loss=279.570, rew=19.44]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.10it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=55.810, player_2/loss=269.862, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 337.03it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=28.012, player_2/loss=175.564, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.86it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=27.414, player_2/loss=144.923, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 333.59it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=21.781, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.66it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=10.749, player_2/loss=85.177, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.88it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=9.876, player_2/loss=94.027, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.66it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=17.384, player_2/loss=118.164, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 336.69it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=59.699, player_2/loss=101.708, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 334.12it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=91.092, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 339.85it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=199.700, player_2/loss=123.171, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #10: 1025it [00:03, 336.97it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=330.512, player_2/loss=162.253, rew=5.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #11: 1025it [00:03, 338.85it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=279.585, player_2/loss=172.766, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #12: 1025it [00:02, 343.16it/s, env_step=12288, len=13, n/ep=4, n/st=64, player_1/loss=207.982, player_2/loss=186.368, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #13: 1025it [00:03, 338.26it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=223.345, player_2/loss=137.423, rew=-5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #14: 1025it [00:03, 339.21it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=247.874, player_2/loss=146.471, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #15: 1025it [00:03, 341.06it/s, env_step=15360, len=11, n/ep=6, n/st=64, player_1/loss=178.188, player_2/loss=153.474, rew=-16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #16: 1025it [00:03, 337.73it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=245.052, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #17: 1025it [00:03, 337.85it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=313.077, player_2/loss=105.824, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #18: 1025it [00:03, 333.61it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=227.735, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #19: 1025it [00:03, 337.15it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=186.862, player_2/loss=97.547, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #9


Epoch #1: 1025it [00:03, 338.85it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=410.913, player_2/loss=155.519, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 334.57it/s, env_step=2048, len=12, n/ep=6, n/st=64, player_1/loss=307.192, player_2/loss=161.264, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.74it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=210.440, player_2/loss=167.455, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 334.08it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=176.837, player_2/loss=166.596, rew=-16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:03, 336.84it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=148.233, player_2/loss=181.571, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 338.03it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=111.198, player_2/loss=168.161, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 337.52it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=140.841, player_2/loss=148.373, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 338.35it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=152.729, player_2/loss=150.272, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 335.18it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=126.253, player_2/loss=127.553, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:03, 340.13it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=154.960, player_2/loss=146.888, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 322.19it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=153.884, player_2/loss=170.187, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 263.36it/s, env_step=12288, len=11, n/ep=5, n/st=64, player_1/loss=117.833, player_2/loss=219.039, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 294.29it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=98.664, player_2/loss=200.000, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 339.01it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=55.372, player_2/loss=206.537, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 415.75it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=52.494, player_2/loss=219.634, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 356.43it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=33.770, player_2/loss=249.814, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 363.70it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=27.267, player_2/loss=240.640, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 332.32it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=34.909, player_2/loss=218.983, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 341.93it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=20.469, player_2/loss=202.711, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 334.19it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=160.000, player_2/loss=168.028, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 329.25it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=90.193, player_2/loss=158.206, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 329.82it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=66.775, player_2/loss=172.838, rew=-19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 323.70it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=183.375, player_2/loss=185.402, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 338.22it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=255.124, player_2/loss=134.098, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 330.08it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=250.781, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 343.77it/s, env_step=7168, len=15, n/ep=3, n/st=64, player_1/loss=226.273, player_2/loss=108.550, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.68it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=269.516, player_2/loss=91.220, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 345.12it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=244.936, player_2/loss=84.774, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 344.24it/s, env_step=10240, len=14, n/ep=4, n/st=64, player_1/loss=247.850, player_2/loss=73.122, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 343.89it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=268.271, player_2/loss=47.548, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 341.30it/s, env_step=12288, len=12, n/ep=4, n/st=64, player_1/loss=226.627, player_2/loss=40.154, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 347.93it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=193.128, player_2/loss=43.952, rew=-5.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 340.87it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=187.350, player_2/loss=65.129, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 342.90it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=222.733, player_2/loss=86.052, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 343.40it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=182.997, player_2/loss=80.988, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 343.85it/s, env_step=17408, len=16, n/ep=3, n/st=64, player_1/loss=168.616, player_2/loss=47.986, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 337.88it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=207.263, player_2/loss=36.669, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 343.76it/s, env_step=19456, len=18, n/ep=3, n/st=64, player_1/loss=178.864, player_2/loss=48.779, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 340.41it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=221.666, player_2/loss=62.229, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 334.77it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=159.824, player_2/loss=60.705, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 341.43it/s, env_step=3072, len=24, n/ep=2, n/st=64, player_1/loss=105.817, player_2/loss=56.787, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 339.55it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_2/loss=87.186, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 344.87it/s, env_step=5120, len=29, n/ep=2, n/st=64, player_1/loss=78.252, player_2/loss=85.836, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 346.35it/s, env_step=6144, len=23, n/ep=3, n/st=64, player_1/loss=79.107, player_2/loss=87.043, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 342.28it/s, env_step=7168, len=22, n/ep=2, n/st=64, player_1/loss=104.626, player_2/loss=111.339, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 344.11it/s, env_step=8192, len=25, n/ep=3, n/st=64, player_1/loss=105.600, player_2/loss=110.022, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 338.70it/s, env_step=9216, len=26, n/ep=3, n/st=64, player_1/loss=83.901, player_2/loss=73.300, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 342.96it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=88.338, player_2/loss=77.869, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 345.37it/s, env_step=11264, len=22, n/ep=3, n/st=64, player_1/loss=125.103, player_2/loss=106.917, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 341.51it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=145.074, player_2/loss=145.676, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 341.57it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=114.035, player_2/loss=143.305, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 336.33it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=30.188, player_2/loss=164.213, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 341.24it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=86.656, player_2/loss=128.185, rew=-12.50]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 342.71it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=90.889, player_2/loss=88.280, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 342.20it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=122.877, player_2/loss=95.503, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 343.91it/s, env_step=18432, len=17, n/ep=3, n/st=64, player_1/loss=138.265, player_2/loss=157.241, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 336.54it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=100.624, player_2/loss=148.383, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 341.77it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=62.460, player_2/loss=151.258, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.41it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=63.330, player_2/loss=143.968, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 345.19it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=74.074, player_2/loss=115.172, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.42it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=75.593, player_2/loss=110.640, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 342.37it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=65.575, player_2/loss=150.590, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 345.28it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=46.758, player_2/loss=143.240, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 344.45it/s, env_step=7168, len=20, n/ep=3, n/st=64, player_1/loss=49.797, player_2/loss=106.018, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:02, 344.74it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=75.696, player_2/loss=73.845, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 339.40it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=73.375, player_2/loss=63.585, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 342.16it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=95.226, player_2/loss=103.633, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 342.05it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=134.745, player_2/loss=102.120, rew=-5.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 343.77it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=103.073, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 343.62it/s, env_step=13312, len=15, n/ep=5, n/st=64, player_1/loss=49.395, player_2/loss=84.143, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 337.89it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=58.770, player_2/loss=64.501, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 353.67it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=96.826, player_2/loss=85.214, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 342.07it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=133.934, player_2/loss=96.787, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 343.72it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=182.647, player_2/loss=86.695, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 343.94it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=157.154, player_2/loss=107.388, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:03, 337.86it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=127.935, player_2/loss=107.436, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 340.80it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=222.179, player_2/loss=56.140, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.63it/s, env_step=2048, len=14, n/ep=4, n/st=64, player_1/loss=171.003, player_2/loss=107.250, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 344.08it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=120.909, player_2/loss=120.367, rew=-12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 343.34it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=59.946, player_2/loss=150.857, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 337.86it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=24.830, player_2/loss=144.617, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 342.27it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=28.663, player_2/loss=200.461, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 340.58it/s, env_step=7168, len=15, n/ep=5, n/st=64, player_1/loss=27.908, player_2/loss=240.224, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 341.29it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=23.311, player_2/loss=180.028, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 343.58it/s, env_step=9216, len=19, n/ep=3, n/st=64, player_1/loss=20.016, player_2/loss=158.748, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 338.78it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=9.362, player_2/loss=139.621, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 344.89it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=8.447, player_2/loss=168.860, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 342.60it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=48.959, player_2/loss=184.145, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 343.90it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=103.975, player_2/loss=165.207, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 342.22it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=99.148, player_2/loss=163.722, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 337.08it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=44.189, player_2/loss=160.437, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 339.40it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=19.915, player_2/loss=150.458, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 340.61it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=20.448, player_2/loss=98.466, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 339.93it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=10.325, player_2/loss=68.540, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 341.07it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=27.643, player_2/loss=72.554, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 338.46it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=99.993, player_2/loss=101.982, rew=15.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 341.52it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=192.578, player_2/loss=185.826, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 342.85it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=292.906, player_2/loss=231.967, rew=5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 341.92it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=296.020, player_2/loss=195.650, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 335.46it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=299.813, player_2/loss=152.296, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 341.31it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=356.035, player_2/loss=155.593, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 342.34it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=294.884, player_2/loss=93.107, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 339.92it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=279.282, player_2/loss=71.134, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 340.49it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=340.792, player_2/loss=75.303, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.83it/s, env_step=10240, len=13, n/ep=3, n/st=64, player_1/loss=387.666, player_2/loss=43.430, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 340.98it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=370.848, player_2/loss=48.141, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 343.12it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=290.418, player_2/loss=57.597, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 341.78it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=289.062, player_2/loss=41.762, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 342.15it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=393.630, player_2/loss=45.923, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 336.96it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=388.961, player_2/loss=50.381, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 351.44it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=397.665, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 342.23it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=358.972, player_2/loss=51.914, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 341.68it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=310.577, player_2/loss=28.411, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 336.79it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=377.762, player_2/loss=69.842, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 330.93it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=81.762, player_2/loss=114.587, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 345.18it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=79.287, player_2/loss=151.455, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 344.74it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=118.492, player_2/loss=165.695, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 344.66it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=82.398, player_2/loss=179.200, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 341.79it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=27.349, player_2/loss=242.304, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 327.15it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=58.794, player_2/loss=227.660, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 342.81it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=108.853, player_2/loss=177.253, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 344.53it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=121.071, player_2/loss=116.176, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 343.09it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=67.618, player_2/loss=137.069, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 343.55it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=42.739, player_2/loss=203.682, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 339.70it/s, env_step=11264, len=30, n/ep=2, n/st=64, player_1/loss=39.946, player_2/loss=189.726, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 343.30it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=42.891, player_2/loss=172.690, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 343.54it/s, env_step=13312, len=19, n/ep=5, n/st=64, player_1/loss=46.564, player_2/loss=178.610, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 341.87it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=34.407, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 342.97it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=38.414, player_2/loss=177.224, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.14it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=49.060, player_2/loss=153.834, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 341.53it/s, env_step=17408, len=15, n/ep=3, n/st=64, player_1/loss=54.408, player_2/loss=152.531, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 342.21it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=20.535, player_2/loss=177.249, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 344.68it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=18.902, player_2/loss=192.624, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 338.18it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=31.918, player_2/loss=157.655, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 345.38it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=24.065, player_2/loss=116.449, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.77it/s, env_step=3072, len=21, n/ep=4, n/st=64, player_1/loss=31.223, player_2/loss=70.175, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 342.68it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=52.863, player_2/loss=95.265, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 341.59it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=100.496, player_2/loss=107.101, rew=-12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 339.84it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=93.299, player_2/loss=121.128, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:02, 345.69it/s, env_step=7168, len=23, n/ep=2, n/st=64, player_1/loss=93.750, player_2/loss=119.135, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 343.57it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=96.586, player_2/loss=88.904, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:02, 343.53it/s, env_step=9216, len=21, n/ep=4, n/st=64, player_1/loss=73.862, player_2/loss=72.757, rew=-12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 343.34it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=110.810, player_2/loss=129.910, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:03, 338.67it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=150.643, player_2/loss=133.213, rew=-16.67]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:02, 344.59it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=161.488, player_2/loss=124.688, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 345.77it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=138.715, player_2/loss=123.782, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 344.43it/s, env_step=14336, len=17, n/ep=3, n/st=64, player_1/loss=107.502, player_2/loss=103.061, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 342.57it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=48.792, player_2/loss=88.390, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 342.03it/s, env_step=16384, len=27, n/ep=2, n/st=64, player_1/loss=21.130, player_2/loss=54.652, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:02, 349.68it/s, env_step=17408, len=31, n/ep=2, n/st=64, player_1/loss=28.583, player_2/loss=39.140, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:03, 341.63it/s, env_step=18432, len=26, n/ep=2, n/st=64, player_1/loss=29.154, player_2/loss=48.133, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 343.03it/s, env_step=19456, len=19, n/ep=3, n/st=64, player_1/loss=28.125, player_2/loss=42.764, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:03, 340.00it/s, env_step=1024, len=27, n/ep=3, n/st=64, player_1/loss=62.171, player_2/loss=76.138, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.85it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=43.569, player_2/loss=46.919, rew=25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.60it/s, env_step=3072, len=30, n/ep=2, n/st=64, player_1/loss=48.719, player_2/loss=44.602, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 343.00it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=88.210, player_2/loss=81.303, rew=12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 341.52it/s, env_step=5120, len=13, n/ep=4, n/st=64, player_1/loss=83.089, player_2/loss=79.514, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 342.21it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=32.568, player_2/loss=59.296, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 339.40it/s, env_step=7168, len=11, n/ep=4, n/st=64, player_1/loss=12.934, player_2/loss=58.225, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 341.62it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=14.691, player_2/loss=71.601, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 341.66it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=10.867, player_2/loss=76.888, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 342.57it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=8.810, player_2/loss=85.694, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 342.35it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=44.437, player_2/loss=95.967, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.13it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=49.790, player_2/loss=93.889, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 344.88it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=12.426, player_2/loss=67.335, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 341.92it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=5.579, player_2/loss=71.079, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 340.33it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=8.609, player_2/loss=73.977, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 342.59it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=11.536, player_2/loss=72.216, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 336.41it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=30.167, player_2/loss=83.119, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 341.01it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=26.814, player_2/loss=79.300, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 341.91it/s, env_step=19456, len=15, n/ep=5, n/st=64, player_1/loss=13.354, player_2/loss=77.970, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 338.95it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=20.787, player_2/loss=66.211, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.73it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=30.674, player_2/loss=73.326, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 335.73it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=37.563, player_2/loss=89.998, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 343.13it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=83.948, player_2/loss=118.093, rew=-15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 342.08it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=60.344, player_2/loss=102.464, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 341.13it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=70.314, player_2/loss=89.465, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:02, 343.55it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=63.388, player_2/loss=93.686, rew=-16.67]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 338.50it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=42.483, player_2/loss=81.772, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:02, 342.47it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=43.247, player_2/loss=93.939, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 342.59it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=82.052, player_2/loss=135.627, rew=-15.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:02, 346.40it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=137.293, rew=-5.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 344.65it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=107.937, player_2/loss=111.427, rew=5.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 338.91it/s, env_step=13312, len=14, n/ep=5, n/st=64, player_1/loss=81.246, player_2/loss=112.967, rew=-15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 341.68it/s, env_step=14336, len=12, n/ep=4, n/st=64, player_1/loss=115.928, player_2/loss=108.304, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 343.44it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=70.425, player_2/loss=78.025, rew=-15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 341.43it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_2/loss=75.793, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 343.49it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=168.113, player_2/loss=79.196, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 337.86it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_2/loss=56.713, rew=-8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 352.53it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_1/loss=128.577, player_2/loss=53.032, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 335.03it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=67.444, player_2/loss=173.591, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.85it/s, env_step=2048, len=19, n/ep=4, n/st=64, player_1/loss=50.581, player_2/loss=132.324, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.02it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=34.920, player_2/loss=55.890, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 341.77it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=43.797, player_2/loss=29.769, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 342.46it/s, env_step=5120, len=30, n/ep=2, n/st=64, player_1/loss=63.009, player_2/loss=63.339, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 342.32it/s, env_step=6144, len=29, n/ep=2, n/st=64, player_1/loss=93.933, player_2/loss=99.056, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 341.57it/s, env_step=7168, len=23, n/ep=2, n/st=64, player_1/loss=62.766, player_2/loss=95.644, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.76it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=42.830, player_2/loss=66.907, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 342.22it/s, env_step=9216, len=24, n/ep=2, n/st=64, player_1/loss=24.356, player_2/loss=53.385, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 343.70it/s, env_step=10240, len=23, n/ep=3, n/st=64, player_2/loss=87.404, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 341.60it/s, env_step=11264, len=23, n/ep=4, n/st=64, player_1/loss=45.405, player_2/loss=115.071, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 344.33it/s, env_step=12288, len=24, n/ep=3, n/st=64, player_1/loss=50.600, player_2/loss=105.073, rew=-25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 337.99it/s, env_step=13312, len=33, n/ep=2, n/st=64, player_1/loss=106.537, player_2/loss=152.682, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 341.54it/s, env_step=14336, len=27, n/ep=2, n/st=64, player_1/loss=109.662, player_2/loss=151.245, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 343.63it/s, env_step=15360, len=23, n/ep=3, n/st=64, player_1/loss=37.507, player_2/loss=49.379, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 342.35it/s, env_step=16384, len=19, n/ep=3, n/st=64, player_1/loss=24.208, player_2/loss=33.617, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 341.11it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=12.388, player_2/loss=42.185, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 339.70it/s, env_step=18432, len=19, n/ep=4, n/st=64, player_1/loss=9.341, player_2/loss=58.466, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 340.32it/s, env_step=19456, len=17, n/ep=3, n/st=64, player_1/loss=24.279, player_2/loss=60.770, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 338.72it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=46.985, player_2/loss=181.224, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.15it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=39.730, player_2/loss=177.556, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 343.83it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=78.593, player_2/loss=229.455, rew=-12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 338.27it/s, env_step=4096, len=13, n/ep=4, n/st=64, player_1/loss=102.529, player_2/loss=119.817, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 341.02it/s, env_step=5120, len=17, n/ep=3, n/st=64, player_1/loss=175.281, player_2/loss=69.011, rew=8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 343.36it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=216.034, player_2/loss=174.547, rew=-16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 344.57it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=152.397, player_2/loss=213.487, rew=-12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 338.88it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=50.773, player_2/loss=132.172, rew=15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 342.70it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=62.411, player_2/loss=108.042, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 343.38it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=158.792, player_2/loss=140.788, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 343.73it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=263.772, player_2/loss=126.780, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 344.86it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=277.511, player_2/loss=119.291, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 338.57it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=231.240, player_2/loss=122.553, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 344.09it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=180.894, player_2/loss=136.285, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 341.16it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=203.389, player_2/loss=147.022, rew=17.86]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 343.29it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=212.037, player_2/loss=156.222, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 340.94it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=207.670, player_2/loss=110.130, rew=10.71]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 337.07it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=242.447, player_2/loss=78.794, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 341.58it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=267.039, player_2/loss=86.055, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 349.47it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=190.650, player_2/loss=159.080, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.69it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=153.134, player_2/loss=152.494, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.74it/s, env_step=3072, len=26, n/ep=2, n/st=64, player_1/loss=91.097, player_2/loss=137.767, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.39it/s, env_step=4096, len=24, n/ep=2, n/st=64, player_1/loss=53.514, player_2/loss=139.351, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 341.32it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=86.625, player_2/loss=154.445, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 340.80it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=84.269, player_2/loss=162.355, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 340.21it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=59.137, player_2/loss=209.592, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 341.03it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=70.040, player_2/loss=336.356, rew=17.86]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 330.37it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=55.246, player_2/loss=480.767, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.65it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=44.884, player_2/loss=457.048, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 340.89it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=55.077, player_2/loss=475.475, rew=18.75]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 341.06it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=82.965, player_2/loss=458.476, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 339.93it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=57.581, player_2/loss=473.079, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 335.44it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=31.137, player_2/loss=445.860, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 343.34it/s, env_step=15360, len=15, n/ep=6, n/st=64, player_1/loss=28.841, player_2/loss=442.829, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 339.43it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=85.453, player_2/loss=407.588, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 342.04it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=85.253, player_2/loss=421.746, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 342.42it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=19.906, player_2/loss=417.857, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 335.88it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=26.142, player_2/loss=412.382, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 338.98it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=10.656, player_2/loss=331.711, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 341.90it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=14.165, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.06it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=18.405, player_2/loss=210.343, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 344.43it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=15.530, player_2/loss=173.585, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.30it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=60.070, player_2/loss=130.833, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 341.79it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=142.444, player_2/loss=190.467, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:03, 339.74it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=269.655, player_2/loss=231.339, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:02, 343.83it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=353.352, player_2/loss=125.117, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:03, 335.82it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=381.996, player_2/loss=121.239, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:02, 342.07it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=389.548, player_2/loss=107.438, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:02, 342.66it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=430.509, player_2/loss=68.835, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 341.47it/s, env_step=12288, len=8, n/ep=7, n/st=64, player_1/loss=398.705, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 342.16it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=409.412, player_2/loss=68.762, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:03, 337.11it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=444.480, player_2/loss=58.813, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 342.93it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=390.348, player_2/loss=37.816, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:02, 342.05it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=406.688, player_2/loss=37.404, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:03, 340.60it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_2/loss=36.527, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:03, 340.76it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=408.466, player_2/loss=61.929, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:03, 335.84it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=422.396, player_2/loss=71.874, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:03, 337.12it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=273.787, player_2/loss=15.101, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 341.24it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=263.933, player_2/loss=19.655, rew=-18.75]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 354.55it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=233.496, player_2/loss=87.045, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.28it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=121.581, player_2/loss=187.852, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.62it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=46.344, player_2/loss=254.894, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 342.92it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=39.566, player_2/loss=210.815, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 342.87it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=26.988, player_2/loss=208.131, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 341.69it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=68.614, player_2/loss=236.674, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 342.72it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=66.294, player_2/loss=265.734, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.31it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=12.002, player_2/loss=245.489, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 340.31it/s, env_step=11264, len=13, n/ep=5, n/st=64, player_1/loss=11.483, player_2/loss=230.454, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 341.78it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=9.677, player_2/loss=243.264, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 341.60it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=8.966, player_2/loss=222.124, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 344.04it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=8.935, player_2/loss=232.768, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 338.88it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=6.641, player_2/loss=212.068, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 341.90it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=2.746, player_2/loss=215.529, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 343.38it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=27.564, player_2/loss=235.807, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 343.24it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=30.537, player_2/loss=240.231, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 341.01it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=6.564, player_2/loss=257.164, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 336.83it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=18.073, player_2/loss=185.829, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 342.80it/s, env_step=2048, len=13, n/ep=4, n/st=64, player_1/loss=22.504, player_2/loss=151.307, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 342.48it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=20.190, player_2/loss=124.192, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 342.49it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=13.329, player_2/loss=103.816, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 342.50it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=5.006, player_2/loss=79.581, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 333.02it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=6.350, player_2/loss=94.937, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 342.71it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=8.659, player_2/loss=79.548, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 343.76it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=5.713, player_2/loss=55.180, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 341.30it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=27.473, player_2/loss=49.017, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 341.22it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_2/loss=74.033, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.30it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=81.642, player_2/loss=103.150, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 342.12it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=106.597, player_2/loss=105.544, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 342.76it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=106.907, player_2/loss=98.138, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 342.74it/s, env_step=14336, len=20, n/ep=3, n/st=64, player_1/loss=88.145, player_2/loss=82.674, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #15: 1025it [00:02, 342.20it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=52.867, player_2/loss=59.046, rew=-12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #16: 1025it [00:03, 338.02it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=63.913, player_2/loss=70.668, rew=8.33]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #17: 1025it [00:02, 343.76it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=90.514, player_2/loss=73.811, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #18: 1025it [00:02, 342.26it/s, env_step=18432, len=24, n/ep=3, n/st=64, player_1/loss=77.226, player_2/loss=80.779, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #19: 1025it [00:02, 342.04it/s, env_step=19456, len=25, n/ep=3, n/st=64, player_1/loss=113.178, player_2/loss=81.118, rew=8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #14


Epoch #1: 1025it [00:03, 333.65it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=153.441, player_2/loss=110.069, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 343.49it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=120.072, player_2/loss=114.147, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 341.81it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=113.873, player_2/loss=143.348, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 354.56it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=127.692, player_2/loss=151.173, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 338.04it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=125.913, player_2/loss=158.649, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:03, 337.95it/s, env_step=6144, len=11, n/ep=4, n/st=64, player_1/loss=100.447, player_2/loss=125.394, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:03, 339.85it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=81.314, player_2/loss=99.372, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:02, 342.67it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=86.220, player_2/loss=60.546, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:02, 341.73it/s, env_step=9216, len=16, n/ep=3, n/st=64, player_1/loss=67.460, player_2/loss=83.610, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:02, 345.11it/s, env_step=10240, len=20, n/ep=4, n/st=64, player_1/loss=86.610, player_2/loss=80.220, rew=0.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:03, 336.67it/s, env_step=11264, len=13, n/ep=6, n/st=64, player_1/loss=111.345, player_2/loss=82.371, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:02, 344.08it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=122.893, player_2/loss=96.155, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:03, 340.61it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=102.440, player_2/loss=104.578, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:03, 341.65it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=67.050, player_2/loss=79.134, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:03, 340.51it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=46.223, player_2/loss=70.699, rew=15.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:03, 332.07it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=30.892, player_2/loss=77.839, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:03, 327.58it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=41.680, player_2/loss=78.693, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:02, 342.93it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=34.537, player_2/loss=70.685, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:02, 341.97it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=59.623, player_2/loss=101.193, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:03, 340.13it/s, env_step=1024, len=7, n/ep=8, n/st=64, player_1/loss=130.895, player_2/loss=148.881, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.85it/s, env_step=2048, len=12, n/ep=6, n/st=64, player_1/loss=264.005, player_2/loss=138.222, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 341.71it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=298.805, player_2/loss=154.034, rew=-5.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 340.95it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=193.773, player_2/loss=147.477, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 342.35it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=181.919, player_2/loss=147.962, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 343.70it/s, env_step=6144, len=19, n/ep=3, n/st=64, player_1/loss=197.972, player_2/loss=126.700, rew=8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.05it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=163.899, player_2/loss=94.726, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 341.29it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=118.353, player_2/loss=116.840, rew=-5.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 342.82it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=136.478, player_2/loss=88.761, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 343.40it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=166.798, player_2/loss=78.800, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 344.44it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=198.662, player_2/loss=42.389, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 332.80it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=145.947, player_2/loss=95.835, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 340.20it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=127.234, player_2/loss=113.868, rew=12.50]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 341.31it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=181.682, player_2/loss=58.511, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 342.27it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=161.002, player_2/loss=28.235, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 337.27it/s, env_step=16384, len=19, n/ep=4, n/st=64, player_1/loss=102.367, player_2/loss=89.595, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 342.13it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=113.974, player_2/loss=114.964, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 341.66it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=117.886, player_2/loss=80.375, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 341.91it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=111.027, player_2/loss=67.783, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 340.86it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=192.526, player_2/loss=31.194, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.18it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=160.568, player_2/loss=45.400, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 341.08it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=107.019, player_2/loss=50.806, rew=-12.50]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 342.33it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=87.130, player_2/loss=77.310, rew=-12.50]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:02, 347.96it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=81.907, player_2/loss=106.863, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:02, 344.48it/s, env_step=6144, len=16, n/ep=3, n/st=64, player_1/loss=82.791, player_2/loss=114.360, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 335.95it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=43.335, player_2/loss=85.889, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 341.19it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_1/loss=45.680, player_2/loss=80.057, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 341.44it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=89.927, player_2/loss=124.527, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:02, 343.31it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=114.617, player_2/loss=224.132, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 335.42it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=102.168, player_2/loss=325.076, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:02, 343.25it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=63.611, player_2/loss=299.656, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 343.31it/s, env_step=13312, len=9, n/ep=8, n/st=64, player_1/loss=70.177, player_2/loss=316.651, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 341.27it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=40.193, player_2/loss=370.185, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:03, 341.01it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=9.535, player_2/loss=353.596, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 337.75it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=14.851, player_2/loss=404.420, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:03, 339.83it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=15.638, player_2/loss=385.841, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 338.94it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=25.261, player_2/loss=386.382, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:03, 341.15it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=39.238, player_2/loss=424.765, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 339.83it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=147.169, player_2/loss=196.756, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 382.21it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=144.518, player_2/loss=136.477, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 392.90it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=139.406, player_2/loss=61.513, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 396.73it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=119.701, player_2/loss=64.525, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 372.35it/s, env_step=5120, len=20, n/ep=4, n/st=64, player_1/loss=145.921, player_2/loss=138.220, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 357.54it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=159.086, player_2/loss=162.533, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 336.17it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=134.628, player_2/loss=173.439, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 411.42it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=161.008, player_2/loss=149.549, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 417.24it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=171.637, player_2/loss=141.458, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 415.46it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=176.175, player_2/loss=116.039, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 425.21it/s, env_step=11264, len=18, n/ep=3, n/st=64, player_1/loss=149.579, player_2/loss=81.466, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 415.02it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=160.096, player_2/loss=63.231, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 403.74it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=141.950, player_2/loss=49.777, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 411.09it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=144.090, player_2/loss=34.469, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 416.03it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=207.536, player_2/loss=28.051, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 416.95it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=224.954, player_2/loss=24.198, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 409.34it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=222.413, player_2/loss=38.732, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 385.96it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=244.605, player_2/loss=29.766, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 408.05it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=274.617, player_2/loss=15.883, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 410.92it/s, env_step=1024, len=14, n/ep=5, n/st=64, player_1/loss=146.501, player_2/loss=102.097, rew=-5.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 407.26it/s, env_step=2048, len=15, n/ep=5, n/st=64, player_1/loss=123.902, player_2/loss=248.637, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 409.13it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=113.384, player_2/loss=420.727, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 403.10it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=116.874, player_2/loss=412.018, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 375.41it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=153.374, player_2/loss=450.375, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 377.26it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=151.350, player_2/loss=503.231, rew=16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 375.43it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=124.071, player_2/loss=555.442, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 363.14it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=56.257, player_2/loss=515.278, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 355.56it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=53.221, player_2/loss=422.432, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 326.35it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=72.151, player_2/loss=385.957, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 334.80it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=46.549, player_2/loss=470.338, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 307.68it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=30.859, player_2/loss=534.131, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 325.91it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=33.052, player_2/loss=528.283, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 383.31it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=38.205, player_2/loss=439.775, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 391.95it/s, env_step=15360, len=12, n/ep=4, n/st=64, player_1/loss=40.323, player_2/loss=387.619, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 402.12it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=83.532, player_2/loss=458.639, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 391.25it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_2/loss=431.798, rew=15.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 397.33it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=21.185, player_2/loss=392.930, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 388.72it/s, env_step=19456, len=11, n/ep=6, n/st=64, player_1/loss=11.763, player_2/loss=377.619, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 388.05it/s, env_step=1024, len=11, n/ep=5, n/st=64, player_1/loss=25.246, player_2/loss=407.697, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 439.46it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=13.774, player_2/loss=344.026, rew=-15.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 434.91it/s, env_step=3072, len=29, n/ep=2, n/st=64, player_1/loss=33.380, player_2/loss=200.846, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 443.38it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=97.953, player_2/loss=153.651, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 413.13it/s, env_step=5120, len=24, n/ep=3, n/st=64, player_1/loss=124.199, player_2/loss=109.435, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 374.89it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=109.264, player_2/loss=65.297, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 399.03it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=81.340, player_2/loss=80.082, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 375.47it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=133.377, player_2/loss=80.422, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 389.33it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=181.305, player_2/loss=91.623, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 374.33it/s, env_step=10240, len=24, n/ep=3, n/st=64, player_1/loss=182.492, player_2/loss=87.645, rew=-8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 379.65it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=218.717, player_2/loss=82.342, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 353.05it/s, env_step=12288, len=23, n/ep=3, n/st=64, player_1/loss=153.230, player_2/loss=83.124, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 371.28it/s, env_step=13312, len=20, n/ep=4, n/st=64, player_1/loss=130.738, player_2/loss=92.438, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:02, 369.10it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=121.740, player_2/loss=72.723, rew=-8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 335.96it/s, env_step=15360, len=24, n/ep=2, n/st=64, player_1/loss=125.988, player_2/loss=47.008, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 363.91it/s, env_step=16384, len=27, n/ep=2, n/st=64, player_1/loss=139.859, player_2/loss=57.323, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 383.42it/s, env_step=17408, len=20, n/ep=3, n/st=64, player_1/loss=193.812, player_2/loss=64.651, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 380.43it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=201.472, player_2/loss=71.669, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:02, 372.04it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=187.881, player_2/loss=76.182, rew=-15.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:02, 389.50it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=97.945, player_2/loss=260.850, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 385.67it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=73.351, player_2/loss=191.164, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 388.24it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=55.324, player_2/loss=156.877, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 406.89it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=103.882, player_2/loss=152.811, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 392.57it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=128.577, player_2/loss=118.756, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 398.23it/s, env_step=6144, len=17, n/ep=3, n/st=64, player_1/loss=164.888, player_2/loss=154.510, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 357.65it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=166.285, rew=0.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 391.09it/s, env_step=8192, len=14, n/ep=5, n/st=64, player_1/loss=111.549, player_2/loss=142.149, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 345.00it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=78.280, player_2/loss=201.621, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 341.31it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=38.572, player_2/loss=244.671, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 346.37it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=32.386, player_2/loss=258.740, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 366.61it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=39.965, player_2/loss=262.517, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 336.85it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=46.131, player_2/loss=241.275, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 340.78it/s, env_step=14336, len=8, n/ep=8, n/st=64, player_1/loss=76.101, player_2/loss=247.154, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 341.40it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=71.606, player_2/loss=263.104, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:02, 396.83it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=30.736, player_2/loss=266.888, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:02, 457.74it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=14.502, player_2/loss=250.091, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:02, 413.39it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=12.016, player_2/loss=237.455, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:02, 406.60it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=17.183, player_2/loss=266.183, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:02, 465.12it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=5.352, player_2/loss=191.688, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 441.99it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=5.970, player_2/loss=160.937, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 470.16it/s, env_step=3072, len=21, n/ep=3, n/st=64, player_1/loss=34.762, player_2/loss=128.466, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 467.55it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=67.719, player_2/loss=96.835, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [10:48:55, 37.99s/it, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=131.017, player_2/loss=102.920, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 396.34it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=151.913, player_2/loss=115.460, rew=-8.33]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 406.71it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=201.941, player_2/loss=104.656, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:02, 463.15it/s, env_step=8192, len=17, n/ep=4, n/st=64, player_2/loss=84.829, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:02, 463.94it/s, env_step=9216, len=19, n/ep=4, n/st=64, player_1/loss=185.880, player_2/loss=66.117, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:02, 422.46it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=129.518, player_2/loss=52.129, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 417.01it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=155.937, player_2/loss=50.111, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:02, 393.96it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=193.210, player_2/loss=36.172, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:02, 414.33it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=170.272, player_2/loss=43.130, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 340.88it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=186.544, player_2/loss=45.160, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:02, 368.45it/s, env_step=15360, len=18, n/ep=3, n/st=64, player_1/loss=186.559, player_2/loss=69.739, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:02, 399.88it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=176.568, player_2/loss=89.591, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 377.12it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=170.469, player_2/loss=66.357, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:02, 345.01it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=180.913, player_2/loss=26.396, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 312.36it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=186.342, player_2/loss=29.005, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 326.00it/s, env_step=1024, len=23, n/ep=3, n/st=64, player_1/loss=116.827, player_2/loss=249.546, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 324.76it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=105.977, player_2/loss=207.474, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 307.59it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=115.119, rew=-12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 327.52it/s, env_step=4096, len=18, n/ep=4, n/st=64, player_1/loss=107.897, player_2/loss=176.947, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 347.66it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=100.745, player_2/loss=180.037, rew=-12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 337.96it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=88.748, player_2/loss=167.731, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 350.77it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=67.655, player_2/loss=142.221, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 347.09it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=66.051, player_2/loss=157.795, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 350.10it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=73.031, player_2/loss=180.376, rew=-25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 363.81it/s, env_step=10240, len=13, n/ep=4, n/st=64, player_1/loss=55.469, player_2/loss=311.252, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 361.06it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=46.347, player_2/loss=318.978, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 339.44it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=48.419, player_2/loss=327.259, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 415.79it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=53.147, player_2/loss=283.517, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 424.44it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=34.452, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 414.79it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=9.468, player_2/loss=240.493, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 404.90it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=23.248, player_2/loss=225.187, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 364.44it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=28.488, player_2/loss=306.288, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 362.50it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=73.934, player_2/loss=310.467, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 342.07it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=106.204, player_2/loss=259.972, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 346.86it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=40.081, player_2/loss=231.704, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 330.33it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=126.800, player_2/loss=170.218, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 341.08it/s, env_step=3072, len=16, n/ep=3, n/st=64, player_1/loss=189.964, player_2/loss=108.375, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:02, 398.42it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=202.264, player_2/loss=116.000, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:02, 436.92it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=263.974, player_2/loss=152.730, rew=12.50]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 465.49it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=227.907, player_2/loss=145.256, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 458.00it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=188.716, player_2/loss=75.795, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 417.13it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=244.873, player_2/loss=85.326, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 425.32it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=123.944, player_2/loss=95.448, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:02, 418.17it/s, env_step=10240, len=19, n/ep=4, n/st=64, player_2/loss=132.588, rew=-12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 416.62it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=90.437, player_2/loss=117.667, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 415.88it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=212.534, player_2/loss=174.190, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 427.87it/s, env_step=13312, len=11, n/ep=5, n/st=64, player_1/loss=285.888, player_2/loss=167.570, rew=15.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 451.48it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=201.391, player_2/loss=141.394, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 423.64it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=145.170, player_2/loss=93.947, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 424.78it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=178.863, player_2/loss=37.471, rew=0.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:02, 405.75it/s, env_step=17408, len=14, n/ep=5, n/st=64, player_1/loss=213.937, player_2/loss=48.065, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 425.48it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=278.314, player_2/loss=54.088, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 426.53it/s, env_step=19456, len=22, n/ep=4, n/st=64, player_1/loss=206.646, player_2/loss=35.552, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 405.27it/s, env_step=1024, len=24, n/ep=3, n/st=64, player_1/loss=76.948, player_2/loss=88.348, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.74it/s, env_step=2048, len=23, n/ep=3, n/st=64, player_1/loss=94.265, player_2/loss=127.892, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 367.96it/s, env_step=3072, len=26, n/ep=3, n/st=64, player_1/loss=72.978, player_2/loss=100.058, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 329.27it/s, env_step=4096, len=17, n/ep=2, n/st=64, player_1/loss=111.876, player_2/loss=101.261, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 354.32it/s, env_step=5120, len=17, n/ep=3, n/st=64, player_1/loss=148.586, player_2/loss=117.051, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 437.98it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=128.162, player_2/loss=110.741, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 412.37it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=84.351, player_2/loss=167.476, rew=5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 459.49it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=73.003, player_2/loss=181.518, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 424.30it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_1/loss=102.436, player_2/loss=233.965, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 449.11it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=124.161, player_2/loss=294.756, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 437.70it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=132.587, player_2/loss=389.272, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 456.29it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=87.463, player_2/loss=406.250, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 461.44it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=28.868, player_2/loss=456.274, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 457.60it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=41.190, player_2/loss=427.933, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 439.48it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=13.408, player_2/loss=432.061, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 461.86it/s, env_step=16384, len=7, n/ep=10, n/st=64, player_1/loss=9.617, player_2/loss=441.760, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 440.43it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=42.350, player_2/loss=403.502, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 432.63it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=40.647, player_2/loss=391.219, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 447.16it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=27.387, player_2/loss=370.970, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 444.25it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=11.830, player_2/loss=219.261, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 452.74it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=49.521, player_2/loss=206.460, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 409.93it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=91.169, player_2/loss=189.755, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 419.00it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=144.818, player_2/loss=309.776, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 409.04it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=119.323, player_2/loss=263.826, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 399.40it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=19.442, player_2/loss=215.764, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 382.98it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=19.071, player_2/loss=83.163, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 360.78it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=55.787, player_2/loss=88.999, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 332.38it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=69.102, player_2/loss=123.712, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 340.28it/s, env_step=10240, len=7, n/ep=10, n/st=64, player_1/loss=168.273, player_2/loss=111.707, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 350.07it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=118.387, player_2/loss=69.666, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 353.54it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=22.999, player_2/loss=27.705, rew=-17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #13: 1025it [00:02, 369.52it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=51.661, player_2/loss=74.135, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #14: 1025it [00:02, 355.03it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=39.843, player_2/loss=78.608, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #15: 1025it [00:03, 341.19it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=41.995, player_2/loss=83.944, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #16: 1025it [00:03, 339.75it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=42.263, player_2/loss=130.126, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #17: 1025it [00:02, 367.26it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=14.046, player_2/loss=84.282, rew=-19.44]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #18: 1025it [00:02, 391.20it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=43.453, player_2/loss=68.518, rew=-8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #19: 1025it [00:03, 316.40it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=59.943, player_2/loss=112.290, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #12


Epoch #1: 1025it [00:02, 366.08it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=161.483, player_2/loss=135.646, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 377.96it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=107.040, player_2/loss=160.259, rew=18.75]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 393.39it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=94.389, player_2/loss=133.751, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 391.87it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=108.664, player_2/loss=125.169, rew=13.89]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 365.85it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=57.628, player_2/loss=123.100, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 357.82it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=96.095, player_2/loss=112.798, rew=18.75]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 335.66it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=103.974, player_2/loss=156.910, rew=13.89]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 321.71it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=82.239, player_2/loss=131.778, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 326.90it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=78.874, player_2/loss=127.263, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 323.60it/s, env_step=10240, len=8, n/ep=7, n/st=64, player_1/loss=126.351, player_2/loss=180.018, rew=17.86]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 327.70it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=131.868, player_2/loss=210.590, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 326.21it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=93.189, player_2/loss=126.787, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 330.41it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=51.730, player_2/loss=55.777, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 330.88it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=23.736, rew=3.57]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 334.58it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=96.402, player_2/loss=70.629, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 316.95it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=72.234, player_2/loss=70.454, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 328.51it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=59.108, player_2/loss=53.033, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 334.73it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=63.584, player_2/loss=81.032, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 332.12it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=77.028, player_2/loss=89.652, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 292.60it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=82.993, player_2/loss=108.728, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 291.80it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=49.284, player_2/loss=66.716, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 343.32it/s, env_step=3072, len=9, n/ep=7, n/st=64, player_1/loss=29.843, player_2/loss=38.713, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 334.78it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=102.135, player_2/loss=125.736, rew=5.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:03, 333.42it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=193.759, player_2/loss=184.339, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 330.60it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=211.949, player_2/loss=107.758, rew=16.67]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 338.45it/s, env_step=7168, len=15, n/ep=6, n/st=64, player_1/loss=229.352, player_2/loss=74.402, rew=8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 336.63it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=277.139, player_2/loss=84.959, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 336.36it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_2/loss=71.450, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:03, 336.84it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=239.676, player_2/loss=78.973, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 310.19it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=216.263, player_2/loss=87.259, rew=16.67]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 292.12it/s, env_step=12288, len=12, n/ep=6, n/st=64, player_1/loss=249.223, player_2/loss=78.933, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:02, 355.92it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=299.772, player_2/loss=67.250, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:02, 368.61it/s, env_step=14336, len=12, n/ep=6, n/st=64, player_1/loss=318.323, player_2/loss=41.426, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:02, 371.62it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=280.774, player_2/loss=58.910, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:02, 371.16it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=312.616, player_2/loss=43.484, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:02, 371.29it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=274.783, player_2/loss=36.539, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:02, 367.69it/s, env_step=18432, len=11, n/ep=5, n/st=64, player_1/loss=202.446, player_2/loss=93.041, rew=-5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:02, 353.52it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=199.651, player_2/loss=117.178, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:02, 344.40it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=194.441, player_2/loss=47.255, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.79it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=190.127, player_2/loss=123.940, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.85it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=145.141, player_2/loss=255.099, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 333.97it/s, env_step=4096, len=12, n/ep=6, n/st=64, player_1/loss=83.944, player_2/loss=452.243, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 338.46it/s, env_step=5120, len=11, n/ep=5, n/st=64, player_1/loss=87.484, player_2/loss=512.099, rew=-5.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 334.23it/s, env_step=6144, len=10, n/ep=6, n/st=64, player_1/loss=79.831, player_2/loss=447.640, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 336.03it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=79.003, player_2/loss=451.596, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.82it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=118.993, player_2/loss=531.795, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 336.52it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=64.452, player_2/loss=513.653, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.04it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=50.336, player_2/loss=509.409, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 338.76it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=60.552, player_2/loss=437.119, rew=5.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 336.92it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=99.372, rew=15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 339.61it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=77.276, player_2/loss=299.808, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 336.16it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=103.588, player_2/loss=208.637, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 339.10it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=92.994, player_2/loss=238.024, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 342.67it/s, env_step=16384, len=16, n/ep=5, n/st=64, player_1/loss=50.282, player_2/loss=264.156, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 339.72it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=69.137, player_2/loss=263.216, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.31it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=149.645, player_2/loss=293.146, rew=12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 335.96it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=123.296, player_2/loss=235.707, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 338.21it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=81.051, player_2/loss=190.255, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 339.67it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=93.548, player_2/loss=158.508, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 338.86it/s, env_step=3072, len=15, n/ep=5, n/st=64, player_1/loss=174.062, player_2/loss=131.116, rew=5.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 337.80it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=189.703, player_2/loss=141.862, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 340.20it/s, env_step=5120, len=15, n/ep=3, n/st=64, player_1/loss=73.523, player_2/loss=173.912, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 338.33it/s, env_step=6144, len=19, n/ep=4, n/st=64, player_1/loss=97.954, player_2/loss=162.215, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 340.26it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=77.626, player_2/loss=126.932, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 333.43it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=124.965, player_2/loss=129.184, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 337.91it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=166.163, player_2/loss=157.653, rew=-8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.58it/s, env_step=10240, len=26, n/ep=3, n/st=64, player_1/loss=168.688, player_2/loss=146.829, rew=-8.33]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 338.08it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=108.519, player_2/loss=110.267, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 336.73it/s, env_step=12288, len=22, n/ep=3, n/st=64, player_1/loss=118.358, player_2/loss=112.564, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 338.18it/s, env_step=13312, len=20, n/ep=4, n/st=64, player_1/loss=125.195, player_2/loss=100.571, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 336.91it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=161.562, player_2/loss=73.613, rew=0.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 335.71it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=191.902, player_2/loss=40.174, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.65it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=153.020, player_2/loss=30.559, rew=12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 337.45it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=130.326, player_2/loss=21.379, rew=5.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 336.42it/s, env_step=18432, len=17, n/ep=4, n/st=64, player_1/loss=109.565, player_2/loss=54.892, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 339.84it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=109.248, player_2/loss=82.262, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 331.99it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=140.464, player_2/loss=184.594, rew=0.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.41it/s, env_step=2048, len=20, n/ep=4, n/st=64, player_1/loss=107.280, player_2/loss=215.966, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.49it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=55.218, player_2/loss=221.334, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.86it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=70.344, player_2/loss=240.429, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.29it/s, env_step=5120, len=7, n/ep=8, n/st=64, player_2/loss=239.133, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.63it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=30.200, player_2/loss=236.935, rew=17.86]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 335.97it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=19.853, player_2/loss=253.612, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 338.69it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=15.550, player_2/loss=264.020, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 335.59it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=19.609, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 337.66it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=15.569, player_2/loss=267.854, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 334.57it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=6.065, player_2/loss=280.300, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 333.87it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=35.726, player_2/loss=265.198, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 335.55it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=51.374, player_2/loss=259.120, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.65it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=30.722, player_2/loss=253.765, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 337.77it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=25.865, player_2/loss=234.956, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 338.03it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=22.597, player_2/loss=242.393, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 349.28it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=55.223, player_2/loss=219.913, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.39it/s, env_step=18432, len=10, n/ep=8, n/st=64, player_1/loss=63.879, player_2/loss=237.649, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 334.92it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=15.196, player_2/loss=245.562, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 335.45it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=43.468, player_2/loss=255.979, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.82it/s, env_step=2048, len=7, n/ep=8, n/st=64, player_1/loss=326.102, player_2/loss=261.818, rew=18.75]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 335.39it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=610.115, player_2/loss=130.781, rew=18.75]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 334.73it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=725.251, rew=18.75]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 335.66it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=684.088, player_2/loss=54.312, rew=18.75]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 336.48it/s, env_step=6144, len=8, n/ep=9, n/st=64, player_1/loss=609.839, player_2/loss=58.882, rew=19.44]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 337.40it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=533.888, player_2/loss=61.126, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 336.30it/s, env_step=8192, len=9, n/ep=6, n/st=64, player_1/loss=669.623, player_2/loss=24.621, rew=16.67]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 337.57it/s, env_step=9216, len=12, n/ep=7, n/st=64, player_1/loss=739.350, player_2/loss=23.079, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 336.32it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=643.985, player_2/loss=39.996, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 337.79it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=574.242, player_2/loss=58.001, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 336.19it/s, env_step=12288, len=7, n/ep=7, n/st=64, player_1/loss=565.404, player_2/loss=49.282, rew=10.71]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 335.41it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=705.417, player_2/loss=20.599, rew=18.75]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 335.61it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=697.875, player_2/loss=31.307, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 334.40it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=529.959, player_2/loss=48.003, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 336.08it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=500.200, player_2/loss=49.369, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 337.56it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=549.191, player_2/loss=18.287, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 335.82it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=589.714, player_2/loss=19.812, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 335.64it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=647.211, player_2/loss=55.520, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 335.37it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=560.724, player_2/loss=64.180, rew=-18.75]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.69it/s, env_step=2048, len=9, n/ep=6, n/st=64, player_1/loss=442.438, player_2/loss=35.293, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.79it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=319.560, player_2/loss=42.870, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.64it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=253.147, player_2/loss=107.375, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.16it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=183.990, player_2/loss=125.628, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 327.26it/s, env_step=6144, len=11, n/ep=5, n/st=64, player_1/loss=130.028, player_2/loss=116.120, rew=5.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:03, 334.09it/s, env_step=7168, len=13, n/ep=5, n/st=64, player_1/loss=99.617, player_2/loss=157.608, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:03, 337.54it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=49.727, player_2/loss=186.145, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:03, 339.05it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=38.007, player_2/loss=220.480, rew=25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:03, 337.90it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=37.814, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:03, 337.85it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=48.826, player_2/loss=212.045, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 336.55it/s, env_step=12288, len=27, n/ep=3, n/st=64, player_1/loss=98.181, player_2/loss=172.848, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:03, 338.00it/s, env_step=13312, len=23, n/ep=2, n/st=64, player_1/loss=108.273, player_2/loss=171.954, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:03, 338.27it/s, env_step=14336, len=22, n/ep=3, n/st=64, player_1/loss=73.802, player_2/loss=126.839, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:03, 338.45it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=47.558, player_2/loss=163.361, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:03, 339.38it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=44.871, player_2/loss=222.292, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:03, 336.91it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=39.589, player_2/loss=234.802, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:02, 350.42it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=41.070, player_2/loss=250.139, rew=12.50]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:03, 331.68it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=49.753, player_2/loss=267.049, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:03, 338.94it/s, env_step=1024, len=16, n/ep=3, n/st=64, player_1/loss=34.501, player_2/loss=163.066, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.40it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=26.392, player_2/loss=141.752, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 337.27it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=23.257, player_2/loss=114.358, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 337.31it/s, env_step=4096, len=17, n/ep=4, n/st=64, player_1/loss=22.818, player_2/loss=128.415, rew=-12.50]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 336.47it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=42.525, player_2/loss=117.455, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 338.15it/s, env_step=6144, len=20, n/ep=4, n/st=64, player_1/loss=48.226, player_2/loss=148.578, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 333.12it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=194.180, player_2/loss=137.510, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 332.68it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=234.491, player_2/loss=91.466, rew=-5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 339.23it/s, env_step=9216, len=17, n/ep=4, n/st=64, player_1/loss=91.933, player_2/loss=98.951, rew=-12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 338.76it/s, env_step=10240, len=17, n/ep=4, n/st=64, player_1/loss=104.772, player_2/loss=103.276, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 336.78it/s, env_step=11264, len=21, n/ep=2, n/st=64, player_1/loss=240.304, player_2/loss=79.691, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 338.54it/s, env_step=12288, len=18, n/ep=4, n/st=64, player_1/loss=228.748, player_2/loss=75.274, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 339.94it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=153.644, player_2/loss=65.611, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 336.93it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=199.867, player_2/loss=72.632, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 338.18it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=165.301, player_2/loss=116.511, rew=8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 339.32it/s, env_step=16384, len=19, n/ep=4, n/st=64, player_1/loss=174.418, player_2/loss=120.110, rew=12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 338.57it/s, env_step=17408, len=18, n/ep=3, n/st=64, player_1/loss=185.345, player_2/loss=100.785, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 338.07it/s, env_step=18432, len=19, n/ep=4, n/st=64, player_1/loss=170.533, player_2/loss=108.837, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 337.70it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=175.653, player_2/loss=78.808, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 335.20it/s, env_step=1024, len=11, n/ep=6, n/st=64, player_1/loss=57.302, player_2/loss=191.039, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 335.33it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=42.784, player_2/loss=214.505, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.98it/s, env_step=3072, len=8, n/ep=6, n/st=64, player_1/loss=38.868, player_2/loss=226.920, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 335.40it/s, env_step=4096, len=8, n/ep=7, n/st=64, player_1/loss=29.331, player_2/loss=256.112, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.72it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=16.303, player_2/loss=231.099, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.47it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=16.357, player_2/loss=228.557, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 335.14it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=10.571, player_2/loss=216.610, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 334.40it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=6.565, player_2/loss=228.544, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 336.90it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=7.519, player_2/loss=234.021, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.31it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=6.234, player_2/loss=258.520, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 333.57it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=57.543, player_2/loss=207.858, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 335.40it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=61.460, player_2/loss=233.799, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 339.11it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=8.273, player_2/loss=278.910, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 335.83it/s, env_step=14336, len=9, n/ep=8, n/st=64, player_1/loss=8.070, player_2/loss=248.927, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 338.85it/s, env_step=15360, len=10, n/ep=7, n/st=64, player_1/loss=7.120, player_2/loss=182.847, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 336.68it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=20.742, player_2/loss=194.770, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 337.56it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=20.277, player_2/loss=244.342, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 337.41it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=4.981, player_2/loss=241.941, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 349.36it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=5.901, player_2/loss=241.022, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 338.07it/s, env_step=1024, len=9, n/ep=7, n/st=64, player_1/loss=127.819, player_2/loss=194.657, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.38it/s, env_step=2048, len=17, n/ep=4, n/st=64, player_1/loss=84.127, player_2/loss=149.942, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 334.99it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=58.031, player_2/loss=115.995, rew=8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 335.29it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=185.513, player_2/loss=100.959, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:03, 336.59it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=285.088, player_2/loss=101.344, rew=-8.33]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 336.16it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=331.778, player_2/loss=83.026, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 335.75it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=511.776, player_2/loss=32.792, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 336.59it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=335.188, player_2/loss=25.062, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 335.43it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=284.603, player_2/loss=61.804, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:03, 339.06it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=414.995, player_2/loss=43.387, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 341.66it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=619.971, player_2/loss=49.973, rew=0.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 338.71it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=554.394, player_2/loss=27.356, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 338.60it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=473.720, player_2/loss=75.057, rew=12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 336.83it/s, env_step=14336, len=15, n/ep=4, n/st=64, player_1/loss=464.845, player_2/loss=69.124, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:03, 334.99it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=492.103, player_2/loss=20.107, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 334.96it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_1/loss=282.783, player_2/loss=16.013, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:03, 337.15it/s, env_step=17408, len=19, n/ep=3, n/st=64, player_1/loss=125.935, player_2/loss=14.249, rew=8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 337.85it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=148.379, player_2/loss=51.281, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:03, 339.61it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=158.568, rew=-8.33]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 337.56it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=11.732, player_2/loss=75.048, rew=12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.52it/s, env_step=2048, len=16, n/ep=3, n/st=64, player_1/loss=84.987, player_2/loss=84.391, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.17it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=147.838, player_2/loss=101.570, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 338.17it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=149.784, player_2/loss=105.609, rew=0.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.08it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=108.881, player_2/loss=113.776, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #6: 1025it [00:03, 336.26it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=66.856, player_2/loss=81.253, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #7: 1025it [00:03, 337.58it/s, env_step=7168, len=24, n/ep=3, n/st=64, player_1/loss=83.009, player_2/loss=108.188, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #8: 1025it [00:03, 333.98it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=101.705, player_2/loss=118.659, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #9: 1025it [00:03, 339.28it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=125.493, player_2/loss=131.920, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #10: 1025it [00:03, 340.23it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=80.700, player_2/loss=177.864, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #11: 1025it [00:03, 338.36it/s, env_step=11264, len=20, n/ep=4, n/st=64, player_1/loss=13.610, player_2/loss=233.566, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #12: 1025it [00:03, 339.69it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=50.406, player_2/loss=224.766, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #13: 1025it [00:03, 334.52it/s, env_step=13312, len=17, n/ep=3, n/st=64, player_1/loss=53.180, player_2/loss=253.867, rew=-8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #14: 1025it [00:03, 334.27it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=63.001, player_2/loss=268.186, rew=-15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #15: 1025it [00:03, 337.33it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=117.427, player_2/loss=237.690, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #16: 1025it [00:03, 337.44it/s, env_step=16384, len=11, n/ep=6, n/st=64, player_1/loss=129.599, player_2/loss=210.915, rew=16.67]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #17: 1025it [00:03, 337.19it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=86.376, player_2/loss=253.857, rew=10.71]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #18: 1025it [00:03, 335.08it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=67.037, player_2/loss=319.841, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #19: 1025it [00:03, 335.20it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=86.972, player_2/loss=301.130, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #5


Epoch #1: 1025it [00:02, 350.14it/s, env_step=1024, len=23, n/ep=3, n/st=64, player_1/loss=138.351, player_2/loss=86.014, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 340.80it/s, env_step=2048, len=13, n/ep=6, n/st=64, player_1/loss=133.976, player_2/loss=103.728, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 338.33it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=89.984, player_2/loss=95.968, rew=-19.44]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 339.68it/s, env_step=4096, len=17, n/ep=3, n/st=64, player_1/loss=110.418, player_2/loss=125.983, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 336.28it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=138.178, player_2/loss=98.371, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 342.32it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=108.483, player_2/loss=102.684, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.63it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=115.249, player_2/loss=102.173, rew=-18.75]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.18it/s, env_step=8192, len=7, n/ep=10, n/st=64, player_1/loss=154.949, player_2/loss=116.940, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 337.17it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=158.656, player_2/loss=140.897, rew=15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 337.60it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=120.479, player_2/loss=143.607, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 340.29it/s, env_step=11264, len=19, n/ep=4, n/st=64, player_1/loss=112.484, player_2/loss=171.163, rew=0.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 337.27it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=117.250, player_2/loss=143.239, rew=-12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 335.75it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=117.699, player_2/loss=126.662, rew=-19.44]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 339.63it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=94.487, player_2/loss=121.388, rew=-25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 341.02it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=121.109, player_2/loss=181.920, rew=-18.75]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.62it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=128.873, player_2/loss=178.492, rew=-5.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 337.28it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=108.574, player_2/loss=125.714, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 339.23it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=90.430, player_2/loss=93.611, rew=-12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 340.06it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=62.311, player_2/loss=91.684, rew=-18.75]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 336.41it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=48.895, player_2/loss=261.351, rew=19.44]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 333.55it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=89.108, player_2/loss=245.981, rew=13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.44it/s, env_step=3072, len=7, n/ep=8, n/st=64, player_1/loss=107.998, player_2/loss=252.989, rew=6.25]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.28it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=89.498, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.13it/s, env_step=5120, len=9, n/ep=6, n/st=64, player_1/loss=78.024, player_2/loss=241.762, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.29it/s, env_step=6144, len=7, n/ep=7, n/st=64, player_1/loss=60.437, player_2/loss=224.781, rew=10.71]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.57it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=79.811, player_2/loss=190.578, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.56it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=57.655, player_2/loss=215.161, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 333.31it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=63.835, player_2/loss=265.363, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.37it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=98.882, player_2/loss=275.994, rew=17.86]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 338.43it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=64.867, player_2/loss=285.923, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.52it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=67.395, player_2/loss=240.683, rew=18.75]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 334.98it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=63.829, player_2/loss=239.674, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 336.57it/s, env_step=14336, len=10, n/ep=8, n/st=64, player_1/loss=47.680, player_2/loss=232.547, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 336.27it/s, env_step=15360, len=9, n/ep=8, n/st=64, player_1/loss=87.855, player_2/loss=186.636, rew=18.75]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 337.69it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=74.720, player_2/loss=209.194, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 336.36it/s, env_step=17408, len=7, n/ep=7, n/st=64, player_1/loss=91.866, player_2/loss=223.428, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 336.64it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=100.848, player_2/loss=230.893, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.71it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=64.467, player_2/loss=249.981, rew=12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 339.20it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=36.703, player_2/loss=194.712, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 353.32it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=93.882, player_2/loss=206.130, rew=-18.75]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 339.22it/s, env_step=3072, len=20, n/ep=3, n/st=64, player_1/loss=195.275, player_2/loss=237.482, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 338.84it/s, env_step=4096, len=15, n/ep=5, n/st=64, player_1/loss=174.302, player_2/loss=242.562, rew=15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 334.37it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=139.401, player_2/loss=143.948, rew=12.50]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 338.64it/s, env_step=6144, len=15, n/ep=4, n/st=64, player_1/loss=199.283, player_2/loss=121.293, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 336.97it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=294.229, player_2/loss=158.995, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 337.87it/s, env_step=8192, len=19, n/ep=3, n/st=64, player_1/loss=311.519, player_2/loss=107.390, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 337.12it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=274.976, player_2/loss=47.754, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 339.38it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=222.776, player_2/loss=39.621, rew=12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 337.42it/s, env_step=11264, len=16, n/ep=4, n/st=64, player_1/loss=277.141, player_2/loss=37.031, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 340.74it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=293.647, player_2/loss=93.995, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 338.23it/s, env_step=13312, len=8, n/ep=8, n/st=64, player_1/loss=212.884, player_2/loss=122.055, rew=-12.50]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 339.23it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=161.790, player_2/loss=96.966, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 337.17it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=156.752, player_2/loss=147.671, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 337.10it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=160.757, player_2/loss=104.997, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 338.70it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=182.819, player_2/loss=70.402, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 332.61it/s, env_step=18432, len=18, n/ep=3, n/st=64, player_1/loss=229.356, player_2/loss=45.984, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 336.49it/s, env_step=19456, len=16, n/ep=3, n/st=64, player_1/loss=244.363, player_2/loss=49.264, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 337.07it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=151.566, player_2/loss=141.069, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.04it/s, env_step=2048, len=18, n/ep=4, n/st=64, player_1/loss=148.840, player_2/loss=131.534, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.77it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=172.900, player_2/loss=122.532, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.72it/s, env_step=4096, len=24, n/ep=3, n/st=64, player_1/loss=187.832, player_2/loss=148.089, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:03, 338.66it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=147.361, player_2/loss=169.385, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 338.58it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=123.996, player_2/loss=174.083, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 338.34it/s, env_step=7168, len=26, n/ep=3, n/st=64, player_1/loss=106.774, player_2/loss=194.424, rew=16.67]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 337.55it/s, env_step=8192, len=18, n/ep=4, n/st=64, player_1/loss=106.619, player_2/loss=207.918, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 339.05it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=113.433, player_2/loss=199.402, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:03, 339.17it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=94.618, player_2/loss=142.059, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 337.49it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=110.152, player_2/loss=112.450, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 336.72it/s, env_step=12288, len=20, n/ep=4, n/st=64, player_1/loss=99.350, player_2/loss=135.282, rew=0.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 338.57it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=98.123, player_2/loss=102.518, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 337.64it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=93.059, player_2/loss=73.519, rew=-8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:03, 339.21it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=78.231, player_2/loss=91.099, rew=-12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 336.68it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=72.252, player_2/loss=83.519, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:03, 335.93it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=80.882, player_2/loss=72.607, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 335.28it/s, env_step=18432, len=14, n/ep=5, n/st=64, player_1/loss=71.614, player_2/loss=112.440, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:03, 337.17it/s, env_step=19456, len=18, n/ep=4, n/st=64, player_1/loss=55.645, player_2/loss=134.863, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 337.36it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=67.443, player_2/loss=130.913, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 341.01it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=76.318, player_2/loss=125.096, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 353.32it/s, env_step=3072, len=15, n/ep=4, n/st=64, player_1/loss=72.944, player_2/loss=108.029, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 334.46it/s, env_step=4096, len=15, n/ep=3, n/st=64, player_1/loss=81.974, player_2/loss=92.060, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 337.64it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=68.979, player_2/loss=68.728, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 338.45it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=71.815, player_2/loss=81.566, rew=-15.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 336.62it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=78.109, player_2/loss=95.145, rew=-8.33]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 338.62it/s, env_step=8192, len=14, n/ep=4, n/st=64, player_1/loss=119.866, player_2/loss=98.580, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 337.26it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=120.484, player_2/loss=87.106, rew=-12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 338.14it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=136.992, player_2/loss=114.646, rew=-12.50]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 340.25it/s, env_step=11264, len=16, n/ep=3, n/st=64, player_1/loss=155.613, player_2/loss=128.627, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 341.73it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=142.402, player_2/loss=122.018, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 334.79it/s, env_step=13312, len=26, n/ep=3, n/st=64, player_1/loss=172.731, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 339.68it/s, env_step=14336, len=23, n/ep=3, n/st=64, player_1/loss=161.813, player_2/loss=75.891, rew=8.33]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 338.45it/s, env_step=15360, len=19, n/ep=3, n/st=64, player_1/loss=138.026, player_2/loss=87.942, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 339.95it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=147.386, player_2/loss=112.023, rew=-8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 336.88it/s, env_step=17408, len=27, n/ep=3, n/st=64, player_1/loss=160.789, player_2/loss=193.976, rew=-25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 337.59it/s, env_step=18432, len=30, n/ep=2, n/st=64, player_1/loss=173.791, player_2/loss=151.899, rew=-25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 339.35it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=170.276, player_2/loss=66.503, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 336.18it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=66.430, player_2/loss=99.429, rew=-25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 341.43it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=73.419, player_2/loss=86.997, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 338.81it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=80.637, player_2/loss=70.312, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 336.65it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=87.036, player_2/loss=92.603, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.30it/s, env_step=5120, len=25, n/ep=2, n/st=64, player_1/loss=117.286, player_2/loss=99.117, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 336.12it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=141.204, player_2/loss=141.414, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.02it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=118.269, player_2/loss=150.866, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 334.01it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=56.205, player_2/loss=220.316, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 335.44it/s, env_step=9216, len=9, n/ep=6, n/st=64, player_1/loss=64.829, player_2/loss=210.365, rew=16.67]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 337.02it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=63.885, player_2/loss=238.620, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 336.62it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=66.109, player_2/loss=245.419, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 336.99it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=79.730, player_2/loss=242.703, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 339.28it/s, env_step=13312, len=10, n/ep=6, n/st=64, player_1/loss=50.840, player_2/loss=226.879, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 338.84it/s, env_step=14336, len=11, n/ep=5, n/st=64, player_1/loss=56.174, player_2/loss=256.110, rew=15.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 335.92it/s, env_step=15360, len=9, n/ep=8, n/st=64, player_1/loss=61.134, player_2/loss=286.443, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 338.47it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=38.542, player_2/loss=332.301, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.01it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=27.861, player_2/loss=374.590, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 338.93it/s, env_step=18432, len=9, n/ep=6, n/st=64, player_1/loss=51.437, player_2/loss=248.364, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 335.85it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=72.509, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 336.17it/s, env_step=1024, len=17, n/ep=3, n/st=64, player_1/loss=51.305, player_2/loss=130.797, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 337.70it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=122.783, player_2/loss=104.196, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.02it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=180.533, player_2/loss=122.724, rew=15.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 351.55it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=263.123, player_2/loss=119.328, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 336.81it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=290.723, player_2/loss=66.217, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 338.10it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=248.543, player_2/loss=37.098, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 337.78it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=287.608, player_2/loss=37.979, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 337.39it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=236.434, player_2/loss=45.558, rew=5.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 338.75it/s, env_step=9216, len=13, n/ep=4, n/st=64, player_1/loss=226.181, player_2/loss=51.052, rew=12.50]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 337.35it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=233.785, player_2/loss=31.112, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 338.70it/s, env_step=11264, len=13, n/ep=4, n/st=64, player_1/loss=231.622, player_2/loss=26.666, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 323.88it/s, env_step=12288, len=16, n/ep=5, n/st=64, player_1/loss=227.407, player_2/loss=33.057, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 336.48it/s, env_step=13312, len=14, n/ep=4, n/st=64, player_1/loss=187.376, player_2/loss=78.910, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 334.99it/s, env_step=14336, len=14, n/ep=3, n/st=64, player_1/loss=212.297, player_2/loss=81.584, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 336.47it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=252.374, player_2/loss=37.853, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 338.98it/s, env_step=16384, len=12, n/ep=6, n/st=64, player_1/loss=200.123, player_2/loss=45.850, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.43it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=163.236, player_2/loss=30.302, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 335.14it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=180.892, player_2/loss=36.370, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 336.20it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=222.275, player_2/loss=39.352, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 335.83it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=140.102, player_2/loss=40.037, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.01it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=71.832, player_2/loss=72.640, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 337.69it/s, env_step=3072, len=19, n/ep=4, n/st=64, player_1/loss=48.481, player_2/loss=136.031, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 338.17it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=66.505, player_2/loss=145.718, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 334.97it/s, env_step=5120, len=15, n/ep=5, n/st=64, player_1/loss=41.246, player_2/loss=130.398, rew=15.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 339.83it/s, env_step=6144, len=15, n/ep=5, n/st=64, player_1/loss=25.592, player_2/loss=146.231, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 337.75it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=33.801, player_2/loss=167.142, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 336.38it/s, env_step=8192, len=19, n/ep=4, n/st=64, player_1/loss=32.883, player_2/loss=154.022, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 335.57it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=21.391, player_2/loss=141.040, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 336.82it/s, env_step=10240, len=18, n/ep=4, n/st=64, player_1/loss=11.129, player_2/loss=128.465, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 338.93it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=75.032, player_2/loss=129.021, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 338.08it/s, env_step=12288, len=16, n/ep=3, n/st=64, player_1/loss=78.976, player_2/loss=133.804, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:02, 342.09it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=15.144, player_2/loss=137.077, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 337.57it/s, env_step=14336, len=21, n/ep=3, n/st=64, player_1/loss=33.832, player_2/loss=114.144, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 339.60it/s, env_step=15360, len=15, n/ep=5, n/st=64, player_1/loss=28.801, player_2/loss=145.335, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 332.61it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=23.930, player_2/loss=153.493, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 338.27it/s, env_step=17408, len=18, n/ep=4, n/st=64, player_2/loss=131.654, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 338.70it/s, env_step=18432, len=15, n/ep=5, n/st=64, player_1/loss=25.630, player_2/loss=126.331, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 337.12it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=26.465, player_2/loss=139.445, rew=0.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 337.96it/s, env_step=1024, len=19, n/ep=3, n/st=64, player_1/loss=20.635, player_2/loss=114.649, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.35it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=35.840, player_2/loss=109.411, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.93it/s, env_step=3072, len=11, n/ep=6, n/st=64, player_1/loss=119.773, player_2/loss=117.728, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.85it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=202.451, player_2/loss=92.761, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 353.17it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=210.222, player_2/loss=73.086, rew=15.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 340.37it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=223.444, player_2/loss=84.927, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.45it/s, env_step=7168, len=15, n/ep=4, n/st=64, player_1/loss=232.012, player_2/loss=103.308, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 338.31it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=257.919, player_2/loss=88.739, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 334.87it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=220.408, player_2/loss=56.009, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.58it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=214.939, player_2/loss=60.773, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.18it/s, env_step=11264, len=12, n/ep=6, n/st=64, player_1/loss=202.039, player_2/loss=54.986, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.90it/s, env_step=12288, len=12, n/ep=6, n/st=64, player_2/loss=49.404, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 338.89it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=232.375, player_2/loss=46.508, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.73it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=178.168, player_2/loss=51.392, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 338.49it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=158.528, player_2/loss=85.769, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 336.86it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=198.285, player_2/loss=72.022, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 339.79it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=234.103, player_2/loss=39.732, rew=12.50]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.98it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=277.524, player_2/loss=65.180, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 339.00it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=246.953, player_2/loss=99.540, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 337.35it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=237.765, player_2/loss=122.163, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 333.97it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=170.762, player_2/loss=113.284, rew=-5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.15it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=149.567, player_2/loss=101.953, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.59it/s, env_step=4096, len=11, n/ep=5, n/st=64, player_1/loss=159.276, player_2/loss=139.285, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 335.11it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=108.193, player_2/loss=204.805, rew=19.44]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 335.73it/s, env_step=6144, len=7, n/ep=9, n/st=64, player_1/loss=40.164, rew=13.89]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 336.30it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=75.269, player_2/loss=331.424, rew=12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 339.05it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=72.838, player_2/loss=283.245, rew=13.89]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 338.19it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=42.156, player_2/loss=320.564, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 335.59it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=125.623, player_2/loss=327.959, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.07it/s, env_step=11264, len=8, n/ep=8, n/st=64, player_1/loss=99.836, player_2/loss=352.109, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 338.03it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=47.712, player_2/loss=335.416, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 333.78it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=58.378, player_2/loss=326.006, rew=18.75]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 336.32it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=11.772, player_2/loss=325.958, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 336.56it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=43.423, player_2/loss=301.619, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 335.86it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=67.970, player_2/loss=275.260, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 336.64it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=35.787, player_2/loss=297.462, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 337.86it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=11.129, player_2/loss=346.789, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 336.10it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=4.809, player_2/loss=366.418, rew=0.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 339.59it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=62.132, player_2/loss=293.781, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.22it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=39.458, player_2/loss=244.945, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.82it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=127.964, player_2/loss=161.335, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 339.04it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=268.217, player_2/loss=94.594, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 337.70it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=328.538, player_2/loss=57.850, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 341.27it/s, env_step=6144, len=14, n/ep=5, n/st=64, player_1/loss=284.579, player_2/loss=48.955, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 339.59it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=268.656, player_2/loss=66.898, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 339.33it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=272.795, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 339.94it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=299.732, player_2/loss=42.410, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 339.26it/s, env_step=10240, len=15, n/ep=5, n/st=64, player_1/loss=309.622, player_2/loss=19.310, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 338.99it/s, env_step=11264, len=12, n/ep=5, n/st=64, player_1/loss=317.240, player_2/loss=43.120, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 339.03it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=308.041, player_2/loss=52.483, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 338.86it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=259.702, player_2/loss=71.387, rew=16.67]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 335.59it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=223.722, player_2/loss=46.397, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 334.39it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_2/loss=63.970, rew=15.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 338.42it/s, env_step=16384, len=11, n/ep=5, n/st=64, player_1/loss=264.666, player_2/loss=59.593, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 337.02it/s, env_step=17408, len=12, n/ep=6, n/st=64, player_1/loss=294.128, player_2/loss=47.864, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 339.27it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=309.274, player_2/loss=89.768, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 340.09it/s, env_step=19456, len=12, n/ep=5, n/st=64, player_1/loss=269.836, player_2/loss=81.590, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 338.59it/s, env_step=1024, len=12, n/ep=6, n/st=64, player_1/loss=90.780, player_2/loss=111.688, rew=16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.35it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=71.025, player_2/loss=166.859, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 336.84it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=71.421, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 338.99it/s, env_step=4096, len=11, n/ep=6, n/st=64, player_1/loss=66.005, player_2/loss=280.691, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 336.45it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=35.882, player_2/loss=356.628, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 337.49it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=18.922, player_2/loss=338.144, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 338.15it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=15.681, player_2/loss=355.230, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 336.62it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=12.713, player_2/loss=353.177, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 337.73it/s, env_step=9216, len=10, n/ep=6, n/st=64, player_1/loss=12.122, player_2/loss=370.173, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 338.24it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=17.205, player_2/loss=418.085, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 337.63it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=16.364, player_2/loss=367.838, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 334.49it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=4.426, player_2/loss=365.186, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 337.69it/s, env_step=13312, len=10, n/ep=7, n/st=64, player_1/loss=3.924, player_2/loss=398.187, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 331.88it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=6.811, player_2/loss=373.300, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 336.57it/s, env_step=15360, len=13, n/ep=5, n/st=64, player_1/loss=18.252, player_2/loss=393.858, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 336.95it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=18.942, player_2/loss=363.542, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 337.74it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=21.534, player_2/loss=304.064, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 341.11it/s, env_step=18432, len=11, n/ep=6, n/st=64, player_2/loss=262.493, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 339.15it/s, env_step=19456, len=13, n/ep=7, n/st=64, player_1/loss=40.839, player_2/loss=289.177, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 337.98it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=5.820, player_2/loss=305.814, rew=-16.67]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.90it/s, env_step=2048, len=23, n/ep=2, n/st=64, player_1/loss=39.261, player_2/loss=164.217, rew=0.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 339.34it/s, env_step=3072, len=27, n/ep=2, n/st=64, player_1/loss=101.861, player_2/loss=83.909, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 337.70it/s, env_step=4096, len=20, n/ep=3, n/st=64, player_1/loss=129.669, player_2/loss=124.805, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 332.74it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=121.346, player_2/loss=108.684, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 339.91it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=128.768, player_2/loss=102.604, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:02, 345.12it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=149.050, player_2/loss=74.675, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 345.09it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=159.994, player_2/loss=62.407, rew=-25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 339.42it/s, env_step=9216, len=15, n/ep=4, n/st=64, player_1/loss=175.689, player_2/loss=47.857, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 339.79it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=204.116, player_2/loss=28.665, rew=12.50]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 338.10it/s, env_step=11264, len=14, n/ep=4, n/st=64, player_1/loss=200.150, player_2/loss=35.255, rew=-12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 339.71it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_2/loss=41.934, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 338.96it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=158.515, player_2/loss=51.465, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 339.53it/s, env_step=14336, len=16, n/ep=4, n/st=64, player_1/loss=173.494, player_2/loss=51.509, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 338.84it/s, env_step=15360, len=15, n/ep=3, n/st=64, player_1/loss=198.615, player_2/loss=28.730, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 337.61it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=224.618, player_2/loss=15.777, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 338.25it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=203.291, player_2/loss=19.408, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 338.65it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=215.276, player_2/loss=38.980, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 337.49it/s, env_step=19456, len=14, n/ep=4, n/st=64, player_1/loss=209.574, player_2/loss=34.924, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 335.67it/s, env_step=1024, len=16, n/ep=4, n/st=64, player_1/loss=127.790, player_2/loss=24.641, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.29it/s, env_step=2048, len=14, n/ep=5, n/st=64, player_1/loss=102.959, player_2/loss=55.683, rew=5.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.15it/s, env_step=3072, len=16, n/ep=5, n/st=64, player_1/loss=87.241, player_2/loss=85.950, rew=-15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.89it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=83.232, player_2/loss=113.919, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 339.90it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=95.730, player_2/loss=70.625, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 340.40it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=83.074, player_2/loss=40.082, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:03, 336.68it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=43.969, player_2/loss=18.684, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:03, 339.33it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=61.686, player_2/loss=48.756, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:03, 336.37it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=67.485, player_2/loss=64.703, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:03, 336.45it/s, env_step=10240, len=19, n/ep=3, n/st=64, player_1/loss=81.587, player_2/loss=36.687, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:03, 338.16it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=145.294, player_2/loss=146.798, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 335.16it/s, env_step=12288, len=9, n/ep=7, n/st=64, player_1/loss=148.602, player_2/loss=248.940, rew=17.86]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:03, 336.39it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=83.779, player_2/loss=265.325, rew=8.33]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:03, 337.11it/s, env_step=14336, len=8, n/ep=7, n/st=64, player_1/loss=52.666, player_2/loss=261.839, rew=17.86]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:03, 338.59it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=45.184, player_2/loss=305.971, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:03, 338.43it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=44.108, player_2/loss=287.886, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:03, 336.60it/s, env_step=17408, len=9, n/ep=7, n/st=64, player_1/loss=43.815, player_2/loss=267.995, rew=3.57]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:03, 336.95it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=39.530, player_2/loss=282.396, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:03, 339.68it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=24.151, player_2/loss=260.272, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:03, 337.12it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=38.663, player_2/loss=239.317, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 333.19it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=36.593, player_2/loss=233.180, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 337.07it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=49.603, player_2/loss=152.496, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 340.25it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=89.627, player_2/loss=94.343, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 338.53it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=115.124, player_2/loss=60.338, rew=8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 337.62it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=106.314, player_2/loss=53.185, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 335.49it/s, env_step=7168, len=19, n/ep=3, n/st=64, player_1/loss=140.018, player_2/loss=55.387, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:02, 341.96it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=137.685, player_2/loss=56.226, rew=25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 347.45it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=104.788, player_2/loss=42.254, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 337.40it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=101.310, player_2/loss=42.801, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 340.10it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=103.260, player_2/loss=35.262, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 338.37it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=83.070, player_2/loss=13.501, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 336.08it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=71.846, player_2/loss=12.400, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 335.94it/s, env_step=14336, len=26, n/ep=3, n/st=64, player_1/loss=57.611, player_2/loss=26.717, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 336.41it/s, env_step=15360, len=20, n/ep=2, n/st=64, player_1/loss=70.426, player_2/loss=35.773, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 338.63it/s, env_step=16384, len=26, n/ep=2, n/st=64, player_1/loss=149.620, player_2/loss=52.402, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 338.78it/s, env_step=17408, len=21, n/ep=3, n/st=64, player_1/loss=106.993, player_2/loss=60.695, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 338.62it/s, env_step=18432, len=16, n/ep=4, n/st=64, player_1/loss=28.354, player_2/loss=30.382, rew=0.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 338.62it/s, env_step=19456, len=27, n/ep=3, n/st=64, player_1/loss=62.845, player_2/loss=31.041, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 338.85it/s, env_step=1024, len=20, n/ep=4, n/st=64, player_1/loss=104.876, player_2/loss=13.250, rew=-12.50]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.03it/s, env_step=2048, len=24, n/ep=3, n/st=64, player_1/loss=65.472, player_2/loss=23.473, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.53it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=49.527, player_2/loss=47.740, rew=-8.33]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 341.83it/s, env_step=4096, len=22, n/ep=2, n/st=64, player_1/loss=51.774, player_2/loss=102.058, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #5: 1025it [00:03, 335.68it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=19.484, player_2/loss=73.530, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #6: 1025it [00:03, 340.58it/s, env_step=6144, len=22, n/ep=2, n/st=64, player_1/loss=19.565, player_2/loss=16.274, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #7: 1025it [00:03, 337.83it/s, env_step=7168, len=18, n/ep=4, n/st=64, player_1/loss=23.044, player_2/loss=34.642, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #8: 1025it [00:03, 338.58it/s, env_step=8192, len=27, n/ep=2, n/st=64, player_1/loss=40.877, player_2/loss=47.331, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #9: 1025it [00:03, 338.99it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=57.379, player_2/loss=30.866, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #10: 1025it [00:03, 339.32it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=72.895, player_2/loss=93.198, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #11: 1025it [00:03, 340.47it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=76.262, player_2/loss=104.934, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #12: 1025it [00:03, 338.95it/s, env_step=12288, len=17, n/ep=4, n/st=64, player_1/loss=69.342, player_2/loss=54.614, rew=-12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #13: 1025it [00:03, 338.93it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=37.987, player_2/loss=28.618, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #14: 1025it [00:03, 338.52it/s, env_step=14336, len=23, n/ep=2, n/st=64, player_1/loss=15.462, player_2/loss=19.703, rew=0.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #15: 1025it [00:03, 339.92it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=12.515, player_2/loss=20.158, rew=-8.33]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #16: 1025it [00:03, 336.66it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=38.589, player_2/loss=31.898, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #17: 1025it [00:03, 339.42it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=94.008, player_2/loss=67.771, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #18: 1025it [00:03, 339.48it/s, env_step=18432, len=20, n/ep=3, n/st=64, player_1/loss=101.229, player_2/loss=64.321, rew=-25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #19: 1025it [00:03, 340.42it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=84.565, player_2/loss=78.153, rew=-25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #4


Epoch #1: 1025it [00:03, 339.15it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=147.265, player_2/loss=92.590, rew=25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.87it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=157.633, player_2/loss=66.893, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 340.22it/s, env_step=3072, len=13, n/ep=6, n/st=64, player_1/loss=144.789, player_2/loss=93.447, rew=16.67]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.29it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=128.088, player_2/loss=119.761, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 339.16it/s, env_step=5120, len=10, n/ep=7, n/st=64, player_1/loss=146.648, player_2/loss=89.833, rew=17.86]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.63it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=165.361, player_2/loss=66.103, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 337.38it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=191.861, player_2/loss=77.852, rew=0.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 339.57it/s, env_step=8192, len=12, n/ep=6, n/st=64, player_1/loss=218.193, player_2/loss=94.131, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 331.87it/s, env_step=9216, len=8, n/ep=6, n/st=64, player_1/loss=161.355, player_2/loss=60.545, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 354.05it/s, env_step=10240, len=10, n/ep=6, n/st=64, player_1/loss=154.108, player_2/loss=72.122, rew=16.67]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 339.70it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=160.281, player_2/loss=97.230, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 340.92it/s, env_step=12288, len=10, n/ep=6, n/st=64, player_1/loss=147.565, player_2/loss=88.564, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 337.07it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=180.590, player_2/loss=97.534, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 339.02it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=146.618, player_2/loss=77.362, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 339.09it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=92.760, player_2/loss=55.031, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 338.14it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=106.198, player_2/loss=78.034, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 335.52it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=118.447, player_2/loss=51.014, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 339.53it/s, env_step=18432, len=9, n/ep=7, n/st=64, player_1/loss=145.226, player_2/loss=45.387, rew=17.86]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.80it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=140.999, player_2/loss=42.818, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 335.11it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=195.810, player_2/loss=296.942, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 335.12it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=264.454, player_2/loss=536.673, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.99it/s, env_step=3072, len=7, n/ep=6, n/st=64, player_1/loss=304.630, player_2/loss=681.418, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.25it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=198.957, rew=19.44]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 338.87it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=131.641, player_2/loss=640.441, rew=19.44]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 336.26it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=111.967, player_2/loss=761.256, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 335.48it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=107.458, player_2/loss=796.951, rew=19.44]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 333.04it/s, env_step=8192, len=7, n/ep=9, n/st=64, player_1/loss=77.471, player_2/loss=782.161, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 334.34it/s, env_step=9216, len=7, n/ep=7, n/st=64, player_1/loss=86.273, player_2/loss=751.431, rew=17.86]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 334.21it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=94.056, player_2/loss=639.598, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 334.45it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=59.635, player_2/loss=605.252, rew=19.44]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 329.38it/s, env_step=12288, len=10, n/ep=7, n/st=64, player_1/loss=61.450, player_2/loss=726.330, rew=10.71]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 333.62it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=57.031, player_2/loss=735.804, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 333.56it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=79.690, player_2/loss=661.063, rew=13.89]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 335.05it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=72.865, player_2/loss=656.650, rew=8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 337.36it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=59.552, player_2/loss=559.102, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 337.98it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=61.193, player_2/loss=511.447, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 335.88it/s, env_step=18432, len=7, n/ep=7, n/st=64, player_1/loss=62.050, player_2/loss=568.250, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 336.05it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=102.198, player_2/loss=622.345, rew=6.25]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 339.93it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=74.646, player_2/loss=588.726, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.97it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=102.629, player_2/loss=434.630, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 338.58it/s, env_step=3072, len=10, n/ep=7, n/st=64, player_1/loss=194.088, player_2/loss=240.705, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 333.55it/s, env_step=4096, len=9, n/ep=6, n/st=64, player_1/loss=263.963, player_2/loss=114.726, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 335.88it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=237.428, player_2/loss=83.535, rew=16.67]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 339.05it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=191.636, player_2/loss=42.094, rew=10.71]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 340.91it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=200.829, player_2/loss=37.316, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 338.22it/s, env_step=8192, len=9, n/ep=7, n/st=64, player_1/loss=234.754, player_2/loss=31.660, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 338.51it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=292.236, player_2/loss=32.988, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 334.35it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=263.245, player_2/loss=46.632, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:02, 351.24it/s, env_step=11264, len=10, n/ep=6, n/st=64, player_1/loss=229.978, player_2/loss=37.201, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 341.26it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=258.554, player_2/loss=23.908, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 337.83it/s, env_step=13312, len=15, n/ep=4, n/st=64, player_1/loss=249.665, player_2/loss=13.013, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 336.77it/s, env_step=14336, len=10, n/ep=7, n/st=64, player_1/loss=208.410, player_2/loss=53.960, rew=10.71]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 333.98it/s, env_step=15360, len=11, n/ep=5, n/st=64, player_1/loss=186.471, player_2/loss=65.195, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 339.13it/s, env_step=16384, len=10, n/ep=6, n/st=64, player_1/loss=189.846, player_2/loss=23.664, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 339.01it/s, env_step=17408, len=10, n/ep=7, n/st=64, player_1/loss=193.963, player_2/loss=9.552, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 338.71it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=217.763, player_2/loss=5.989, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 340.14it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=206.917, player_2/loss=17.191, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 339.05it/s, env_step=1024, len=10, n/ep=7, n/st=64, player_1/loss=154.593, player_2/loss=29.704, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.24it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=116.468, player_2/loss=21.540, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 337.61it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=118.759, player_2/loss=56.210, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.09it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=110.779, player_2/loss=79.243, rew=-5.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 338.82it/s, env_step=5120, len=12, n/ep=6, n/st=64, player_1/loss=137.180, player_2/loss=93.089, rew=-16.67]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.31it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=172.322, player_2/loss=49.415, rew=-17.86]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 332.57it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=85.375, player_2/loss=89.366, rew=-5.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 339.29it/s, env_step=8192, len=11, n/ep=7, n/st=64, player_1/loss=65.790, player_2/loss=94.555, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 337.88it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=148.429, player_2/loss=78.117, rew=19.44]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 337.21it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=227.447, player_2/loss=301.281, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.94it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=171.483, player_2/loss=490.617, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 335.81it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=131.240, player_2/loss=505.033, rew=19.44]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 338.65it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=124.019, player_2/loss=476.907, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.26it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=64.163, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 336.82it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=40.169, player_2/loss=509.646, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 338.11it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=62.775, player_2/loss=440.225, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 335.50it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=85.099, player_2/loss=432.153, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 332.81it/s, env_step=18432, len=10, n/ep=6, n/st=64, player_1/loss=56.709, player_2/loss=458.446, rew=16.67]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 334.53it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=45.334, player_2/loss=510.838, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 337.39it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=64.501, player_2/loss=379.450, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 337.52it/s, env_step=2048, len=11, n/ep=6, n/st=64, player_1/loss=147.282, player_2/loss=224.197, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.79it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=190.798, player_2/loss=78.085, rew=25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 338.41it/s, env_step=4096, len=13, n/ep=6, n/st=64, player_1/loss=161.795, player_2/loss=67.017, rew=16.67]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 338.26it/s, env_step=5120, len=9, n/ep=5, n/st=64, player_1/loss=243.726, player_2/loss=61.185, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 336.95it/s, env_step=6144, len=10, n/ep=7, n/st=64, player_1/loss=225.313, player_2/loss=40.228, rew=10.71]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 340.06it/s, env_step=7168, len=9, n/ep=7, n/st=64, player_1/loss=216.378, player_2/loss=20.924, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 338.78it/s, env_step=8192, len=13, n/ep=6, n/st=64, player_1/loss=210.394, player_2/loss=20.333, rew=16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 337.63it/s, env_step=9216, len=9, n/ep=5, n/st=64, player_1/loss=210.978, player_2/loss=24.340, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 334.84it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=216.161, player_2/loss=38.164, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 336.66it/s, env_step=11264, len=9, n/ep=5, n/st=64, player_1/loss=248.095, player_2/loss=37.697, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 351.88it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=302.870, player_2/loss=23.908, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 338.83it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=329.862, player_2/loss=40.592, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 337.48it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=292.708, player_2/loss=47.037, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 337.52it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=274.480, player_2/loss=18.876, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 338.63it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=270.151, player_2/loss=7.091, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 339.12it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=291.655, player_2/loss=10.968, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 338.22it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=285.146, player_2/loss=58.461, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 338.60it/s, env_step=19456, len=9, n/ep=7, n/st=64, player_1/loss=310.095, player_2/loss=57.329, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 336.00it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=202.358, player_2/loss=25.485, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 331.88it/s, env_step=2048, len=10, n/ep=6, n/st=64, player_1/loss=152.484, player_2/loss=36.393, rew=-16.67]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.55it/s, env_step=3072, len=22, n/ep=3, n/st=64, player_1/loss=95.715, player_2/loss=40.390, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.02it/s, env_step=4096, len=21, n/ep=3, n/st=64, player_1/loss=70.225, player_2/loss=49.605, rew=-8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 341.28it/s, env_step=5120, len=22, n/ep=2, n/st=64, player_1/loss=59.462, player_2/loss=68.420, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.21it/s, env_step=6144, len=20, n/ep=3, n/st=64, player_1/loss=56.829, player_2/loss=54.286, rew=-8.33]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 339.54it/s, env_step=7168, len=22, n/ep=3, n/st=64, player_1/loss=110.241, player_2/loss=66.741, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 340.87it/s, env_step=8192, len=28, n/ep=3, n/st=64, player_1/loss=115.039, player_2/loss=102.802, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 341.74it/s, env_step=9216, len=24, n/ep=3, n/st=64, player_1/loss=53.072, player_2/loss=114.626, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 340.59it/s, env_step=10240, len=26, n/ep=2, n/st=64, player_1/loss=47.577, player_2/loss=65.579, rew=0.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 340.73it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=26.860, player_2/loss=53.208, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.66it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=25.934, player_2/loss=46.853, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 338.45it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=42.225, player_2/loss=53.014, rew=-8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 337.01it/s, env_step=14336, len=7, n/ep=8, n/st=64, player_1/loss=83.329, player_2/loss=188.267, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 338.02it/s, env_step=15360, len=10, n/ep=6, n/st=64, player_1/loss=85.893, player_2/loss=482.058, rew=16.67]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 339.20it/s, env_step=16384, len=7, n/ep=7, n/st=64, player_1/loss=85.957, player_2/loss=546.009, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 338.59it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=112.438, player_2/loss=492.640, rew=5.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 337.14it/s, env_step=18432, len=8, n/ep=8, n/st=64, player_1/loss=94.923, player_2/loss=404.215, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.83it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=48.694, player_2/loss=463.282, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 331.52it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=170.561, player_2/loss=167.228, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 338.52it/s, env_step=2048, len=24, n/ep=3, n/st=64, player_1/loss=200.757, player_2/loss=164.996, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 340.63it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=186.435, player_2/loss=108.626, rew=-12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 341.29it/s, env_step=4096, len=22, n/ep=3, n/st=64, player_1/loss=112.883, player_2/loss=64.113, rew=8.33]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 336.65it/s, env_step=5120, len=22, n/ep=3, n/st=64, player_1/loss=138.655, player_2/loss=34.233, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 336.84it/s, env_step=6144, len=24, n/ep=3, n/st=64, player_1/loss=123.838, player_2/loss=27.158, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 339.74it/s, env_step=7168, len=24, n/ep=2, n/st=64, player_1/loss=99.555, player_2/loss=37.873, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 340.36it/s, env_step=8192, len=17, n/ep=3, n/st=64, player_1/loss=104.004, player_2/loss=33.343, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 338.69it/s, env_step=9216, len=18, n/ep=4, n/st=64, player_1/loss=111.578, player_2/loss=63.095, rew=0.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 340.80it/s, env_step=10240, len=28, n/ep=2, n/st=64, player_1/loss=108.604, player_2/loss=45.827, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 337.37it/s, env_step=11264, len=18, n/ep=4, n/st=64, player_1/loss=102.100, player_2/loss=67.115, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 340.75it/s, env_step=12288, len=15, n/ep=4, n/st=64, player_1/loss=135.370, player_2/loss=146.473, rew=12.50]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 352.75it/s, env_step=13312, len=16, n/ep=4, n/st=64, player_1/loss=165.543, player_2/loss=175.283, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 339.89it/s, env_step=14336, len=14, n/ep=4, n/st=64, player_1/loss=214.582, player_2/loss=146.287, rew=12.50]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 339.40it/s, env_step=15360, len=14, n/ep=5, n/st=64, player_1/loss=206.872, player_2/loss=95.461, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.21it/s, env_step=16384, len=18, n/ep=4, n/st=64, player_1/loss=218.784, player_2/loss=91.957, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 333.99it/s, env_step=17408, len=13, n/ep=5, n/st=64, player_1/loss=184.933, player_2/loss=137.115, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 339.60it/s, env_step=18432, len=14, n/ep=4, n/st=64, player_1/loss=138.802, player_2/loss=123.718, rew=25.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 339.97it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=169.724, player_2/loss=82.072, rew=25.00]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 338.78it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=76.401, player_2/loss=77.864, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 340.83it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=74.824, player_2/loss=56.383, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.18it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=99.617, player_2/loss=35.028, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.81it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_1/loss=92.112, player_2/loss=33.940, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.86it/s, env_step=5120, len=12, n/ep=5, n/st=64, player_1/loss=73.588, player_2/loss=22.180, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.27it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=82.057, player_2/loss=45.505, rew=-12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 340.11it/s, env_step=7168, len=12, n/ep=5, n/st=64, player_1/loss=41.763, player_2/loss=55.799, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 339.03it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=43.070, player_2/loss=54.463, rew=-16.67]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 340.68it/s, env_step=9216, len=13, n/ep=5, n/st=64, player_1/loss=30.788, player_2/loss=48.837, rew=-15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 340.64it/s, env_step=10240, len=12, n/ep=6, n/st=64, player_1/loss=37.743, player_2/loss=48.456, rew=-25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 334.92it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=87.519, player_2/loss=76.782, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 339.77it/s, env_step=12288, len=18, n/ep=3, n/st=64, player_1/loss=83.527, player_2/loss=61.256, rew=-25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 339.17it/s, env_step=13312, len=20, n/ep=3, n/st=64, player_1/loss=72.862, player_2/loss=60.815, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.74it/s, env_step=14336, len=19, n/ep=4, n/st=64, player_1/loss=86.343, player_2/loss=55.345, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 336.84it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=90.470, player_2/loss=44.305, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 338.52it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=69.187, player_2/loss=61.598, rew=-12.50]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 336.42it/s, env_step=17408, len=24, n/ep=3, n/st=64, player_1/loss=49.549, player_2/loss=96.291, rew=-8.33]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.54it/s, env_step=18432, len=37, n/ep=2, n/st=64, player_1/loss=71.717, player_2/loss=102.880, rew=37.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 340.05it/s, env_step=19456, len=21, n/ep=3, n/st=64, player_1/loss=78.321, player_2/loss=159.057, rew=8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 339.05it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=187.658, player_2/loss=139.170, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 337.75it/s, env_step=2048, len=25, n/ep=2, n/st=64, player_1/loss=145.372, player_2/loss=130.051, rew=0.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 334.27it/s, env_step=3072, len=19, n/ep=3, n/st=64, player_1/loss=88.927, player_2/loss=120.256, rew=8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 339.40it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=105.137, player_2/loss=142.806, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 338.89it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=73.392, player_2/loss=98.466, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 337.08it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=41.732, player_2/loss=84.351, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.93it/s, env_step=7168, len=27, n/ep=2, n/st=64, player_1/loss=74.651, player_2/loss=42.200, rew=25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 339.67it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=115.016, player_2/loss=50.817, rew=8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 337.87it/s, env_step=9216, len=16, n/ep=5, n/st=64, player_1/loss=115.800, player_2/loss=53.544, rew=5.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.03it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=111.713, player_2/loss=36.884, rew=5.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 339.95it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=121.974, player_2/loss=32.833, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 336.80it/s, env_step=12288, len=19, n/ep=3, n/st=64, player_1/loss=123.395, player_2/loss=20.130, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 339.84it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=115.210, player_2/loss=32.218, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:02, 346.29it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=105.572, player_2/loss=52.986, rew=15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 336.24it/s, env_step=15360, len=15, n/ep=4, n/st=64, player_1/loss=93.394, player_2/loss=38.695, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.77it/s, env_step=16384, len=12, n/ep=5, n/st=64, player_1/loss=77.043, player_2/loss=21.881, rew=15.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 337.78it/s, env_step=17408, len=15, n/ep=5, n/st=64, player_1/loss=67.175, player_2/loss=22.703, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 337.47it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=72.535, player_2/loss=12.728, rew=5.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 336.08it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=94.486, player_2/loss=21.311, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 337.94it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=171.333, player_2/loss=95.350, rew=8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.16it/s, env_step=2048, len=20, n/ep=4, n/st=64, player_1/loss=151.797, player_2/loss=115.956, rew=12.50]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.08it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=95.051, player_2/loss=133.300, rew=-12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 333.84it/s, env_step=4096, len=23, n/ep=3, n/st=64, player_1/loss=86.781, player_2/loss=121.286, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 337.81it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=96.179, player_2/loss=93.783, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.23it/s, env_step=6144, len=18, n/ep=4, n/st=64, player_1/loss=101.971, player_2/loss=123.967, rew=0.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 338.84it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=62.595, player_2/loss=111.038, rew=-12.50]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 338.91it/s, env_step=8192, len=27, n/ep=2, n/st=64, player_1/loss=74.720, player_2/loss=82.582, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 337.42it/s, env_step=9216, len=20, n/ep=3, n/st=64, player_1/loss=95.608, player_2/loss=160.296, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 336.69it/s, env_step=10240, len=32, n/ep=2, n/st=64, player_1/loss=84.681, player_2/loss=157.432, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 336.73it/s, env_step=11264, len=28, n/ep=2, n/st=64, player_1/loss=124.729, player_2/loss=118.921, rew=-25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 338.35it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=124.303, player_2/loss=86.307, rew=8.33]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 337.38it/s, env_step=13312, len=29, n/ep=2, n/st=64, player_1/loss=130.771, player_2/loss=117.353, rew=-25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.69it/s, env_step=14336, len=25, n/ep=3, n/st=64, player_1/loss=128.958, player_2/loss=164.159, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 332.55it/s, env_step=15360, len=31, n/ep=2, n/st=64, player_1/loss=68.348, player_2/loss=190.878, rew=-25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 337.04it/s, env_step=16384, len=9, n/ep=8, n/st=64, player_1/loss=88.550, player_2/loss=360.468, rew=18.75]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 338.16it/s, env_step=17408, len=7, n/ep=8, n/st=64, player_1/loss=112.766, player_2/loss=434.486, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 338.44it/s, env_step=18432, len=9, n/ep=8, n/st=64, player_1/loss=84.046, player_2/loss=425.459, rew=18.75]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 334.42it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=62.616, player_2/loss=475.086, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 337.43it/s, env_step=1024, len=18, n/ep=4, n/st=64, player_1/loss=123.844, player_2/loss=261.190, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 339.45it/s, env_step=2048, len=22, n/ep=3, n/st=64, player_1/loss=134.169, player_2/loss=166.656, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 341.48it/s, env_step=3072, len=31, n/ep=2, n/st=64, player_1/loss=161.694, player_2/loss=69.603, rew=0.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 330.59it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=180.435, player_2/loss=44.771, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 339.17it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=162.563, player_2/loss=103.506, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 338.64it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=168.871, player_2/loss=79.819, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 337.61it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=179.894, player_2/loss=15.332, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 334.09it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=170.206, player_2/loss=25.990, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 338.04it/s, env_step=9216, len=16, n/ep=4, n/st=64, player_1/loss=167.479, player_2/loss=21.181, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 339.45it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=173.730, player_2/loss=9.654, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 340.89it/s, env_step=11264, len=24, n/ep=3, n/st=64, player_1/loss=194.968, player_2/loss=13.787, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 341.31it/s, env_step=12288, len=16, n/ep=4, n/st=64, player_1/loss=227.692, player_2/loss=27.060, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 339.46it/s, env_step=13312, len=17, n/ep=4, n/st=64, player_1/loss=195.230, player_2/loss=29.729, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 337.93it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=127.071, player_2/loss=83.127, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 355.31it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=129.329, player_2/loss=83.899, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 339.50it/s, env_step=16384, len=17, n/ep=4, n/st=64, player_1/loss=140.079, player_2/loss=44.134, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 340.69it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=175.564, player_2/loss=19.704, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 338.33it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=221.090, player_2/loss=22.516, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 333.96it/s, env_step=19456, len=14, n/ep=5, n/st=64, player_1/loss=208.115, player_2/loss=41.202, rew=15.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 339.22it/s, env_step=1024, len=17, n/ep=4, n/st=64, player_1/loss=142.193, player_2/loss=11.916, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.31it/s, env_step=2048, len=21, n/ep=3, n/st=64, player_1/loss=106.810, player_2/loss=16.215, rew=-8.33]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 335.15it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=108.322, player_2/loss=16.353, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 334.34it/s, env_step=4096, len=16, n/ep=4, n/st=64, player_1/loss=139.618, player_2/loss=36.218, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 338.14it/s, env_step=5120, len=16, n/ep=4, n/st=64, player_1/loss=128.601, player_2/loss=38.365, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 338.17it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=90.441, player_2/loss=35.188, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 340.50it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=136.476, player_2/loss=120.995, rew=-5.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 338.08it/s, env_step=8192, len=16, n/ep=4, n/st=64, player_1/loss=145.673, player_2/loss=131.102, rew=0.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 339.20it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=120.782, player_2/loss=162.518, rew=12.50]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 337.46it/s, env_step=10240, len=16, n/ep=4, n/st=64, player_1/loss=96.345, player_2/loss=279.135, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.34it/s, env_step=11264, len=15, n/ep=4, n/st=64, player_1/loss=68.186, player_2/loss=315.138, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 335.89it/s, env_step=12288, len=14, n/ep=5, n/st=64, player_1/loss=76.792, player_2/loss=215.897, rew=5.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 338.79it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=107.948, player_2/loss=279.997, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 336.77it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=102.000, player_2/loss=289.682, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 336.35it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=72.940, player_2/loss=399.126, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 339.73it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=66.890, player_2/loss=401.484, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 336.31it/s, env_step=17408, len=11, n/ep=6, n/st=64, player_1/loss=31.078, player_2/loss=358.848, rew=16.67]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 335.42it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=34.622, player_2/loss=353.027, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 336.02it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=65.827, player_2/loss=368.115, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 336.71it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=21.330, player_2/loss=410.503, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.43it/s, env_step=2048, len=8, n/ep=7, n/st=64, player_1/loss=15.625, player_2/loss=368.409, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.41it/s, env_step=3072, len=7, n/ep=7, n/st=64, player_1/loss=22.730, player_2/loss=299.861, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.05it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=45.635, player_2/loss=273.214, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 332.86it/s, env_step=5120, len=14, n/ep=4, n/st=64, player_1/loss=63.337, player_2/loss=214.506, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 336.93it/s, env_step=6144, len=14, n/ep=4, n/st=64, player_1/loss=115.493, player_2/loss=125.278, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 335.99it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=139.474, player_2/loss=121.026, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 337.54it/s, env_step=8192, len=18, n/ep=3, n/st=64, player_1/loss=106.475, player_2/loss=168.694, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 338.12it/s, env_step=9216, len=17, n/ep=3, n/st=64, player_1/loss=98.810, player_2/loss=173.554, rew=-8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.82it/s, env_step=10240, len=20, n/ep=3, n/st=64, player_1/loss=123.347, player_2/loss=72.250, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.88it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=140.927, player_2/loss=55.273, rew=12.50]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.13it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=108.758, player_2/loss=84.064, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 340.51it/s, env_step=13312, len=18, n/ep=3, n/st=64, player_1/loss=112.428, player_2/loss=46.300, rew=25.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 338.15it/s, env_step=14336, len=18, n/ep=3, n/st=64, player_1/loss=125.711, player_2/loss=27.153, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 338.43it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=116.620, player_2/loss=36.456, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 349.18it/s, env_step=16384, len=20, n/ep=3, n/st=64, player_1/loss=103.133, player_2/loss=42.146, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 337.34it/s, env_step=17408, len=27, n/ep=2, n/st=64, player_1/loss=109.281, player_2/loss=48.119, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 337.23it/s, env_step=18432, len=23, n/ep=3, n/st=64, player_1/loss=116.065, player_2/loss=62.871, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.89it/s, env_step=19456, len=24, n/ep=3, n/st=64, player_1/loss=92.516, player_2/loss=50.447, rew=-8.33]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 335.38it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=102.934, player_2/loss=119.396, rew=-8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 338.17it/s, env_step=2048, len=20, n/ep=3, n/st=64, player_1/loss=99.944, player_2/loss=185.130, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 336.12it/s, env_step=3072, len=17, n/ep=4, n/st=64, player_1/loss=102.791, player_2/loss=171.985, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 336.45it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=113.865, player_2/loss=211.665, rew=15.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 340.53it/s, env_step=5120, len=13, n/ep=5, n/st=64, player_1/loss=94.176, player_2/loss=209.329, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 338.38it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=67.421, player_2/loss=136.961, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 335.58it/s, env_step=7168, len=16, n/ep=4, n/st=64, player_1/loss=51.029, player_2/loss=96.389, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 338.06it/s, env_step=8192, len=22, n/ep=3, n/st=64, player_1/loss=22.193, player_2/loss=133.988, rew=8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 339.88it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=31.325, player_2/loss=120.475, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 339.34it/s, env_step=10240, len=13, n/ep=5, n/st=64, player_1/loss=35.120, player_2/loss=129.451, rew=25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 338.65it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=16.848, player_2/loss=152.509, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 341.64it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=28.299, player_2/loss=148.986, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 337.94it/s, env_step=13312, len=19, n/ep=3, n/st=64, player_1/loss=33.914, player_2/loss=111.123, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 338.27it/s, env_step=14336, len=17, n/ep=4, n/st=64, player_1/loss=24.807, player_2/loss=143.639, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 337.34it/s, env_step=15360, len=16, n/ep=4, n/st=64, player_1/loss=16.879, player_2/loss=216.701, rew=12.50]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 340.27it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=46.201, player_2/loss=249.073, rew=25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 336.82it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=46.619, player_2/loss=254.204, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 335.53it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=14.704, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 337.97it/s, env_step=19456, len=13, n/ep=4, n/st=64, player_1/loss=21.146, player_2/loss=213.280, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 330.29it/s, env_step=1024, len=13, n/ep=5, n/st=64, player_1/loss=14.824, player_2/loss=143.387, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.20it/s, env_step=2048, len=16, n/ep=4, n/st=64, player_1/loss=35.950, player_2/loss=161.281, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.30it/s, env_step=3072, len=22, n/ep=2, n/st=64, player_1/loss=97.932, player_2/loss=150.811, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:03, 339.86it/s, env_step=4096, len=18, n/ep=3, n/st=64, player_1/loss=108.119, player_2/loss=101.717, rew=-25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:03, 338.51it/s, env_step=5120, len=28, n/ep=2, n/st=64, player_1/loss=89.841, player_2/loss=70.173, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:03, 337.17it/s, env_step=6144, len=20, n/ep=4, n/st=64, player_1/loss=109.626, player_2/loss=57.447, rew=12.50]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:03, 338.51it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=137.717, player_2/loss=79.929, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 338.97it/s, env_step=8192, len=21, n/ep=3, n/st=64, player_1/loss=138.402, player_2/loss=83.188, rew=-8.33]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 340.26it/s, env_step=9216, len=14, n/ep=5, n/st=64, player_1/loss=98.808, player_2/loss=103.967, rew=-15.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 340.76it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=120.605, player_2/loss=95.371, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:02, 343.97it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=128.225, player_2/loss=71.443, rew=8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 333.78it/s, env_step=12288, len=19, n/ep=4, n/st=64, player_1/loss=118.319, player_2/loss=65.320, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 336.65it/s, env_step=13312, len=24, n/ep=2, n/st=64, player_1/loss=106.386, player_2/loss=77.400, rew=0.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 337.74it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=128.629, player_2/loss=146.813, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 337.54it/s, env_step=15360, len=22, n/ep=3, n/st=64, player_1/loss=129.340, player_2/loss=160.437, rew=25.00]


Epoch #15: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 340.29it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=130.510, player_2/loss=137.315, rew=-25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:02, 353.87it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=122.143, player_2/loss=121.331, rew=25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 336.56it/s, env_step=18432, len=18, n/ep=4, n/st=64, player_1/loss=96.565, player_2/loss=124.936, rew=-12.50]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 339.47it/s, env_step=19456, len=19, n/ep=4, n/st=64, player_1/loss=100.859, player_2/loss=127.103, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #1: 1025it [00:03, 339.67it/s, env_step=1024, len=15, n/ep=4, n/st=64, player_1/loss=130.981, player_2/loss=93.778, rew=12.50]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 338.80it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=83.250, player_2/loss=137.435, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 336.59it/s, env_step=3072, len=14, n/ep=4, n/st=64, player_1/loss=47.147, player_2/loss=180.935, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 336.76it/s, env_step=4096, len=14, n/ep=4, n/st=64, player_1/loss=64.866, player_2/loss=137.119, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 338.76it/s, env_step=5120, len=14, n/ep=5, n/st=64, player_1/loss=44.091, player_2/loss=115.490, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 336.76it/s, env_step=6144, len=21, n/ep=3, n/st=64, player_1/loss=49.208, player_2/loss=151.203, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 337.59it/s, env_step=7168, len=12, n/ep=6, n/st=64, player_1/loss=42.309, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 338.98it/s, env_step=8192, len=12, n/ep=5, n/st=64, player_1/loss=29.337, player_2/loss=196.187, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 339.23it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=55.833, player_2/loss=183.462, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.03it/s, env_step=10240, len=11, n/ep=5, n/st=64, player_1/loss=27.270, player_2/loss=146.945, rew=15.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:03, 341.30it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=27.170, player_2/loss=133.802, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 338.33it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=31.056, player_2/loss=180.168, rew=15.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 337.67it/s, env_step=13312, len=12, n/ep=5, n/st=64, player_1/loss=27.643, player_2/loss=170.870, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 336.39it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=19.974, player_2/loss=149.756, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:02, 342.67it/s, env_step=15360, len=12, n/ep=5, n/st=64, player_1/loss=18.198, player_2/loss=143.372, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 336.87it/s, env_step=16384, len=15, n/ep=4, n/st=64, player_1/loss=28.178, player_2/loss=139.062, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 338.42it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=28.545, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 339.78it/s, env_step=18432, len=15, n/ep=4, n/st=64, player_1/loss=43.442, player_2/loss=173.306, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 338.46it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=30.736, player_2/loss=162.192, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 339.67it/s, env_step=1024, len=14, n/ep=4, n/st=64, player_1/loss=20.099, player_2/loss=244.374, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.13it/s, env_step=2048, len=13, n/ep=5, n/st=64, player_1/loss=12.072, player_2/loss=196.845, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.13it/s, env_step=3072, len=14, n/ep=5, n/st=64, player_1/loss=10.588, player_2/loss=174.173, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.38it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=24.376, player_2/loss=166.715, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 336.10it/s, env_step=5120, len=17, n/ep=4, n/st=64, player_1/loss=32.800, player_2/loss=193.432, rew=0.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 332.94it/s, env_step=6144, len=13, n/ep=5, n/st=64, player_1/loss=25.514, player_2/loss=186.678, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 336.37it/s, env_step=7168, len=13, n/ep=4, n/st=64, player_1/loss=55.372, player_2/loss=172.599, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #8: 1025it [00:03, 336.33it/s, env_step=8192, len=13, n/ep=4, n/st=64, player_1/loss=154.886, player_2/loss=194.987, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #9: 1025it [00:03, 339.67it/s, env_step=9216, len=14, n/ep=4, n/st=64, player_1/loss=229.047, player_2/loss=113.119, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #10: 1025it [00:03, 338.77it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=232.683, player_2/loss=66.108, rew=-5.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #11: 1025it [00:03, 338.98it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=329.501, player_2/loss=76.661, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #12: 1025it [00:03, 340.81it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=299.053, player_2/loss=141.941, rew=15.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #13: 1025it [00:03, 337.56it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=235.450, player_2/loss=160.351, rew=15.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #14: 1025it [00:03, 341.03it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=198.258, player_2/loss=159.670, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #15: 1025it [00:03, 340.48it/s, env_step=15360, len=14, n/ep=4, n/st=64, player_1/loss=281.460, player_2/loss=134.026, rew=0.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #16: 1025it [00:03, 341.37it/s, env_step=16384, len=14, n/ep=5, n/st=64, player_1/loss=373.065, player_2/loss=122.852, rew=5.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #17: 1025it [00:03, 335.74it/s, env_step=17408, len=14, n/ep=4, n/st=64, player_1/loss=316.109, player_2/loss=116.140, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #18: 1025it [00:02, 349.91it/s, env_step=18432, len=13, n/ep=4, n/st=64, player_1/loss=202.229, player_2/loss=68.113, rew=0.00]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #19: 1025it [00:03, 335.17it/s, env_step=19456, len=13, n/ep=5, n/st=64, player_1/loss=214.367, player_2/loss=80.034, rew=5.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #7


Epoch #1: 1025it [00:03, 335.88it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=129.384, player_2/loss=241.702, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 339.96it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=76.901, player_2/loss=254.416, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.16it/s, env_step=3072, len=8, n/ep=7, n/st=64, player_1/loss=33.812, player_2/loss=285.074, rew=17.86]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 335.83it/s, env_step=4096, len=9, n/ep=7, n/st=64, player_1/loss=28.299, player_2/loss=298.456, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 339.42it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=17.235, player_2/loss=277.802, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.15it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=23.219, player_2/loss=270.990, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 334.56it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=13.499, player_2/loss=268.066, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 333.44it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=4.828, player_2/loss=301.304, rew=18.75]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 338.49it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=5.819, player_2/loss=312.355, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.13it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=6.775, player_2/loss=275.027, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 334.52it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=12.406, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 338.09it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=16.920, player_2/loss=253.959, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 334.14it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=7.538, player_2/loss=256.012, rew=19.44]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 336.94it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=4.051, player_2/loss=265.952, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 335.63it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=18.518, player_2/loss=275.601, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 335.17it/s, env_step=16384, len=8, n/ep=7, n/st=64, player_1/loss=24.381, player_2/loss=270.308, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 335.44it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=7.108, player_2/loss=283.424, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 335.71it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=3.199, player_2/loss=307.812, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 335.50it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=4.069, player_2/loss=292.795, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 332.57it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=22.706, player_2/loss=279.010, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 338.63it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=62.063, player_2/loss=278.424, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 336.49it/s, env_step=3072, len=7, n/ep=10, n/st=64, player_1/loss=98.418, player_2/loss=266.979, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 335.97it/s, env_step=4096, len=7, n/ep=9, n/st=64, player_1/loss=62.893, player_2/loss=266.981, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 338.53it/s, env_step=5120, len=9, n/ep=7, n/st=64, player_1/loss=23.483, player_2/loss=288.140, rew=-17.86]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 335.79it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=37.167, player_2/loss=301.796, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:03, 335.77it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=268.683, player_2/loss=236.220, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:03, 334.95it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=520.275, player_2/loss=199.511, rew=6.25]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:03, 335.47it/s, env_step=9216, len=8, n/ep=8, n/st=64, player_1/loss=509.293, player_2/loss=167.123, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:03, 333.71it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=512.227, player_2/loss=143.758, rew=10.71]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:03, 337.77it/s, env_step=11264, len=7, n/ep=8, n/st=64, player_1/loss=611.061, player_2/loss=123.504, rew=12.50]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 335.33it/s, env_step=12288, len=7, n/ep=8, n/st=64, player_1/loss=624.866, player_2/loss=134.878, rew=18.75]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:03, 335.13it/s, env_step=13312, len=8, n/ep=7, n/st=64, player_1/loss=562.356, player_2/loss=156.842, rew=17.86]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:03, 334.17it/s, env_step=14336, len=9, n/ep=6, n/st=64, player_1/loss=586.394, player_2/loss=97.381, rew=8.33]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:03, 335.62it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=660.175, player_2/loss=36.903, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:03, 340.03it/s, env_step=16384, len=9, n/ep=7, n/st=64, player_1/loss=495.399, player_2/loss=70.255, rew=17.86]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:03, 335.08it/s, env_step=17408, len=8, n/ep=7, n/st=64, player_1/loss=528.560, player_2/loss=80.183, rew=17.86]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:03, 337.94it/s, env_step=18432, len=8, n/ep=7, n/st=64, player_1/loss=537.695, player_2/loss=57.686, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 351.65it/s, env_step=19456, len=8, n/ep=7, n/st=64, player_1/loss=612.771, player_2/loss=46.874, rew=17.86]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:03, 337.26it/s, env_step=1024, len=12, n/ep=5, n/st=64, player_1/loss=164.066, player_2/loss=225.666, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.56it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=102.795, player_2/loss=232.319, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.83it/s, env_step=3072, len=13, n/ep=5, n/st=64, player_1/loss=50.162, player_2/loss=255.565, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 336.60it/s, env_step=4096, len=14, n/ep=5, n/st=64, player_1/loss=39.262, player_2/loss=293.138, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 340.93it/s, env_step=5120, len=15, n/ep=4, n/st=64, player_1/loss=15.376, player_2/loss=241.946, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 337.44it/s, env_step=6144, len=7, n/ep=10, n/st=64, player_1/loss=34.125, player_2/loss=231.038, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 335.98it/s, env_step=7168, len=7, n/ep=8, n/st=64, player_1/loss=59.511, player_2/loss=447.676, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.84it/s, env_step=8192, len=8, n/ep=8, n/st=64, player_1/loss=94.335, player_2/loss=508.343, rew=12.50]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:03, 336.11it/s, env_step=9216, len=7, n/ep=8, n/st=64, player_1/loss=43.649, player_2/loss=512.577, rew=18.75]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:03, 338.11it/s, env_step=10240, len=8, n/ep=8, n/st=64, player_1/loss=11.913, player_2/loss=504.642, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 337.53it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=24.634, player_2/loss=483.890, rew=17.86]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:03, 337.47it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=23.355, player_2/loss=523.417, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:03, 336.62it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=30.324, player_2/loss=578.698, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:03, 334.05it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=15.589, player_2/loss=581.381, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:03, 338.21it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=10.969, player_2/loss=572.928, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:03, 337.27it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=14.097, player_2/loss=551.175, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 340.54it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=46.271, player_2/loss=508.131, rew=18.75]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:03, 336.38it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=49.907, player_2/loss=456.281, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.52it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=13.401, player_2/loss=518.384, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:03, 335.53it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=9.414, player_2/loss=454.451, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 337.94it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=55.923, player_2/loss=434.959, rew=-13.89]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 338.64it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=66.188, player_2/loss=396.432, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 337.99it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=50.890, player_2/loss=334.847, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 333.79it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=5.196, player_2/loss=298.314, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 336.51it/s, env_step=6144, len=7, n/ep=8, n/st=64, player_1/loss=42.437, player_2/loss=288.111, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #7: 1025it [00:03, 338.58it/s, env_step=7168, len=7, n/ep=9, n/st=64, player_1/loss=76.719, rew=-19.44]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #8: 1025it [00:03, 334.92it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=52.961, player_2/loss=286.417, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #9: 1025it [00:03, 336.73it/s, env_step=9216, len=7, n/ep=9, n/st=64, player_1/loss=28.919, player_2/loss=256.231, rew=-19.44]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #10: 1025it [00:03, 337.29it/s, env_step=10240, len=7, n/ep=9, n/st=64, player_1/loss=31.459, player_2/loss=227.485, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #11: 1025it [00:03, 256.44it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=35.071, player_2/loss=236.203, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #12: 1025it [00:03, 334.39it/s, env_step=12288, len=7, n/ep=9, n/st=64, player_1/loss=136.770, player_2/loss=259.420, rew=-13.89]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #13: 1025it [00:02, 379.25it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=194.518, player_2/loss=274.997, rew=-17.86]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #14: 1025it [00:02, 355.60it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=102.511, player_2/loss=286.735, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #15: 1025it [00:02, 342.99it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=10.323, player_2/loss=294.139, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #16: 1025it [00:03, 335.22it/s, env_step=16384, len=13, n/ep=4, n/st=64, player_1/loss=271.231, player_2/loss=179.181, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #17: 1025it [00:03, 333.88it/s, env_step=17408, len=11, n/ep=5, n/st=64, player_1/loss=465.403, player_2/loss=56.003, rew=15.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #18: 1025it [00:03, 329.25it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=332.603, player_2/loss=118.759, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #19: 1025it [00:02, 344.56it/s, env_step=19456, len=12, n/ep=6, n/st=64, player_1/loss=337.367, player_2/loss=151.894, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #6


Epoch #1: 1025it [00:02, 369.34it/s, env_step=1024, len=10, n/ep=6, n/st=64, player_1/loss=217.365, player_2/loss=99.289, rew=25.00]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 362.05it/s, env_step=2048, len=11, n/ep=5, n/st=64, player_1/loss=210.831, player_2/loss=183.454, rew=5.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 339.81it/s, env_step=3072, len=11, n/ep=5, n/st=64, player_1/loss=111.846, player_2/loss=251.028, rew=15.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:03, 339.06it/s, env_step=4096, len=12, n/ep=5, n/st=64, player_2/loss=228.897, rew=25.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:03, 330.46it/s, env_step=5120, len=10, n/ep=6, n/st=64, player_1/loss=79.837, player_2/loss=229.265, rew=0.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:03, 315.80it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=63.355, player_2/loss=240.310, rew=25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:03, 318.12it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=55.598, player_2/loss=270.956, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 349.54it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=64.858, player_2/loss=266.668, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 349.73it/s, env_step=9216, len=11, n/ep=6, n/st=64, player_1/loss=45.620, player_2/loss=229.832, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 347.77it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=13.511, player_2/loss=215.586, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:03, 339.01it/s, env_step=11264, len=11, n/ep=6, n/st=64, player_1/loss=42.132, player_2/loss=222.495, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 400.32it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=44.561, player_2/loss=207.067, rew=16.67]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 415.80it/s, env_step=13312, len=11, n/ep=6, n/st=64, player_1/loss=74.752, player_2/loss=261.853, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 402.82it/s, env_step=14336, len=19, n/ep=3, n/st=64, player_1/loss=94.873, player_2/loss=242.755, rew=-25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 422.33it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=93.037, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 417.70it/s, env_step=16384, len=7, n/ep=7, n/st=64, player_1/loss=80.514, player_2/loss=204.307, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 415.36it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=60.936, player_2/loss=226.130, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 379.67it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=52.775, player_2/loss=234.438, rew=19.44]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 395.02it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=89.059, player_2/loss=268.353, rew=19.44]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 384.45it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=58.122, player_2/loss=237.375, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 380.75it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=60.960, player_2/loss=277.070, rew=-25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 343.17it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=80.340, player_2/loss=260.317, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 328.03it/s, env_step=4096, len=8, n/ep=8, n/st=64, player_1/loss=74.916, player_2/loss=231.885, rew=-18.75]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 330.26it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=82.445, player_2/loss=212.808, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 331.87it/s, env_step=6144, len=11, n/ep=6, n/st=64, player_1/loss=134.986, player_2/loss=166.761, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 339.33it/s, env_step=7168, len=18, n/ep=3, n/st=64, player_1/loss=187.277, player_2/loss=113.359, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 335.98it/s, env_step=8192, len=14, n/ep=3, n/st=64, player_1/loss=204.335, player_2/loss=94.046, rew=-8.33]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 333.11it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=185.039, player_2/loss=92.532, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 331.98it/s, env_step=10240, len=17, n/ep=3, n/st=64, player_1/loss=204.668, player_2/loss=69.865, rew=8.33]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 340.11it/s, env_step=11264, len=17, n/ep=4, n/st=64, player_1/loss=204.083, player_2/loss=54.778, rew=25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 333.41it/s, env_step=12288, len=20, n/ep=3, n/st=64, player_1/loss=169.422, player_2/loss=83.554, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 334.73it/s, env_step=13312, len=18, n/ep=4, n/st=64, player_1/loss=187.826, player_2/loss=59.151, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 337.80it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=199.082, player_2/loss=13.954, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 318.05it/s, env_step=15360, len=17, n/ep=4, n/st=64, player_1/loss=164.459, player_2/loss=50.881, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 336.33it/s, env_step=16384, len=16, n/ep=4, n/st=64, player_2/loss=83.547, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 308.38it/s, env_step=17408, len=16, n/ep=4, n/st=64, player_1/loss=189.869, player_2/loss=79.943, rew=0.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 329.79it/s, env_step=18432, len=16, n/ep=3, n/st=64, player_1/loss=174.515, player_2/loss=55.625, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 361.87it/s, env_step=19456, len=17, n/ep=4, n/st=64, player_1/loss=128.348, player_2/loss=51.402, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 352.33it/s, env_step=1024, len=20, n/ep=3, n/st=64, player_1/loss=106.142, player_2/loss=23.235, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 358.43it/s, env_step=2048, len=15, n/ep=4, n/st=64, player_1/loss=100.583, player_2/loss=41.254, rew=12.50]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:03, 324.04it/s, env_step=3072, len=17, n/ep=3, n/st=64, player_1/loss=104.177, player_2/loss=98.235, rew=-8.33]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 433.60it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=101.936, player_2/loss=200.756, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 403.32it/s, env_step=5120, len=11, n/ep=6, n/st=64, player_1/loss=145.510, player_2/loss=222.933, rew=25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 396.10it/s, env_step=6144, len=17, n/ep=4, n/st=64, player_1/loss=153.919, player_2/loss=122.334, rew=0.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 405.98it/s, env_step=7168, len=11, n/ep=6, n/st=64, player_1/loss=115.988, player_2/loss=139.523, rew=16.67]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:03, 336.16it/s, env_step=8192, len=10, n/ep=6, n/st=64, player_1/loss=64.993, player_2/loss=195.862, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 391.61it/s, env_step=9216, len=10, n/ep=7, n/st=64, player_1/loss=73.906, player_2/loss=256.740, rew=10.71]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 415.69it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=74.842, player_2/loss=280.219, rew=18.75]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 393.53it/s, env_step=11264, len=7, n/ep=9, n/st=64, player_1/loss=42.811, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 409.05it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=47.856, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 392.45it/s, env_step=13312, len=7, n/ep=8, n/st=64, player_1/loss=35.004, player_2/loss=319.089, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 347.48it/s, env_step=14336, len=7, n/ep=9, n/st=64, player_1/loss=15.091, player_2/loss=335.092, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 392.99it/s, env_step=15360, len=8, n/ep=8, n/st=64, player_1/loss=17.690, player_2/loss=288.469, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 344.99it/s, env_step=16384, len=7, n/ep=8, n/st=64, player_1/loss=25.415, player_2/loss=254.675, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:03, 330.50it/s, env_step=17408, len=8, n/ep=9, n/st=64, player_1/loss=11.115, player_2/loss=269.775, rew=19.44]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 406.63it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=9.811, player_2/loss=305.417, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:03, 337.14it/s, env_step=19456, len=7, n/ep=8, n/st=64, player_1/loss=7.178, player_2/loss=329.354, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 385.42it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=24.284, player_2/loss=286.740, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 327.24it/s, env_step=2048, len=8, n/ep=8, n/st=64, player_1/loss=24.420, player_2/loss=285.228, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 416.52it/s, env_step=3072, len=7, n/ep=10, n/st=64, player_1/loss=43.512, player_2/loss=237.321, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 356.77it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=73.558, player_2/loss=255.709, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 386.75it/s, env_step=5120, len=7, n/ep=9, n/st=64, player_1/loss=45.478, player_2/loss=242.403, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 346.62it/s, env_step=6144, len=8, n/ep=8, n/st=64, player_1/loss=26.052, player_2/loss=259.160, rew=-18.75]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 410.63it/s, env_step=7168, len=11, n/ep=5, n/st=64, player_1/loss=58.731, player_2/loss=272.732, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 397.15it/s, env_step=8192, len=11, n/ep=5, n/st=64, player_1/loss=144.151, player_2/loss=210.830, rew=15.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 366.85it/s, env_step=9216, len=18, n/ep=3, n/st=64, player_1/loss=147.672, player_2/loss=168.448, rew=8.33]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 386.05it/s, env_step=10240, len=22, n/ep=3, n/st=64, player_1/loss=138.395, player_2/loss=130.953, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 387.57it/s, env_step=11264, len=20, n/ep=3, n/st=64, player_1/loss=144.354, player_2/loss=116.469, rew=-8.33]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 384.35it/s, env_step=12288, len=17, n/ep=3, n/st=64, player_1/loss=206.557, player_2/loss=105.107, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 393.51it/s, env_step=13312, len=19, n/ep=4, n/st=64, player_1/loss=168.095, player_2/loss=92.795, rew=0.00]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 409.44it/s, env_step=14336, len=31, n/ep=2, n/st=64, player_1/loss=89.160, player_2/loss=91.871, rew=25.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 404.28it/s, env_step=15360, len=24, n/ep=3, n/st=64, player_1/loss=121.307, player_2/loss=92.201, rew=-8.33]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #16: 1025it [00:02, 399.46it/s, env_step=16384, len=21, n/ep=3, n/st=64, player_1/loss=99.262, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #17: 1025it [00:02, 408.28it/s, env_step=17408, len=22, n/ep=3, n/st=64, player_1/loss=81.399, player_2/loss=74.647, rew=-8.33]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #18: 1025it [00:02, 396.87it/s, env_step=18432, len=19, n/ep=3, n/st=64, player_1/loss=94.495, player_2/loss=82.968, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #19: 1025it [00:02, 383.70it/s, env_step=19456, len=15, n/ep=4, n/st=64, player_1/loss=108.464, player_2/loss=78.038, rew=-12.50]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #0


Epoch #1: 1025it [00:02, 372.48it/s, env_step=1024, len=21, n/ep=3, n/st=64, player_1/loss=173.513, player_2/loss=119.569, rew=-8.33]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 359.96it/s, env_step=2048, len=18, n/ep=3, n/st=64, player_1/loss=140.955, player_2/loss=123.599, rew=8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:02, 347.01it/s, env_step=3072, len=18, n/ep=4, n/st=64, player_1/loss=82.987, player_2/loss=107.119, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 338.76it/s, env_step=4096, len=23, n/ep=2, n/st=64, player_1/loss=76.197, player_2/loss=118.448, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 338.48it/s, env_step=5120, len=18, n/ep=3, n/st=64, player_1/loss=79.541, player_2/loss=105.189, rew=-8.33]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:02, 341.77it/s, env_step=6144, len=16, n/ep=4, n/st=64, player_1/loss=61.319, player_2/loss=107.842, rew=12.50]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 337.96it/s, env_step=7168, len=17, n/ep=4, n/st=64, player_1/loss=66.271, player_2/loss=123.697, rew=12.50]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 336.91it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=62.681, player_2/loss=85.419, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:02, 354.87it/s, env_step=9216, len=15, n/ep=5, n/st=64, player_1/loss=79.727, player_2/loss=112.512, rew=25.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 329.19it/s, env_step=10240, len=18, n/ep=3, n/st=64, player_1/loss=46.105, player_2/loss=117.521, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 329.02it/s, env_step=11264, len=17, n/ep=3, n/st=64, player_1/loss=49.203, player_2/loss=135.257, rew=-8.33]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 330.28it/s, env_step=12288, len=14, n/ep=4, n/st=64, player_1/loss=125.643, player_2/loss=141.839, rew=12.50]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 338.93it/s, env_step=13312, len=7, n/ep=9, n/st=64, player_1/loss=143.613, player_2/loss=171.555, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 321.89it/s, env_step=14336, len=9, n/ep=9, n/st=64, player_1/loss=91.400, player_2/loss=201.630, rew=19.44]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 327.67it/s, env_step=15360, len=9, n/ep=6, n/st=64, player_1/loss=52.205, player_2/loss=218.050, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 317.36it/s, env_step=16384, len=7, n/ep=10, n/st=64, player_1/loss=26.225, player_2/loss=231.115, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 324.36it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=39.556, player_2/loss=300.357, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 337.81it/s, env_step=18432, len=7, n/ep=7, n/st=64, player_1/loss=72.706, player_2/loss=259.625, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 314.90it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=44.236, player_2/loss=266.811, rew=25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 360.46it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=45.227, player_2/loss=260.683, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 355.50it/s, env_step=2048, len=9, n/ep=7, n/st=64, player_1/loss=56.678, player_2/loss=241.456, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 372.33it/s, env_step=3072, len=7, n/ep=9, n/st=64, player_1/loss=57.194, player_2/loss=258.300, rew=-25.00]


Epoch #3: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #4: 1025it [00:02, 372.29it/s, env_step=4096, len=7, n/ep=8, n/st=64, player_1/loss=41.968, player_2/loss=211.067, rew=-25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #5: 1025it [00:02, 370.19it/s, env_step=5120, len=8, n/ep=7, n/st=64, player_1/loss=61.438, player_2/loss=180.325, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #6: 1025it [00:02, 372.79it/s, env_step=6144, len=8, n/ep=9, n/st=64, player_1/loss=52.253, player_2/loss=137.400, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #7: 1025it [00:02, 371.87it/s, env_step=7168, len=10, n/ep=6, n/st=64, player_1/loss=39.672, player_2/loss=98.234, rew=-25.00]


Epoch #7: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #8: 1025it [00:02, 370.20it/s, env_step=8192, len=8, n/ep=7, n/st=64, player_1/loss=62.776, player_2/loss=71.424, rew=-25.00]


Epoch #8: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #9: 1025it [00:02, 371.88it/s, env_step=9216, len=9, n/ep=6, n/st=64, player_1/loss=61.497, player_2/loss=38.461, rew=-25.00]


Epoch #9: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #10: 1025it [00:02, 370.48it/s, env_step=10240, len=9, n/ep=7, n/st=64, player_1/loss=44.560, player_2/loss=12.707, rew=-25.00]


Epoch #10: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #11: 1025it [00:02, 370.30it/s, env_step=11264, len=9, n/ep=7, n/st=64, player_1/loss=42.040, player_2/loss=11.156, rew=-25.00]


Epoch #11: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #12: 1025it [00:02, 373.52it/s, env_step=12288, len=11, n/ep=6, n/st=64, player_1/loss=31.284, player_2/loss=12.873, rew=-16.67]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #13: 1025it [00:02, 372.64it/s, env_step=13312, len=12, n/ep=6, n/st=64, player_1/loss=25.235, player_2/loss=19.858, rew=-16.67]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #14: 1025it [00:02, 371.64it/s, env_step=14336, len=13, n/ep=5, n/st=64, player_1/loss=39.212, player_2/loss=54.128, rew=-15.00]


Epoch #14: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #15: 1025it [00:02, 361.75it/s, env_step=15360, len=21, n/ep=3, n/st=64, player_1/loss=81.208, player_2/loss=112.523, rew=-25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #16: 1025it [00:02, 354.09it/s, env_step=16384, len=17, n/ep=3, n/st=64, player_1/loss=116.401, player_2/loss=116.554, rew=-25.00]


Epoch #16: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #17: 1025it [00:02, 343.69it/s, env_step=17408, len=17, n/ep=3, n/st=64, player_1/loss=127.201, player_2/loss=89.660, rew=-25.00]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #18: 1025it [00:03, 336.20it/s, env_step=18432, len=21, n/ep=3, n/st=64, player_1/loss=125.890, player_2/loss=108.327, rew=8.33]


Epoch #18: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #19: 1025it [00:03, 337.70it/s, env_step=19456, len=20, n/ep=3, n/st=64, player_1/loss=118.088, player_2/loss=85.401, rew=-25.00]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #15


Epoch #1: 1025it [00:03, 334.67it/s, env_step=1024, len=22, n/ep=3, n/st=64, player_1/loss=126.211, player_2/loss=144.876, rew=8.33]


Epoch #1: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:03, 339.29it/s, env_step=2048, len=26, n/ep=2, n/st=64, player_1/loss=140.838, player_2/loss=119.648, rew=-25.00]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:03, 337.39it/s, env_step=3072, len=27, n/ep=3, n/st=64, player_1/loss=117.515, player_2/loss=100.850, rew=-25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:03, 338.05it/s, env_step=4096, len=24, n/ep=2, n/st=64, player_1/loss=116.738, player_2/loss=127.247, rew=0.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:03, 338.36it/s, env_step=5120, len=26, n/ep=2, n/st=64, player_1/loss=116.252, player_2/loss=115.373, rew=-25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:03, 337.98it/s, env_step=6144, len=25, n/ep=2, n/st=64, player_1/loss=71.154, player_2/loss=82.146, rew=-25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:03, 338.38it/s, env_step=7168, len=27, n/ep=3, n/st=64, player_1/loss=110.232, player_2/loss=58.763, rew=-8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:03, 335.81it/s, env_step=8192, len=20, n/ep=3, n/st=64, player_1/loss=103.089, player_2/loss=50.368, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:03, 335.75it/s, env_step=9216, len=22, n/ep=3, n/st=64, player_1/loss=54.700, player_2/loss=52.875, rew=8.33]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:03, 338.61it/s, env_step=10240, len=21, n/ep=3, n/st=64, player_1/loss=16.446, player_2/loss=60.372, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 352.04it/s, env_step=11264, len=19, n/ep=3, n/st=64, player_1/loss=24.438, player_2/loss=51.779, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:03, 336.82it/s, env_step=12288, len=20, n/ep=4, n/st=64, player_1/loss=56.826, player_2/loss=96.626, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:03, 336.40it/s, env_step=13312, len=21, n/ep=3, n/st=64, player_1/loss=71.755, player_2/loss=136.599, rew=8.33]


Epoch #13: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #14: 1025it [00:03, 339.45it/s, env_step=14336, len=18, n/ep=4, n/st=64, player_1/loss=73.797, player_2/loss=98.455, rew=12.50]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #15: 1025it [00:03, 335.96it/s, env_step=15360, len=7, n/ep=9, n/st=64, player_1/loss=117.165, player_2/loss=165.209, rew=19.44]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #16: 1025it [00:03, 337.72it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=111.546, player_2/loss=230.029, rew=8.33]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #17: 1025it [00:03, 334.51it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=103.119, player_2/loss=223.204, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #18: 1025it [00:03, 336.93it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=140.923, player_2/loss=221.095, rew=8.33]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #19: 1025it [00:03, 337.74it/s, env_step=19456, len=8, n/ep=8, n/st=64, player_1/loss=84.983, player_2/loss=218.065, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #1


Epoch #1: 1025it [00:03, 332.26it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=123.468, player_2/loss=117.022, rew=-25.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 335.67it/s, env_step=2048, len=12, n/ep=5, n/st=64, player_1/loss=150.689, player_2/loss=115.255, rew=25.00]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 337.28it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=253.641, player_2/loss=149.174, rew=12.50]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 336.58it/s, env_step=4096, len=19, n/ep=3, n/st=64, player_1/loss=228.512, player_2/loss=135.496, rew=8.33]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 336.66it/s, env_step=5120, len=19, n/ep=3, n/st=64, player_1/loss=126.525, player_2/loss=110.419, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 339.95it/s, env_step=6144, len=22, n/ep=3, n/st=64, player_1/loss=85.915, player_2/loss=101.060, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 339.77it/s, env_step=7168, len=21, n/ep=3, n/st=64, player_1/loss=86.879, player_2/loss=98.346, rew=8.33]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 338.46it/s, env_step=8192, len=15, n/ep=4, n/st=64, player_1/loss=117.546, player_2/loss=56.085, rew=0.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 336.13it/s, env_step=9216, len=12, n/ep=5, n/st=64, player_1/loss=162.638, player_2/loss=44.702, rew=5.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 334.81it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=182.893, player_2/loss=51.895, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 337.36it/s, env_step=11264, len=14, n/ep=5, n/st=64, player_1/loss=211.928, player_2/loss=53.388, rew=15.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:03, 337.20it/s, env_step=12288, len=13, n/ep=5, n/st=64, player_1/loss=204.362, player_2/loss=40.515, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 333.68it/s, env_step=13312, len=13, n/ep=4, n/st=64, player_1/loss=198.467, player_2/loss=48.359, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:03, 335.59it/s, env_step=14336, len=11, n/ep=6, n/st=64, player_1/loss=235.463, player_2/loss=51.765, rew=16.67]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:03, 336.38it/s, env_step=15360, len=13, n/ep=4, n/st=64, player_1/loss=236.678, player_2/loss=33.807, rew=25.00]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:03, 337.37it/s, env_step=16384, len=14, n/ep=4, n/st=64, player_1/loss=261.221, player_2/loss=32.182, rew=12.50]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 337.32it/s, env_step=17408, len=15, n/ep=4, n/st=64, player_1/loss=246.581, player_2/loss=38.817, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:03, 338.72it/s, env_step=18432, len=12, n/ep=5, n/st=64, player_1/loss=199.195, player_2/loss=66.569, rew=25.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:03, 335.88it/s, env_step=19456, len=16, n/ep=4, n/st=64, player_1/loss=158.355, player_2/loss=65.129, rew=12.50]


Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:03, 337.59it/s, env_step=1024, len=15, n/ep=5, n/st=64, player_1/loss=140.555, player_2/loss=18.100, rew=-15.00]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:03, 336.56it/s, env_step=2048, len=19, n/ep=3, n/st=64, player_1/loss=153.570, player_2/loss=52.205, rew=-8.33]


Epoch #2: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #3: 1025it [00:03, 341.50it/s, env_step=3072, len=16, n/ep=4, n/st=64, player_1/loss=132.352, player_2/loss=160.862, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #4: 1025it [00:03, 336.49it/s, env_step=4096, len=15, n/ep=4, n/st=64, player_1/loss=82.628, player_2/loss=244.183, rew=25.00]


Epoch #4: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #5: 1025it [00:03, 335.32it/s, env_step=5120, len=20, n/ep=3, n/st=64, player_1/loss=128.806, player_2/loss=234.541, rew=-25.00]


Epoch #5: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #6: 1025it [00:03, 338.85it/s, env_step=6144, len=18, n/ep=3, n/st=64, player_1/loss=164.297, player_2/loss=153.254, rew=-25.00]


Epoch #6: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #7: 1025it [00:03, 338.95it/s, env_step=7168, len=14, n/ep=5, n/st=64, player_1/loss=153.547, player_2/loss=212.717, rew=15.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #8: 1025it [00:03, 335.27it/s, env_step=8192, len=11, n/ep=6, n/st=64, player_1/loss=119.012, player_2/loss=296.724, rew=25.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #9: 1025it [00:03, 336.87it/s, env_step=9216, len=11, n/ep=5, n/st=64, player_1/loss=85.517, player_2/loss=366.134, rew=15.00]


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #10: 1025it [00:03, 327.16it/s, env_step=10240, len=11, n/ep=6, n/st=64, player_1/loss=62.210, player_2/loss=362.151, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #11: 1025it [00:03, 339.65it/s, env_step=11264, len=11, n/ep=5, n/st=64, player_1/loss=52.082, player_2/loss=326.074, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #12: 1025it [00:02, 345.48it/s, env_step=12288, len=9, n/ep=6, n/st=64, player_1/loss=59.243, player_2/loss=219.406, rew=25.00]


Epoch #12: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #13: 1025it [00:03, 336.03it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=73.206, player_2/loss=262.204, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #14: 1025it [00:02, 348.16it/s, env_step=14336, len=9, n/ep=7, n/st=64, player_1/loss=57.217, player_2/loss=392.958, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #15: 1025it [00:02, 343.53it/s, env_step=15360, len=7, n/ep=8, n/st=64, player_1/loss=45.808, player_2/loss=407.054, rew=18.75]


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #16: 1025it [00:02, 346.40it/s, env_step=16384, len=7, n/ep=9, n/st=64, player_1/loss=33.670, player_2/loss=383.246, rew=19.44]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #17: 1025it [00:03, 341.10it/s, env_step=17408, len=7, n/ep=9, n/st=64, player_1/loss=67.037, player_2/loss=391.336, rew=19.44]


Epoch #17: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #18: 1025it [00:02, 344.24it/s, env_step=18432, len=7, n/ep=8, n/st=64, player_1/loss=87.447, player_2/loss=413.077, rew=6.25]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #19: 1025it [00:02, 345.62it/s, env_step=19456, len=7, n/ep=9, n/st=64, player_1/loss=77.436, player_2/loss=393.366, rew=19.44]


Epoch #19: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #2


Epoch #1: 1025it [00:02, 346.57it/s, env_step=1024, len=7, n/ep=9, n/st=64, player_1/loss=93.158, player_2/loss=305.682, rew=-19.44]


Epoch #1: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #2: 1025it [00:02, 347.44it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=80.700, player_2/loss=256.407, rew=-19.44]


Epoch #2: test_reward: -25.000000 ± 0.000000, best_reward: -25.000000 ± 0.000000 in #0


Epoch #3: 1025it [00:02, 350.31it/s, env_step=3072, len=12, n/ep=5, n/st=64, player_1/loss=97.503, player_2/loss=177.365, rew=25.00]


Epoch #3: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #4: 1025it [00:02, 348.74it/s, env_step=4096, len=13, n/ep=5, n/st=64, player_1/loss=124.099, player_2/loss=98.933, rew=15.00]


Epoch #4: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #5: 1025it [00:02, 350.05it/s, env_step=5120, len=18, n/ep=4, n/st=64, player_1/loss=132.578, player_2/loss=96.824, rew=25.00]


Epoch #5: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #6: 1025it [00:02, 350.11it/s, env_step=6144, len=12, n/ep=5, n/st=64, player_1/loss=131.753, player_2/loss=80.924, rew=25.00]


Epoch #6: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #7: 1025it [00:02, 348.00it/s, env_step=7168, len=14, n/ep=4, n/st=64, player_1/loss=159.704, player_2/loss=51.857, rew=25.00]


Epoch #7: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #8: 1025it [00:03, 339.51it/s, env_step=8192, len=13, n/ep=5, n/st=64, player_1/loss=191.878, player_2/loss=46.156, rew=15.00]


Epoch #8: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #9: 1025it [00:03, 333.89it/s, env_step=9216, len=12, n/ep=6, n/st=64, player_2/loss=50.546, rew=25.00]          


Epoch #9: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #10: 1025it [00:03, 336.77it/s, env_step=10240, len=12, n/ep=5, n/st=64, player_1/loss=160.721, player_2/loss=33.704, rew=25.00]


Epoch #10: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #11: 1025it [00:03, 336.71it/s, env_step=11264, len=12, n/ep=4, n/st=64, player_1/loss=148.703, player_2/loss=29.580, rew=25.00]


Epoch #11: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #12: 1025it [00:03, 338.89it/s, env_step=12288, len=12, n/ep=5, n/st=64, player_1/loss=155.628, player_2/loss=32.342, rew=25.00]


Epoch #12: test_reward: -25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #13: 1025it [00:03, 339.31it/s, env_step=13312, len=13, n/ep=5, n/st=64, player_1/loss=162.550, player_2/loss=28.405, rew=25.00]


Epoch #13: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #14: 1025it [00:03, 338.06it/s, env_step=14336, len=12, n/ep=5, n/st=64, player_1/loss=175.309, player_2/loss=23.817, rew=25.00]


Epoch #14: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #15: 1025it [00:03, 339.04it/s, env_step=15360, len=12, n/ep=4, n/st=64, player_1/loss=149.944, rew=25.00]       


Epoch #15: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #16: 1025it [00:03, 337.14it/s, env_step=16384, len=13, n/ep=5, n/st=64, player_1/loss=146.001, player_2/loss=11.056, rew=25.00]


Epoch #16: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #17: 1025it [00:03, 336.51it/s, env_step=17408, len=12, n/ep=5, n/st=64, player_1/loss=163.031, player_2/loss=5.963, rew=25.00]


Epoch #17: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #18: 1025it [00:03, 334.56it/s, env_step=18432, len=13, n/ep=5, n/st=64, player_1/loss=207.497, player_2/loss=47.402, rew=15.00]


Epoch #18: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3


Epoch #19: 1025it [00:03, 338.96it/s, env_step=19456, len=11, n/ep=5, n/st=64, player_1/loss=181.844, player_2/loss=80.734, rew=15.00]

Epoch #19: test_reward: 25.000000 ± 0.000000, best_reward: 25.000000 ± 0.000000 in #3





In [40]:
####################################################
# EXPERIMENT: VIEWING THE BEST LEARNED POLICY
####################################################

# Get the environment settings
env = get_env()
observation_space = env.observation_space['observation'] if isinstance(env.observation_space, gym.spaces.Dict) else env.observation_space
state_shape = observation_space.shape or observation_space.n
action_shape = env.action_space.shape or env.action_space.n

# Configure the best agent
best_agent1 = cf_cnn_dqn_policy(state_shape= state_shape,
                                action_shape= action_shape)
best_agent1.load_state_dict(torch.load("./saved_variables/paper_notebooks/7/7-20epoch_500loop/7-looping-iteration-499/best_policy_agent1.pth"))
best_agent1.set_eps(0)


best_agent2 = cf_cnn_dqn_policy(state_shape= state_shape,
                                action_shape= action_shape)
best_agent2.load_state_dict(torch.load("./saved_variables/paper_notebooks/7/7-20epoch_500loop/7-looping-iteration-499/best_policy_agent2.pth"))
best_agent2.set_eps(0)

# Watch the best agent at work
watch(numer_of_games= 3,
      render_speed= 0.3,
      agent_player1= best_agent1,
      agent_player2= best_agent2)



Average steps of game:  11.0
Final mean reward agent 1: -8.333333333333334, std: 23.570226039551585
Final mean reward agent 2: 8.333333333333334, std: 23.570226039551585
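
As a sanity check on the summary above: the reported mean and standard deviation match what one win and two losses for agent 1 would give. This assumes each game yields a reward of +25 or -25 (as the test rewards in the training logs suggest), that the game is zero-sum, and that `watch` reports the population standard deviation. The per-game rewards below are reconstructed from the summary, not taken from logged output, and `summarize` is a hypothetical helper.

```python
import math
import statistics

# Hypothetical per-game rewards for agent 1 (assumed: one win, two losses at +/-25).
agent1_rewards = [25.0, -25.0, -25.0]
# Zero-sum assumption: agent 2 receives the negated reward of agent 1.
agent2_rewards = [-r for r in agent1_rewards]


def summarize(rewards):
    """Return (mean, population standard deviation) of a list of rewards."""
    return statistics.fmean(rewards), statistics.pstdev(rewards)


mean1, std1 = summarize(agent1_rewards)
mean2, std2 = summarize(agent2_rewards)

print(f"agent 1: mean={mean1:.3f}, std={std1:.3f}")
print(f"agent 2: mean={mean2:.3f}, std={std2:.3f}")
```

Under these assumptions the means come out at roughly ∓8.33 with a standard deviation of roughly 23.57, in line with the printed summary. With only three games, a single result moves the mean by about 16.7, so these figures are a rough indication at best.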


In [41]:
####################################################
# EXPERIMENT: VIEWING THE LAST LEARNED POLICY
####################################################

# Configure the final agent
final_agent_player1 = cf_cnn_dqn_policy(state_shape= state_shape,
                                        action_shape= action_shape)
final_agent_player1.load_state_dict(torch.load("./saved_variables/paper_notebooks/7/7-20epoch_500loop/7-looping-iteration-499/final_policy_agent1.pth"))
final_agent_player1.set_eps(0)  # act greedily (no exploration) while watching

final_agent_player2 = cf_cnn_dqn_policy(state_shape= state_shape,
                                        action_shape= action_shape)
final_agent_player2.load_state_dict(torch.load("./saved_variables/paper_notebooks/7/7-20epoch_500loop/7-looping-iteration-499/final_policy_agent2.pth"))
final_agent_player2.set_eps(0)  # act greedily (no exploration) while watching

# Watch the final agents at work
watch(numer_of_games= 3,
      render_speed= 0.3,
      agent_player1= final_agent_player1,
      agent_player2= final_agent_player2)



Average steps of game:  12.0
Final mean reward agent 1: -25.0, std: 0.0
Final mean reward agent 2: 25.0, std: 0.0


<hr><hr>

## Discussion

We see that the agent can quickly learn to win against a fixed-strategy opponent, but its overall performance is still weak, so play against a human is once again of very poor quality.

In [13]:
####################################################
# CLEAN VARIABLES
####################################################

del action_shape
del agent1
del agent2
del best_agent1
del best_agent2
del env
del final_agent_player1
del final_agent_player2
del observation_space
del off_policy_traininer_results
del state_shape
