# DQN using a CNN

The previous notebook, `5-improving-dqn-architecture`, showed that a different network architecture can improve the performance of the agent.
However, the agent still plays at a rather amateur level.
To encourage the agent to learn the rule of getting four coins in a row, this notebook uses a CNN whose 4x4 kernel scans the board.
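
To make the idea of scanning the board with a 4x4 window concrete, the sketch below enumerates every 4x4 patch of a 6x7 board and checks each patch for a complete line. It is only an illustration: the helper names `windows` and `has_four` are hypothetical, and a real convolutional filter learns its weights instead of using a hand-written rule.

```python
ROWS, COLS, K = 6, 7, 4  # board size and window size used in this notebook


def windows(board, k=K):
    """Yield (row, col, window) for every k x k patch of the board."""
    for r in range(len(board) - k + 1):
        for c in range(len(board[0]) - k + 1):
            yield r, c, [row[c:c + k] for row in board[r:r + k]]


def has_four(window, player):
    """True if the window holds a full row, column or diagonal of `player`."""
    k = len(window)
    rows = any(all(v == player for v in row) for row in window)
    cols = any(all(window[r][c] == player for r in range(k)) for c in range(k))
    diag = all(window[i][i] == player for i in range(k))
    anti = all(window[i][k - 1 - i] == player for i in range(k))
    return rows or cols or diag or anti


# Empty board with a rising diagonal of four coins for player 1
board = [[0] * COLS for _ in range(ROWS)]
for i in range(4):
    board[5 - i][1 + i] = 1

patches = list(windows(board))
print(len(patches))  # 12 window positions: 3 vertical x 4 horizontal
print([(r, c) for r, c, w in patches if has_four(w, 1)])  # [(2, 1)]
```

Every possible four-in-a-row lies entirely inside at least one of these twelve windows, which is the motivation for choosing a 4x4 kernel.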

<hr><hr>

## Table of Contents

- Contact information
- Checking requirements
  - Correct Anaconda environment
  - Correct module access
  - Correct CUDA access
- Training two DQN agents on connect four Gym
  - Building the environment
  - Implementing the DQN policy
  - Building agents
  - Function for letting agents learn
  - Function for watching learned agent
  - Doing the experiment
- Discussion

<hr><hr>

## Contact information

| Name             | Student ID | VUB mail                                                  | Personal mail                                               |
| ---------------- | ---------- | --------------------------------------------------------- | ----------------------------------------------------------- |
| Lennert Bontinck | 0568702    | [lennert.bontinck@vub.be](mailto:lennert.bontinck@vub.be) | [info@lennertbontinck.com](mailto:info@lennertbontinck.com) |



<hr><hr>

## Checking requirements

### Correct Anaconda environment

The `rl-project` Anaconda environment should be active to ensure proper support. Installation instructions are available in [the GitHub repository of the RL course project and homeworks](https://github.com/pikawika/vub-rl).

In [1]:
####################################################
# CHECKING FOR RIGHT ANACONDA ENVIRONMENT
####################################################

import os
from platform import python_version

print(f"Active environment: {os.environ['CONDA_DEFAULT_ENV']}")
print(f"Correct environment: {os.environ['CONDA_DEFAULT_ENV'] == 'rl-project'}")
print(f"\nPython version: {python_version()}")
print(f"Correct Python version: {python_version() == '3.8.10'}")

Active environment: rl-project
Correct environment: True

Python version: 3.8.10
Correct Python version: True


<hr>

### Correct module access

The following code block will load in all required modules and show if the versions match those that are recommended.

In [2]:
####################################################
# LOADING MODULES
####################################################

# Allow reloading of libraries
import importlib

# Plotting
import matplotlib; print(f"Matplotlib version (3.5.1 recommended): {matplotlib.__version__}")
import matplotlib.pyplot as plt

# Argparser
import argparse

# More data types
import typing
import numpy as np

# Pygame
import pygame; print(f"Pygame version (2.1.2 recommended): {pygame.__version__}")

# Gym environment
import gym; print(f"Gym version (0.21.0 recommended): {gym.__version__}")

# Tianshou for RL algorithms
import tianshou as ts; print(f"Tianshou version (0.4.8 recommended): {ts.__version__}")

# Torch is a popular DL framework
import torch; print(f"Torch version (1.12.0 recommended): {torch.__version__}")

# PPrint is a pretty print for variables
from pprint import pprint

# Our custom connect four gym environment
import sys
sys.path.append('../')
import gym_connect4_pygame.envs.ConnectFourPygameEnvV2 as cfgym
importlib.invalidate_caches()
importlib.reload(cfgym)

# Time for allowing "freezes" in execution
import time

# Allow for copying objects in a non reference manner
import copy

# Used for updating notebook display
from IPython.display import clear_output

Matplotlib version (3.5.1 recommended): 3.5.1
pygame 2.1.2 (SDL 2.0.18, Python 3.8.10)
Hello from the pygame community. https://www.pygame.org/contribute.html
Pygame version (2.1.2 recommended): 2.1.2
Gym version (0.21.0 recommended): 0.21.0

Tianshou version (0.4.8 recommended): 0.4.8
Torch version (1.12.0 recommended): 1.12.0.dev20220520+cu116


<hr>

### Correct CUDA access

The installation instructions specify how to install PyTorch with CUDA 11.6.
The following code block tests if this was done successfully.

In [3]:
####################################################
# CUDA VALIDATION
####################################################

# Check cuda available
print(f"CUDA is available: {torch.cuda.is_available()}")

# Show cuda devices
print(f"\nAmount of connected devices supporting CUDA: {torch.cuda.device_count()}")

# Show current cuda device
print(f"\nCurrent CUDA device: {torch.cuda.current_device()}")

# Show cuda device name
print(f"Cuda device 0 name: {torch.cuda.get_device_name(0)}")

CUDA is available: True

Amount of connected devices supporting CUDA: 1

Current CUDA device: 0
Cuda device 0 name: NVIDIA GeForce GTX 970


<hr><hr>

## Training two DQN agents on connect four Gym

Our connect four gym setup requires two agents, one for each player.
To reduce complexity, agents will always play as the same player, e.g. always as player 1.
It is important to note that connect four is a *solved game*.
According to [The Washington Post](https://www.washingtonpost.com/news/wonk/wp/2015/05/08/how-to-win-any-popular-game-according-to-data-scientists/):

> Connect Four is what mathematicians call a "solved game," meaning you can play it perfectly every time, no matter what your opponent does. You will need to get the first move, but as long as you do so, you can always win within 41 moves.

<hr>

### Building the environment

This code is taken from previous notebooks.
For now, invalid moves are not allowed, which makes the problem easier.

In [4]:
####################################################
# CONNECT FOUR V2 ENVIRONMENT
####################################################

def get_env():
    """
    Returns the connect four gym environment V2 altered for Tianshou and Petting Zoo compatibility.
    Already wrapped with a ts.env.PettingZooEnv wrapper.
    """
    return ts.env.PettingZooEnv(cfgym.env(reward_move= 1, # Set to 1 for reward to make moves (incentivise longer games)
                                          reward_invalid= -3,
                                          reward_draw= 15,
                                          reward_win= 25,
                                          reward_loss= -25,
                                          allow_invalid_move= False))
    
    
# Test the environment
env = get_env()
print(f"Observation space: {env.observation_space}")
print(f"\nAction space: {env.action_space}")

# Reset the environment to start from a clean state, returns the initial observation
observation = env.reset()

print("\n Initial player id:")
print(observation["agent_id"])

print("\n Initial observation:")
print(observation["obs"])

print("\n Initial mask:")
print(observation["mask"])

# Clean unused variables
del observation
del env

Observation space: Dict(action_mask:Box([0 0 0 0 0 0 0], [1 1 1 1 1 1 1], (7,), int8), observation:Box([[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]], [[2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]], (6, 7), int8))

Action space: Discrete(7)

 Initial player id:
player_1

 Initial observation:
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]

 Initial mask:
[True, True, True, True, True, True, True]
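
The mask printed above marks which of the seven columns can still accept a coin. The sketch below shows one way such a mask could be combined with Q-values to pick a legal action. The function name `masked_greedy_action` and the Q-values are made up for illustration; Tianshou handles masking inside its own policy code, which is not reproduced here.

```python
def masked_greedy_action(q_values, mask):
    """Return the index of the highest Q-value among the legal columns."""
    legal = [i for i, ok in enumerate(mask) if ok]
    if not legal:
        raise ValueError("no legal action available")
    return max(legal, key=lambda i: q_values[i])


q_values = [0.2, 1.5, -0.3, 0.9, 0.1, 0.0, 0.4]     # made-up numbers
mask = [True, False, True, True, True, True, True]  # column 1 is full
print(masked_greedy_action(q_values, mask))  # 3
```

Column 1 has the highest Q-value but is masked out, so the best legal column (3) is selected instead.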


<hr>

### Implementing the DQN policy

The DQN policy for the agent is configured and set up below.
The model is now based on a CNN.

In [5]:
####################################################
# DQN ARCHITECTURE
####################################################

class CNNBasedDQN(torch.nn.Module):
    """
    Custom DQN using a model based on CNN
    """
    def __init__(self,
                 state_shape: typing.Sequence[int],
                 action_shape: typing.Sequence[int],
                 device: typing.Union[str, int, torch.device] = 'cuda' if torch.cuda.is_available() else 'cpu',):
        # Parent call
        super().__init__()
        
        # Save device (e.g. cuda)
        self.device = device
        
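        # Convolution sizes: one input channel (the board) and 32 output channels
        input_channels_cnn = 1
        output_channels_cnn = 32
        # A 4x4 kernel with stride 1 and no padding shrinks each board dimension by 3
        flatten_size = (state_shape[0] - 3) * (state_shape[1] - 3) * output_channels_cnn
        output_size= np.prod(action_shape)
        
        self.model = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels= input_channels_cnn, out_channels= output_channels_cnn, kernel_size= 4, stride= 1), torch.nn.ReLU(inplace=True),
            # Flatten all dimensions, then restore a leading batch dimension of size 1
            # (the network therefore processes a single board per forward pass)
            torch.nn.Flatten(0,-1),
            torch.nn.Unflatten(0, (1, flatten_size)),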
            torch.nn.Linear(flatten_size, 128), torch.nn.ReLU(inplace=True),
            torch.nn.Linear(128, 128), torch.nn.ReLU(inplace=True),
            torch.nn.Linear(128, output_size),
        )

    def forward(self, obs, state=None, info={}):
        if not isinstance(obs, torch.Tensor):
            obs = torch.tensor(obs, dtype=torch.float, device=self.device)
        
        logits = self.model(obs)
        return logits, state
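
The `flatten_size` expression in the network above follows from the standard output-size formula for a convolution without dilation. The small helper below, named `conv2d_out` purely for illustration, reproduces the arithmetic for the 6x7 board without requiring PyTorch.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


rows, cols = 6, 7              # board shape reported by the environment
out_channels, kernel = 32, 4   # values used in CNNBasedDQN

out_h = conv2d_out(rows, kernel)
out_w = conv2d_out(cols, kernel)
flatten_size = out_h * out_w * out_channels
print(out_h, out_w, flatten_size)  # 3 4 384
```

With a kernel of 4, stride 1 and no padding, each dimension shrinks by 3, which matches `(state_shape[0] - 3) * (state_shape[1] - 3) * output_channels_cnn`. Note that `Flatten(0, -1)` also flattens any batch dimension, so the model as written appears to expect one board per forward pass, in line with the default `batch_size` of 1 used later.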


In [6]:
####################################################
# DQN POLICY
####################################################

def cf_dqn_policy(state_shape: tuple,
                  action_shape: tuple,
                  optim: typing.Optional[torch.optim.Optimizer] = None,
                  learning_rate: float =  0.0001,
                  gamma: float = 0.9, # Smaller gamma favours "faster" win
                  n_step: int = 1, # Number of steps to look ahead
                  target_update_freq: int = 320):
    # Use cuda device if possible
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    # Network to be used for DQN
    net = CNNBasedDQN(state_shape, action_shape, device= device).to(device)
    
    # Default optimizer is an Adam optimizer with the given learning rate
    if optim is None:
        optim = torch.optim.Adam(net.parameters(), lr= learning_rate)
        
    # Our agent DQN policy
    return ts.policy.DQNPolicy(model= net,
                               optim= optim,
                               discount_factor= gamma,
                               estimation_step= n_step,
                               target_update_freq= target_update_freq)
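
The `gamma` and `n_step` arguments above control how the learning target is built. As a rough sketch, assuming the usual n-step return definition (this is not Tianshou's implementation, and `n_step_target` is a hypothetical helper), the target sums the next `n` discounted rewards and then bootstraps from an estimated Q-value.

```python
def n_step_target(rewards, gamma, bootstrap_q):
    """Discounted n-step target: sum_k gamma^k * r_k + gamma^n * bootstrap_q."""
    n = len(rewards)
    discounted = sum((gamma ** k) * r for k, r in enumerate(rewards))
    return discounted + (gamma ** n) * bootstrap_q


# One-step target: immediate reward plus discounted value estimate
one_step = n_step_target([1.0], gamma=0.9, bootstrap_q=10.0)

# Three-step target ending in a win (reward 25), so nothing to bootstrap from
three_step = n_step_target([1.0, 1.0, 25.0], gamma=0.9, bootstrap_q=0.0)

print(round(one_step, 2), round(three_step, 2))  # 10.0 22.15
```

A smaller `gamma` shrinks the contribution of rewards that arrive later, which is why it favours faster wins. A larger `n_step` lets a final win or loss reach earlier moves in fewer updates.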

<hr>

### Building agents

Identical to the previous notebook.

In [7]:
####################################################
# AGENT CREATION
####################################################

def get_agents(agent_player1: typing.Optional[ts.policy.BasePolicy] = None,
               agent_player2: typing.Optional[ts.policy.BasePolicy] = None,
               optim: typing.Optional[torch.optim.Optimizer] = None,
               resume_path_player_1: str = '', # Path to file to resume agent training from
               resume_path_player_2: str = '', 
               ) -> typing.Tuple[ts.policy.BasePolicy, torch.optim.Optimizer, list]:
    """
    Gets a multi agent policy manager, optimizer and player ids for the connect four V2 gym environment.
    Per default this returns 
        - Multi agent manager for 2 agents using DQN
        - Adam optimizer
        - ['player_1', 'player_2'] from the connect four environment
    """
    
    # Get the environment to play in (Connect four gym V2)
    env = get_env()
    
    # Get the observation space from the environment, depending on type of space (ternary operator)
    observation_space = env.observation_space['observation'] if isinstance(env.observation_space, gym.spaces.Dict) else env.observation_space
    
    # Set the arguments
    state_shape = observation_space.shape or observation_space.n
    action_shape = env.action_space.shape or env.action_space.n
    
    # Configure agent player 1 to be a DQN if no policy is passed.
    if agent_player1 is None:
        # Our agent1 uses a DQN policy
        agent_player1 = cf_dqn_policy(state_shape= state_shape,
                                      action_shape= action_shape,
                                      optim= optim)
        
        # If we resume our agent we need to load the previous config
        if resume_path_player_1:
            agent_player1.load_state_dict(torch.load(resume_path_player_1))
    
    # Configure agent player 2 to be a DQN if no policy is passed.
    if agent_player2 is None:
        # Our agent2 uses a DQN policy
        agent_player2 = cf_dqn_policy(state_shape= state_shape,
                                      action_shape= action_shape,
                                      optim= optim)
        
        # If we resume our agent we need to load the previous config
        if resume_path_player_2:
            agent_player2.load_state_dict(torch.load(resume_path_player_2))

    # Both our agents are DQN agents by default
    agents = [agent_player1, agent_player2]
        
    # Our policy depends on the order of the agents
    policy = ts.policy.MultiAgentPolicyManager(agents, env)
    
    # Return our policy, optimizer and the available agents in the environment
    # Per default: 
    #   - Multi agent manager for 2 agents using DQN
    #   - Adam optimizer
    #   - ['player_1', 'player_2'] from the connect four environment
    
    return policy, optim, env.agents

<hr>

### Function for letting agents learn

Identical to the previous notebook.

In [8]:
####################################################
# AGENT TRAINING
####################################################

def train_agent(filename: str = "dqn_vs_dqn_cnn_based",
                agent_player1: typing.Optional[ts.policy.BasePolicy] = None,
                agent_player2: typing.Optional[ts.policy.BasePolicy] = None,
                optim: typing.Optional[torch.optim.Optimizer] = None,
                training_env_num: int = 1,
                testing_env_num: int = 1,
                buffer_size: int = 2**14, # 16384 transitions; `2^14` is a bitwise XOR in Python and evaluates to 12
                batch_size: int = 1, 
                epochs: int = 50, #50
                step_per_epoch: int = 1024, #1024
                step_per_collect: int = 64, # transition before update
                update_per_step: float = 0.1,
                testing_eps: float = 0.05,
                training_eps: float = 0.1,
                ) -> typing.Tuple[dict, ts.policy.BasePolicy, ts.policy.BasePolicy]:
    """
    Trains two agents in the connect four V2 environment and saves their best model and logs.
    Returns:
        - result from offpolicy_trainer
        - final version of agent 1
        - final version of agent 2
    """

    # ======== notebook specific =========
    notebook_version = '6' # Used for foldering logs and models

    # ======== environment setup =========
    train_envs = ts.env.DummyVectorEnv([get_env for _ in range(training_env_num)])
    test_envs = ts.env.DummyVectorEnv([get_env for _ in range(testing_env_num)])
    
    # set the seed for reproducibility
    np.random.seed(1998)
    torch.manual_seed(1998)
    train_envs.seed(1998)
    test_envs.seed(1998)

    # ======== agent setup =========
    # Gets our agents from the previously made function
    # Per default: 
    #   - Multi agent manager for 2 agents using DQN
    #   - Adam optimizer
    #   - ['player_1', 'player_2'] from the connect four environment
    policy, optim, agents = get_agents(agent_player1=agent_player1,
                                       agent_player2=agent_player2,
                                       optim=optim)

    # ======== collector setup =========
    # Make a collector for the training environments
    train_collector = ts.data.Collector(policy= policy,
                                        env= train_envs,
                                        buffer= ts.data.VectorReplayBuffer(buffer_size, len(train_envs)),
                                        exploration_noise= True)
    
    # Make a collector for the testing environments
    test_collector = ts.data.Collector(policy= policy,
                                       env= test_envs,
                                       buffer= ts.data.VectorReplayBuffer(buffer_size, len(test_envs)),
                                       exploration_noise= True)
    
    # Uncomment below if you want to set epsilon in epsilon policy
    # policy.set_eps(1)
    
    # Collect initial data from the training environments
    train_collector.collect(n_step= batch_size * training_env_num)
    
    # ======== ensure folders exist =========
    if not os.path.exists(os.path.join('./logs', 'paper_notebooks', notebook_version, filename)):
        os.makedirs(os.path.join('./logs', 'paper_notebooks', notebook_version, filename))
    if not os.path.exists(os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename)):
        os.makedirs(os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename))

    # ======== tensorboard logging setup =========
    # Saves the training progress to TensorBoard-compatible logs
    log_path = os.path.join('./logs', 'paper_notebooks', notebook_version, filename)
    writer = torch.utils.tensorboard.SummaryWriter(log_path)
    logger = ts.utils.TensorboardLogger(writer)

    # ======== callback functions used during training =========
    # We want to save our best policy
    def save_best_fn(policy):
        """
        Callback to save the best model
        """
        # Save best agent 1
        model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'best_policy_agent1.pth')
        torch.save(policy.policies[agents[0]].state_dict(), model_save_path)
        
        # Save best agent 2
        model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'best_policy_agent2.pth')
        torch.save(policy.policies[agents[1]].state_dict(), model_save_path)
        

    def stop_fn(mean_rewards):
        """
        Callback to stop training when we've reached the win rate
        """
        return mean_rewards >= 7 # (win = 10, 70% win without invalid moves = mean of 7)

    def train_fn(epoch, env_step):
        """
        Callback before training
        """        
        # Before training we want to configure the epsilon for the agents
        # In general more exploratory than the test case
        policy.policies[agents[0]].set_eps(training_eps)
        policy.policies[agents[1]].set_eps(training_eps)

    def test_fn(epoch, env_step):
        """
        Callback before testing
        """        
        # Before testing we want to configure the epsilon for the agents
        # In general more greedy than the training case, but not fully greedy,
        #   to avoid getting stuck on invalid moves
        policy.policies[agents[0]].set_eps(testing_eps)
        policy.policies[agents[1]].set_eps(testing_eps)

    def reward_metric(rews):
        """
        Callback for reward collection
        """
        # We are interested in having a high total reward,
        #   as this would mean equally good agents.
        return rews[:, 0] + rews[:, 1]

    # trainer
    result = ts.trainer.offpolicy_trainer(policy= policy,
                                          train_collector= train_collector,
                                          test_collector= test_collector,
                                          max_epoch= epochs,
                                          step_per_epoch= step_per_epoch,
                                          step_per_collect= step_per_collect,
                                          episode_per_test= testing_env_num,
                                          batch_size= batch_size,
                                          train_fn= train_fn,
                                          test_fn= test_fn,
                                          # Stop function to stop before specified amount of epochs
                                          #stop_fn= stop_fn
                                          save_best_fn= save_best_fn,
                                          update_per_step= update_per_step,
                                          logger= logger,
                                          test_in_train= False,
                                          reward_metric= reward_metric)
    
    # Save final agent 1
    model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'final_policy_agent1.pth')
    torch.save(policy.policies[agents[0]].state_dict(), model_save_path)

    # Save final agent 2
    model_save_path = os.path.join('./saved_variables', 'paper_notebooks', notebook_version, filename, 'final_policy_agent2.pth')
    torch.save(policy.policies[agents[1]].state_dict(), model_save_path)

    return result, policy.policies[agents[0]], policy.policies[agents[1]]
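
The `training_eps` and `testing_eps` values set in the callbacks above are the exploration rates of an epsilon-greedy strategy. The sketch below illustrates the general idea with made-up Q-values; `epsilon_greedy` is a hypothetical helper and not the code Tianshou runs internally.

```python
import random


def epsilon_greedy(q_values, mask, eps, rng=random):
    """With probability eps pick a random legal action, else the best legal one."""
    legal = [i for i, ok in enumerate(mask) if ok]
    if rng.random() < eps:
        return rng.choice(legal)
    return max(legal, key=lambda i: q_values[i])


rng = random.Random(1998)  # fixed seed so the illustration is repeatable
q_values = [0.1, 0.7, 0.3, 0.2, 0.0, 0.5, 0.4]  # made-up numbers
mask = [True] * 7                                # all columns playable

picks = [epsilon_greedy(q_values, mask, eps=0.1, rng=rng) for _ in range(10_000)]
greedy_share = picks.count(1) / len(picks)
print(f"Greedy action chosen in {greedy_share:.1%} of steps")
```

With `eps=0.1` and seven legal columns, the greedy column is expected in roughly 0.9 + 0.1/7 of the steps, since a random draw can also land on it. A higher epsilon during training produces more varied games, while the lower test epsilon keeps play mostly greedy.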

<hr>

### Function for watching learned agent

Identical to the previous notebook.

In [9]:
####################################################
# WATCHING THE LEARNED POLICY IN ACTION
####################################################

def watch(numer_of_games: int = 3,
          agent_player1: typing.Optional[ts.policy.BasePolicy] = None,
          agent_player2: typing.Optional[ts.policy.BasePolicy] = None,
          test_epsilon: float = 0.05, # Act almost greedily when watching; a small epsilon avoids getting stuck on an invalid move
          render_speed: float = 0.15, # Number of seconds between frame updates (one step per frame)
          ) -> None:
    
    # Get the connect four V2 environment (must be a list)
    env= ts.env.DummyVectorEnv([get_env])
    
    # Get the agents from the trained agents
    policy, optim, agents = get_agents(agent_player1= agent_player1,
                                       agent_player2= agent_player2)
    
    # Evaluate the policy
    policy.eval()
    
    # Set the testing policy epsilon for our agents
    policy.policies[agents[0]].set_eps(test_epsilon)
    policy.policies[agents[1]].set_eps(test_epsilon)
    
    # Collect the test data
    collector = ts.data.Collector(policy= policy,
                                  env= env,
                                  exploration_noise= True)
    
    # Render games in human mode to see how it plays
    result = collector.collect(n_episode= numer_of_games, render= render_speed)
    
    # Close the environment after collecting the results
    # This closes the pygame window after completion
    env.close()
    
    # Get the rewards and length from the test trials
    rewards, length = result["rews"], result["lens"]
    
    # Print the final reward for the first agent
    print(f"Average steps of game:  {length.mean()}")
    print(f"Final mean reward agent 1: {rewards[:, 0].mean()}, std: {rewards[:, 0].std()}")
    print(f"Final mean reward agent 2: {rewards[:, 1].mean()}, std: {rewards[:, 1].std()}")

<hr>

### Doing the experiment

We now run the experiment using the previously created functions.
We update some parameter settings to see whether we can improve our DQN agents.

In [10]:
####################################################
# EXPERIMENT: TRAINING AGENTS
####################################################

# Get the environment settings
env = get_env()
observation_space = env.observation_space['observation'] if isinstance(env.observation_space, gym.spaces.Dict) else env.observation_space
state_shape = observation_space.shape or observation_space.n
action_shape = env.action_space.shape or env.action_space.n

# Configure the agents
agent1 = cf_dqn_policy(state_shape= state_shape,
                       action_shape= action_shape,
                       gamma= 0.95, # Favour shorter solutions if small
                       n_step= 6)


agent2 = cf_dqn_policy(state_shape= state_shape,
                       action_shape= action_shape,
                       gamma= 0.95, # Favour shorter solutions if small
                       n_step= 6)

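# Train the agents
# The configured agents must be passed explicitly; otherwise train_agent builds default agents
off_policy_traininer_results, final_agent_player1, final_agent_player2 = train_agent(agent_player1= agent1,
                                                                                     agent_player2= agent2,
                                                                                     epochs= 5000,
                                                                                     training_eps= 0.2)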

Epoch #1: 1025it [00:03, 312.55it/s, env_step=1024, len=8, n/ep=8, n/st=64, player_1/loss=211.823, player_2/loss=143.704, rew=77.75]


Epoch #1: test_reward: 70.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #2: 1025it [00:02, 449.46it/s, env_step=2048, len=7, n/ep=9, n/st=64, player_1/loss=230.406, player_2/loss=108.724, rew=63.11]


Epoch #2: test_reward: 70.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #3: 1025it [00:02, 455.30it/s, env_step=3072, len=8, n/ep=8, n/st=64, player_1/loss=244.803, player_2/loss=121.406, rew=78.50]


Epoch #3: test_reward: 54.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #4: 1025it [00:02, 454.40it/s, env_step=4096, len=10, n/ep=6, n/st=64, player_1/loss=220.438, player_2/loss=92.457, rew=118.67]


Epoch #4: test_reward: 54.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #5: 1025it [00:02, 454.61it/s, env_step=5120, len=8, n/ep=8, n/st=64, player_1/loss=190.600, player_2/loss=83.077, rew=87.50]


Epoch #5: test_reward: 54.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #6: 1025it [00:02, 453.56it/s, env_step=6144, len=9, n/ep=7, n/st=64, player_1/loss=121.326, player_2/loss=99.075, rew=106.29]


Epoch #6: test_reward: 54.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #7: 1025it [00:02, 451.81it/s, env_step=7168, len=8, n/ep=8, n/st=64, player_1/loss=98.074, player_2/loss=110.373, rew=73.25]


Epoch #7: test_reward: 54.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #8: 1025it [00:02, 454.29it/s, env_step=8192, len=7, n/ep=8, n/st=64, player_1/loss=57.565, player_2/loss=87.461, rew=66.50]


Epoch #8: test_reward: 54.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #9: 1025it [00:02, 455.84it/s, env_step=9216, len=9, n/ep=7, n/st=64, player_1/loss=45.730, player_2/loss=55.404, rew=94.57]


Epoch #9: test_reward: 54.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #10: 1025it [00:02, 453.94it/s, env_step=10240, len=7, n/ep=8, n/st=64, player_1/loss=39.185, player_2/loss=36.590, rew=67.00]


Epoch #10: test_reward: 54.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #11: 1025it [00:02, 454.52it/s, env_step=11264, len=8, n/ep=7, n/st=64, player_1/loss=28.220, player_2/loss=37.073, rew=76.29]


Epoch #11: test_reward: 54.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #12: 1025it [00:02, 456.42it/s, env_step=12288, len=8, n/ep=8, n/st=64, player_1/loss=28.875, player_2/loss=29.216, rew=77.25]


Epoch #12: test_reward: 70.000000 ± 0.000000, best_reward: 70.000000 ± 0.000000 in #1


Epoch #13: 1025it [00:02, 454.40it/s, env_step=13312, len=9, n/ep=7, n/st=64, player_1/loss=43.946, player_2/loss=35.153, rew=101.71]


Epoch #13: test_reward: 88.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #14: 1025it [00:02, 442.54it/s, env_step=14336, len=10, n/ep=6, n/st=64, player_1/loss=62.434, player_2/loss=53.332, rew=152.00]


Epoch #14: test_reward: 70.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #15: 1025it [00:02, 415.13it/s, env_step=15360, len=8, n/ep=7, n/st=64, player_1/loss=51.411, player_2/loss=55.600, rew=80.57]


Epoch #15: test_reward: 70.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #16: 1025it [00:02, 393.04it/s, env_step=16384, len=8, n/ep=8, n/st=64, player_1/loss=35.074, player_2/loss=83.379, rew=81.00]


Epoch #16: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #17: 1025it [00:02, 398.48it/s, env_step=17408, len=8, n/ep=8, n/st=64, player_1/loss=39.974, player_2/loss=70.936, rew=71.75]


Epoch #17: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #18: 1025it [00:02, 391.76it/s, env_step=18432, len=7, n/ep=9, n/st=64, player_1/loss=45.595, player_2/loss=44.963, rew=67.33]


Epoch #18: test_reward: 88.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #19: 1025it [00:02, 379.46it/s, env_step=19456, len=10, n/ep=6, n/st=64, player_1/loss=48.861, player_2/loss=50.664, rew=119.33]


Epoch #19: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #20: 1025it [00:02, 365.72it/s, env_step=20480, len=8, n/ep=8, n/st=64, player_1/loss=56.719, player_2/loss=41.846, rew=78.00]


Epoch #20: test_reward: 88.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #21: 1025it [00:02, 355.94it/s, env_step=21504, len=8, n/ep=9, n/st=64, player_1/loss=60.398, player_2/loss=59.584, rew=97.33]


Epoch #21: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #22: 1025it [00:02, 356.77it/s, env_step=22528, len=8, n/ep=8, n/st=64, player_1/loss=49.098, player_2/loss=52.037, rew=73.25]


Epoch #22: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #23: 1025it [00:02, 356.86it/s, env_step=23552, len=8, n/ep=7, n/st=64, player_1/loss=29.372, player_2/loss=61.004, rew=75.43]


Epoch #23: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #24: 1025it [00:02, 356.24it/s, env_step=24576, len=8, n/ep=7, n/st=64, player_1/loss=22.004, player_2/loss=52.106, rew=80.86]


Epoch #24: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #25: 1025it [00:02, 355.87it/s, env_step=25600, len=7, n/ep=8, n/st=64, player_1/loss=40.930, player_2/loss=53.629, rew=67.00]


Epoch #25: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #26: 1025it [00:02, 357.36it/s, env_step=26624, len=8, n/ep=8, n/st=64, player_1/loss=33.677, player_2/loss=56.985, rew=78.00]


Epoch #26: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #27: 1025it [00:02, 356.20it/s, env_step=27648, len=9, n/ep=7, n/st=64, player_1/loss=23.274, player_2/loss=51.016, rew=90.29]


Epoch #27: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #28: 1025it [00:02, 354.67it/s, env_step=28672, len=8, n/ep=8, n/st=64, player_1/loss=33.183, player_2/loss=46.958, rew=71.50]


Epoch #28: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #29: 1025it [00:02, 355.38it/s, env_step=29696, len=8, n/ep=7, n/st=64, player_1/loss=22.104, player_2/loss=53.923, rew=85.43]


Epoch #29: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #30: 1025it [00:02, 357.54it/s, env_step=30720, len=8, n/ep=7, n/st=64, player_1/loss=17.063, player_2/loss=65.718, rew=85.14]


Epoch #30: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #31: 1025it [00:02, 356.96it/s, env_step=31744, len=8, n/ep=8, n/st=64, player_1/loss=30.848, player_2/loss=77.396, rew=78.25]


Epoch #31: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #32: 1025it [00:02, 355.55it/s, env_step=32768, len=7, n/ep=8, n/st=64, player_1/loss=49.769, player_2/loss=83.865, rew=60.25]


Epoch #32: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #33: 1025it [00:02, 356.14it/s, env_step=33792, len=8, n/ep=7, n/st=64, player_1/loss=29.657, player_2/loss=65.420, rew=80.86]


Epoch #33: test_reward: 70.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #34: 1025it [00:02, 355.19it/s, env_step=34816, len=9, n/ep=7, n/st=64, player_1/loss=17.390, player_2/loss=59.816, rew=94.00]


Epoch #34: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #35: 1025it [00:02, 356.77it/s, env_step=35840, len=9, n/ep=7, n/st=64, player_1/loss=15.121, player_2/loss=63.150, rew=125.43]


Epoch #35: test_reward: 70.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #36: 1025it [00:02, 356.73it/s, env_step=36864, len=10, n/ep=6, n/st=64, player_1/loss=18.797, player_2/loss=58.592, rew=137.33]


Epoch #36: test_reward: 70.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #37: 1025it [00:02, 354.78it/s, env_step=37888, len=8, n/ep=8, n/st=64, player_1/loss=31.488, player_2/loss=50.038, rew=73.75]


Epoch #37: test_reward: 70.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #38: 1025it [00:02, 354.56it/s, env_step=38912, len=9, n/ep=7, n/st=64, player_1/loss=25.194, player_2/loss=47.631, rew=89.43]


Epoch #38: test_reward: 54.000000 ± 0.000000, best_reward: 88.000000 ± 0.000000 in #13


Epoch #39: 1025it [00:02, 356.10it/s, env_step=39936, len=13, n/ep=5, n/st=64, player_1/loss=13.547, player_2/loss=56.649, rew=191.20]


Epoch #39: test_reward: 130.000000 ± 0.000000, best_reward: 130.000000 ± 0.000000 in #39


Epoch #40: 1025it [00:02, 354.73it/s, env_step=40960, len=8, n/ep=7, n/st=64, player_1/loss=16.288, player_2/loss=44.465, rew=78.29]


Epoch #40: test_reward: 54.000000 ± 0.000000, best_reward: 130.000000 ± 0.000000 in #39


Epoch #41: 1025it [00:02, 356.38it/s, env_step=41984, len=12, n/ep=5, n/st=64, player_1/loss=16.689, player_2/loss=33.588, rew=165.20]


Epoch #41: test_reward: 130.000000 ± 0.000000, best_reward: 130.000000 ± 0.000000 in #39


Epoch #42: 1025it [00:02, 356.64it/s, env_step=43008, len=16, n/ep=4, n/st=64, player_1/loss=28.236, player_2/loss=49.591, rew=275.50]


Epoch #42: test_reward: 340.000000 ± 0.000000, best_reward: 340.000000 ± 0.000000 in #42


Epoch #43: 1025it [00:02, 357.12it/s, env_step=44032, len=15, n/ep=4, n/st=64, player_1/loss=27.268, player_2/loss=48.868, rew=250.00]


Epoch #43: test_reward: 238.000000 ± 0.000000, best_reward: 340.000000 ± 0.000000 in #42


Epoch #44: 1025it [00:02, 356.06it/s, env_step=45056, len=7, n/ep=8, n/st=64, player_1/loss=27.270, player_2/loss=40.406, rew=68.25]


Epoch #44: test_reward: 54.000000 ± 0.000000, best_reward: 340.000000 ± 0.000000 in #42


Epoch #45: 1025it [00:02, 356.90it/s, env_step=46080, len=19, n/ep=4, n/st=64, player_1/loss=41.937, player_2/loss=51.056, rew=386.00]


Epoch #45: test_reward: 418.000000 ± 0.000000, best_reward: 418.000000 ± 0.000000 in #45


Epoch #46: 1025it [00:02, 356.02it/s, env_step=47104, len=19, n/ep=4, n/st=64, player_1/loss=47.082, player_2/loss=51.652, rew=482.50]


Epoch #46: test_reward: 810.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #47: 1025it [00:02, 356.71it/s, env_step=48128, len=10, n/ep=6, n/st=64, player_1/loss=35.396, player_2/loss=25.999, rew=126.33]


Epoch #47: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #48: 1025it [00:02, 356.30it/s, env_step=49152, len=7, n/ep=8, n/st=64, player_1/loss=20.236, player_2/loss=28.801, rew=66.25]


Epoch #48: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #49: 1025it [00:02, 357.02it/s, env_step=50176, len=10, n/ep=5, n/st=64, player_1/loss=14.333, player_2/loss=33.491, rew=128.40]


Epoch #49: test_reward: 130.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #50: 1025it [00:02, 354.99it/s, env_step=51200, len=14, n/ep=4, n/st=64, player_1/loss=21.676, player_2/loss=23.281, rew=215.50]


Epoch #50: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #51: 1025it [00:02, 356.10it/s, env_step=52224, len=15, n/ep=4, n/st=64, player_1/loss=36.852, player_2/loss=22.031, rew=268.50]


Epoch #51: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #52: 1025it [00:02, 354.32it/s, env_step=53248, len=11, n/ep=6, n/st=64, player_1/loss=36.817, player_2/loss=15.195, rew=189.00]


Epoch #52: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #53: 1025it [00:02, 356.73it/s, env_step=54272, len=7, n/ep=9, n/st=64, player_1/loss=28.583, player_2/loss=14.604, rew=57.78]


Epoch #53: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #54: 1025it [00:02, 356.47it/s, env_step=55296, len=12, n/ep=5, n/st=64, player_1/loss=22.516, player_2/loss=18.777, rew=242.40]


Epoch #54: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #55: 1025it [00:02, 354.81it/s, env_step=56320, len=7, n/ep=9, n/st=64, player_1/loss=26.756, player_2/loss=35.250, rew=64.22]


Epoch #55: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #56: 1025it [00:02, 355.36it/s, env_step=57344, len=8, n/ep=8, n/st=64, player_1/loss=19.735, player_2/loss=27.194, rew=84.25]


Epoch #56: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #57: 1025it [00:02, 349.45it/s, env_step=58368, len=8, n/ep=7, n/st=64, player_1/loss=16.955, player_2/loss=31.138, rew=86.57]


Epoch #57: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #58: 1025it [00:02, 363.39it/s, env_step=59392, len=9, n/ep=6, n/st=64, player_1/loss=12.455, player_2/loss=29.039, rew=93.33]


Epoch #58: test_reward: 88.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #59: 1025it [00:02, 368.43it/s, env_step=60416, len=7, n/ep=9, n/st=64, player_1/loss=28.098, player_2/loss=52.614, rew=63.33]


Epoch #59: test_reward: 154.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #60: 1025it [00:02, 366.84it/s, env_step=61440, len=9, n/ep=7, n/st=64, player_1/loss=38.682, player_2/loss=78.727, rew=96.29]


Epoch #60: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #61: 1025it [00:02, 366.46it/s, env_step=62464, len=8, n/ep=8, n/st=64, player_1/loss=31.523, player_2/loss=82.187, rew=86.25]


Epoch #61: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #62: 1025it [00:02, 368.17it/s, env_step=63488, len=10, n/ep=6, n/st=64, player_1/loss=22.148, player_2/loss=60.605, rew=131.67]


Epoch #62: test_reward: 378.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #63: 1025it [00:02, 368.50it/s, env_step=64512, len=7, n/ep=8, n/st=64, player_1/loss=11.633, player_2/loss=50.771, rew=66.75]


Epoch #63: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #64: 1025it [00:02, 356.62it/s, env_step=65536, len=8, n/ep=7, n/st=64, player_1/loss=27.377, player_2/loss=139.585, rew=82.57]


Epoch #64: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #65: 1025it [00:02, 355.32it/s, env_step=66560, len=9, n/ep=7, n/st=64, player_1/loss=40.737, player_2/loss=152.238, rew=104.57]


Epoch #65: test_reward: 130.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #66: 1025it [00:02, 355.19it/s, env_step=67584, len=13, n/ep=5, n/st=64, player_1/loss=50.992, player_2/loss=109.293, rew=232.00]


Epoch #66: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #67: 1025it [00:02, 356.80it/s, env_step=68608, len=27, n/ep=2, n/st=64, player_1/loss=60.407, player_2/loss=101.541, rew=782.00]


Epoch #67: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #68: 1025it [00:02, 353.20it/s, env_step=69632, len=9, n/ep=7, n/st=64, player_1/loss=39.628, player_2/loss=69.226, rew=110.29]


Epoch #68: test_reward: 88.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #69: 1025it [00:02, 355.94it/s, env_step=70656, len=21, n/ep=3, n/st=64, player_1/loss=92.310, player_2/loss=93.460, rew=460.00]


Epoch #69: test_reward: 130.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #70: 1025it [00:02, 356.52it/s, env_step=71680, len=8, n/ep=7, n/st=64, player_1/loss=84.351, player_2/loss=130.991, rew=88.86]


Epoch #70: test_reward: 550.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #71: 1025it [00:02, 354.88it/s, env_step=72704, len=15, n/ep=4, n/st=64, player_1/loss=34.613, player_2/loss=93.990, rew=259.50]


Epoch #71: test_reward: 340.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #72: 1025it [00:02, 356.91it/s, env_step=73728, len=15, n/ep=4, n/st=64, player_1/loss=46.135, player_2/loss=68.311, rew=271.00]


Epoch #72: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #73: 1025it [00:02, 354.11it/s, env_step=74752, len=16, n/ep=4, n/st=64, player_1/loss=43.478, player_2/loss=86.064, rew=295.00]


Epoch #73: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #74: 1025it [00:02, 355.48it/s, env_step=75776, len=16, n/ep=4, n/st=64, player_1/loss=11.376, player_2/loss=249.340, rew=293.00]


Epoch #74: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #75: 1025it [00:02, 354.00it/s, env_step=76800, len=13, n/ep=4, n/st=64, player_1/loss=30.476, player_2/loss=218.133, rew=198.50]


Epoch #75: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #76: 1025it [00:02, 354.87it/s, env_step=77824, len=14, n/ep=5, n/st=64, player_1/loss=97.737, player_2/loss=125.768, rew=222.40]


Epoch #76: test_reward: 340.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #77: 1025it [00:02, 357.62it/s, env_step=78848, len=17, n/ep=4, n/st=64, player_2/loss=96.497, rew=329.50]


Epoch #77: test_reward: 270.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #78: 1025it [00:02, 354.53it/s, env_step=79872, len=13, n/ep=5, n/st=64, player_1/loss=72.153, player_2/loss=78.943, rew=191.20]


Epoch #78: test_reward: 208.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #79: 1025it [00:02, 356.97it/s, env_step=80896, len=8, n/ep=8, n/st=64, player_1/loss=46.267, player_2/loss=81.834, rew=75.25]


Epoch #79: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #80: 1025it [00:02, 355.63it/s, env_step=81920, len=17, n/ep=4, n/st=64, player_1/loss=63.984, player_2/loss=119.183, rew=313.00]


Epoch #80: test_reward: 598.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #81: 1025it [00:02, 355.29it/s, env_step=82944, len=8, n/ep=8, n/st=64, player_1/loss=66.793, player_2/loss=254.136, rew=81.50]


Epoch #81: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #82: 1025it [00:02, 355.40it/s, env_step=83968, len=14, n/ep=4, n/st=64, player_1/loss=52.536, player_2/loss=218.160, rew=237.50]


Epoch #82: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #83: 1025it [00:02, 355.83it/s, env_step=84992, len=15, n/ep=4, n/st=64, player_1/loss=74.359, player_2/loss=318.820, rew=246.00]


Epoch #83: test_reward: 208.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #84: 1025it [00:02, 356.42it/s, env_step=86016, len=13, n/ep=5, n/st=64, player_1/loss=53.544, player_2/loss=225.162, rew=194.40]


Epoch #84: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #85: 1025it [00:02, 356.57it/s, env_step=87040, len=15, n/ep=4, n/st=64, player_1/loss=86.651, player_2/loss=189.636, rew=263.50]


Epoch #85: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #86: 1025it [00:02, 356.53it/s, env_step=88064, len=16, n/ep=3, n/st=64, player_1/loss=124.823, player_2/loss=175.122, rew=300.67]


Epoch #86: test_reward: 304.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #87: 1025it [00:02, 355.82it/s, env_step=89088, len=14, n/ep=5, n/st=64, player_1/loss=93.191, player_2/loss=91.940, rew=220.40]


Epoch #87: test_reward: 270.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #88: 1025it [00:02, 356.95it/s, env_step=90112, len=15, n/ep=4, n/st=64, player_1/loss=91.661, player_2/loss=53.821, rew=257.00]


Epoch #88: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #89: 1025it [00:02, 357.20it/s, env_step=91136, len=22, n/ep=3, n/st=64, player_1/loss=138.960, player_2/loss=106.787, rew=535.33]


Epoch #89: test_reward: 504.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #90: 1025it [00:02, 355.52it/s, env_step=92160, len=19, n/ep=3, n/st=64, player_1/loss=136.268, player_2/loss=165.979, rew=390.67]


Epoch #90: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #91: 1025it [00:02, 356.75it/s, env_step=93184, len=16, n/ep=4, n/st=64, player_1/loss=104.897, player_2/loss=200.224, rew=294.00]


Epoch #91: test_reward: 418.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #92: 1025it [00:02, 354.49it/s, env_step=94208, len=21, n/ep=3, n/st=64, player_1/loss=101.488, player_2/loss=175.304, rew=476.00]


Epoch #92: test_reward: 340.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #93: 1025it [00:02, 356.57it/s, env_step=95232, len=14, n/ep=4, n/st=64, player_1/loss=48.527, player_2/loss=188.979, rew=232.00]


Epoch #93: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #94: 1025it [00:02, 357.34it/s, env_step=96256, len=23, n/ep=3, n/st=64, player_1/loss=77.197, player_2/loss=151.046, rew=552.00]


Epoch #94: test_reward: 460.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #95: 1025it [00:02, 357.45it/s, env_step=97280, len=21, n/ep=3, n/st=64, player_1/loss=139.704, player_2/loss=74.382, rew=477.33]


Epoch #95: test_reward: 378.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #96: 1025it [00:02, 354.75it/s, env_step=98304, len=22, n/ep=3, n/st=64, player_1/loss=77.399, player_2/loss=21.410, rew=525.33]


Epoch #96: test_reward: 418.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #97: 1025it [00:02, 355.13it/s, env_step=99328, len=21, n/ep=3, n/st=64, player_1/loss=31.096, player_2/loss=13.520, rew=481.33]


Epoch #97: test_reward: 340.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #98: 1025it [00:02, 351.38it/s, env_step=100352, len=12, n/ep=5, n/st=64, player_1/loss=146.173, player_2/loss=163.999, rew=201.60]


Epoch #98: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #99: 1025it [00:02, 343.32it/s, env_step=101376, len=15, n/ep=4, n/st=64, player_1/loss=114.250, player_2/loss=247.035, rew=246.50]


Epoch #99: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #100: 1025it [00:02, 363.44it/s, env_step=102400, len=8, n/ep=8, n/st=64, player_1/loss=95.116, player_2/loss=237.713, rew=71.25]


Epoch #100: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #101: 1025it [00:02, 367.22it/s, env_step=103424, len=8, n/ep=9, n/st=64, player_2/loss=319.919, rew=87.33]


Epoch #101: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #102: 1025it [00:02, 366.29it/s, env_step=104448, len=9, n/ep=7, n/st=64, player_1/loss=34.994, player_2/loss=264.835, rew=99.43]


Epoch #102: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #103: 1025it [00:02, 369.22it/s, env_step=105472, len=10, n/ep=6, n/st=64, player_1/loss=39.995, player_2/loss=165.014, rew=137.67]


Epoch #103: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #104: 1025it [00:02, 367.25it/s, env_step=106496, len=9, n/ep=6, n/st=64, player_1/loss=58.130, player_2/loss=201.838, rew=99.00]


Epoch #104: test_reward: 88.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #105: 1025it [00:02, 366.40it/s, env_step=107520, len=9, n/ep=6, n/st=64, player_1/loss=56.798, player_2/loss=140.035, rew=112.67]


Epoch #105: test_reward: 88.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #106: 1025it [00:02, 369.12it/s, env_step=108544, len=9, n/ep=7, n/st=64, player_1/loss=106.675, player_2/loss=88.942, rew=98.00]


Epoch #106: test_reward: 108.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #107: 1025it [00:02, 369.20it/s, env_step=109568, len=9, n/ep=7, n/st=64, player_1/loss=134.903, player_2/loss=142.200, rew=116.29]


Epoch #107: test_reward: 88.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #108: 1025it [00:02, 367.71it/s, env_step=110592, len=18, n/ep=4, n/st=64, player_1/loss=215.788, player_2/loss=166.758, rew=350.00]


Epoch #108: test_reward: 130.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #109: 1025it [00:02, 366.32it/s, env_step=111616, len=8, n/ep=8, n/st=64, player_1/loss=219.094, player_2/loss=95.776, rew=81.50]


Epoch #109: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #110: 1025it [00:02, 367.41it/s, env_step=112640, len=12, n/ep=5, n/st=64, player_1/loss=96.900, player_2/loss=135.536, rew=184.80]


Epoch #110: test_reward: 130.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #111: 1025it [00:02, 366.72it/s, env_step=113664, len=13, n/ep=6, n/st=64, player_1/loss=54.261, player_2/loss=98.659, rew=190.67]


Epoch #111: test_reward: 88.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #112: 1025it [00:02, 368.37it/s, env_step=114688, len=14, n/ep=4, n/st=64, player_1/loss=60.331, player_2/loss=133.183, rew=234.00]


Epoch #112: test_reward: 304.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #113: 1025it [00:02, 367.86it/s, env_step=115712, len=16, n/ep=4, n/st=64, player_1/loss=99.591, player_2/loss=201.505, rew=297.50]


Epoch #113: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #114: 1025it [00:02, 367.76it/s, env_step=116736, len=9, n/ep=7, n/st=64, player_1/loss=89.487, player_2/loss=232.189, rew=92.86]


Epoch #114: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #115: 1025it [00:02, 368.13it/s, env_step=117760, len=12, n/ep=5, n/st=64, player_1/loss=98.326, player_2/loss=161.211, rew=210.40]


Epoch #115: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #116: 1025it [00:02, 367.37it/s, env_step=118784, len=12, n/ep=6, n/st=64, player_1/loss=133.773, player_2/loss=127.766, rew=208.33]


Epoch #116: test_reward: 54.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #117: 1025it [00:02, 368.51it/s, env_step=119808, len=21, n/ep=3, n/st=64, player_1/loss=174.673, player_2/loss=140.137, rew=481.33]


Epoch #117: test_reward: 378.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #118: 1025it [00:02, 368.31it/s, env_step=120832, len=7, n/ep=8, n/st=64, player_1/loss=182.872, player_2/loss=231.656, rew=66.75]


Epoch #118: test_reward: 130.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #119: 1025it [00:02, 368.27it/s, env_step=121856, len=13, n/ep=5, n/st=64, player_1/loss=133.143, player_2/loss=250.214, rew=197.60]


Epoch #119: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #120: 1025it [00:02, 367.22it/s, env_step=122880, len=15, n/ep=4, n/st=64, player_1/loss=163.117, player_2/loss=101.355, rew=248.50]


Epoch #120: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #121: 1025it [00:02, 366.50it/s, env_step=123904, len=14, n/ep=5, n/st=64, player_1/loss=138.480, player_2/loss=39.175, rew=216.00]


Epoch #121: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #122: 1025it [00:02, 367.77it/s, env_step=124928, len=15, n/ep=4, n/st=64, player_1/loss=94.581, player_2/loss=57.234, rew=265.50]


Epoch #122: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #123: 1025it [00:02, 367.97it/s, env_step=125952, len=15, n/ep=5, n/st=64, player_1/loss=66.945, player_2/loss=203.445, rew=252.80]


Epoch #123: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #124: 1025it [00:02, 367.37it/s, env_step=126976, len=15, n/ep=4, n/st=64, player_1/loss=125.232, player_2/loss=166.209, rew=240.00]


Epoch #124: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #125: 1025it [00:02, 367.47it/s, env_step=128000, len=15, n/ep=4, n/st=64, player_1/loss=192.550, player_2/loss=156.651, rew=240.00]


Epoch #125: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #126: 1025it [00:02, 367.43it/s, env_step=129024, len=17, n/ep=3, n/st=64, player_1/loss=181.955, player_2/loss=246.436, rew=316.00]


Epoch #126: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #127: 1025it [00:02, 368.61it/s, env_step=130048, len=22, n/ep=3, n/st=64, player_1/loss=123.542, player_2/loss=290.823, rew=506.00]


Epoch #127: test_reward: 648.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #128: 1025it [00:02, 368.27it/s, env_step=131072, len=21, n/ep=3, n/st=64, player_1/loss=80.240, player_2/loss=365.238, rew=460.67]


Epoch #128: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #129: 1025it [00:02, 368.01it/s, env_step=132096, len=21, n/ep=3, n/st=64, player_1/loss=48.765, player_2/loss=241.344, rew=460.67]


Epoch #129: test_reward: 378.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #130: 1025it [00:02, 366.74it/s, env_step=133120, len=30, n/ep=2, n/st=64, player_1/loss=109.085, player_2/loss=287.526, rew=965.00]


Epoch #130: test_reward: 754.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #131: 1025it [00:02, 369.02it/s, env_step=134144, len=19, n/ep=4, n/st=64, player_1/loss=134.435, player_2/loss=449.254, rew=406.00]


Epoch #131: test_reward: 700.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #132: 1025it [00:02, 368.32it/s, env_step=135168, len=16, n/ep=4, n/st=64, player_1/loss=95.705, player_2/loss=446.564, rew=318.50]


Epoch #132: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #133: 1025it [00:02, 366.74it/s, env_step=136192, len=16, n/ep=4, n/st=64, player_1/loss=128.634, player_2/loss=336.300, rew=286.00]


Epoch #133: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #134: 1025it [00:02, 367.49it/s, env_step=137216, len=25, n/ep=2, n/st=64, player_1/loss=123.337, player_2/loss=323.728, rew=649.00]


Epoch #134: test_reward: 460.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #135: 1025it [00:02, 368.80it/s, env_step=138240, len=20, n/ep=3, n/st=64, player_1/loss=122.454, player_2/loss=131.030, rew=432.67]


Epoch #135: test_reward: 88.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #136: 1025it [00:02, 366.54it/s, env_step=139264, len=21, n/ep=3, n/st=64, player_1/loss=85.945, player_2/loss=206.683, rew=468.67]


Epoch #136: test_reward: 340.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #137: 1025it [00:02, 370.21it/s, env_step=140288, len=20, n/ep=4, n/st=64, player_1/loss=77.606, player_2/loss=362.705, rew=429.50]


Epoch #137: test_reward: 378.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #138: 1025it [00:02, 367.97it/s, env_step=141312, len=21, n/ep=3, n/st=64, player_1/loss=81.305, player_2/loss=259.740, rew=496.67]


Epoch #138: test_reward: 598.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #139: 1025it [00:02, 368.19it/s, env_step=142336, len=21, n/ep=3, n/st=64, player_1/loss=89.649, player_2/loss=196.930, rew=478.00]


Epoch #139: test_reward: 598.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #140: 1025it [00:02, 367.60it/s, env_step=143360, len=18, n/ep=3, n/st=64, player_1/loss=154.043, player_2/loss=238.858, rew=364.67]


Epoch #140: test_reward: 460.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #141: 1025it [00:02, 370.31it/s, env_step=144384, len=16, n/ep=3, n/st=64, player_1/loss=230.663, player_2/loss=138.684, rew=282.67]


Epoch #141: test_reward: 700.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #142: 1025it [00:02, 378.53it/s, env_step=145408, len=16, n/ep=3, n/st=64, player_1/loss=214.484, player_2/loss=282.425, rew=284.67]


Epoch #142: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #143: 1025it [00:02, 368.38it/s, env_step=146432, len=17, n/ep=4, n/st=64, player_1/loss=152.569, player_2/loss=488.486, rew=324.00]


Epoch #143: test_reward: 418.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #144: 1025it [00:02, 367.74it/s, env_step=147456, len=16, n/ep=4, n/st=64, player_1/loss=141.498, player_2/loss=311.111, rew=293.00]


Epoch #144: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #145: 1025it [00:02, 367.28it/s, env_step=148480, len=14, n/ep=4, n/st=64, player_1/loss=146.162, player_2/loss=90.386, rew=217.00]


Epoch #145: test_reward: 304.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #146: 1025it [00:02, 367.98it/s, env_step=149504, len=14, n/ep=5, n/st=64, player_1/loss=154.115, player_2/loss=247.621, rew=227.60]


Epoch #146: test_reward: 238.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #147: 1025it [00:02, 366.97it/s, env_step=150528, len=14, n/ep=5, n/st=64, player_1/loss=85.494, player_2/loss=191.465, rew=227.60]


Epoch #147: test_reward: 154.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #148: 1025it [00:02, 368.71it/s, env_step=151552, len=13, n/ep=4, n/st=64, player_1/loss=47.652, player_2/loss=199.612, rew=201.00]


Epoch #148: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #149: 1025it [00:02, 368.08it/s, env_step=152576, len=14, n/ep=5, n/st=64, player_1/loss=48.344, player_2/loss=175.486, rew=210.80]


Epoch #149: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #150: 1025it [00:02, 369.05it/s, env_step=153600, len=15, n/ep=4, n/st=64, player_1/loss=60.486, player_2/loss=104.666, rew=264.00]


Epoch #150: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #151: 1025it [00:02, 368.33it/s, env_step=154624, len=19, n/ep=3, n/st=64, player_1/loss=185.473, player_2/loss=122.162, rew=395.33]


Epoch #151: test_reward: 754.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #152: 1025it [00:02, 367.81it/s, env_step=155648, len=14, n/ep=4, n/st=64, player_1/loss=305.468, player_2/loss=66.388, rew=209.00]


Epoch #152: test_reward: 180.000000 ± 0.000000, best_reward: 810.000000 ± 0.000000 in #46


Epoch #153: 1025it [00:02, 366.57it/s, env_step=156672, len=19, n/ep=3, n/st=64, player_1/loss=299.788, player_2/loss=130.527, rew=392.00]


Epoch #153: test_reward: 1188.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #154: 1025it [00:02, 367.89it/s, env_step=157696, len=16, n/ep=4, n/st=64, player_1/loss=215.002, player_2/loss=231.897, rew=291.50]


Epoch #154: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #155: 1025it [00:02, 366.87it/s, env_step=158720, len=14, n/ep=4, n/st=64, player_1/loss=140.217, player_2/loss=237.871, rew=217.00]


Epoch #155: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #156: 1025it [00:02, 367.53it/s, env_step=159744, len=17, n/ep=4, n/st=64, player_1/loss=114.174, rew=306.00]


Epoch #156: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #157: 1025it [00:02, 367.88it/s, env_step=160768, len=18, n/ep=3, n/st=64, player_1/loss=74.239, player_2/loss=237.846, rew=368.00]


Epoch #157: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #158: 1025it [00:02, 368.09it/s, env_step=161792, len=15, n/ep=4, n/st=64, player_1/loss=22.129, player_2/loss=316.130, rew=269.00]


Epoch #158: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #159: 1025it [00:02, 368.23it/s, env_step=162816, len=19, n/ep=3, n/st=64, player_1/loss=141.726, player_2/loss=197.128, rew=378.67]


Epoch #159: test_reward: 418.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #160: 1025it [00:02, 368.88it/s, env_step=163840, len=27, n/ep=2, n/st=64, player_1/loss=202.799, player_2/loss=309.083, rew=758.00]


Epoch #160: test_reward: 868.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #161: 1025it [00:02, 367.11it/s, env_step=164864, len=15, n/ep=4, n/st=64, player_1/loss=143.830, player_2/loss=359.687, rew=253.00]


Epoch #161: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #162: 1025it [00:02, 367.94it/s, env_step=165888, len=22, n/ep=3, n/st=64, player_1/loss=94.849, player_2/loss=163.579, rew=522.00]


Epoch #162: test_reward: 418.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #163: 1025it [00:02, 368.68it/s, env_step=166912, len=16, n/ep=3, n/st=64, player_1/loss=103.688, player_2/loss=150.952, rew=287.33]


Epoch #163: test_reward: 208.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #164: 1025it [00:02, 367.48it/s, env_step=167936, len=21, n/ep=3, n/st=64, player_1/loss=164.854, player_2/loss=68.541, rew=511.33]


Epoch #164: test_reward: 648.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #165: 1025it [00:02, 369.37it/s, env_step=168960, len=25, n/ep=3, n/st=64, player_1/loss=174.034, player_2/loss=104.871, rew=668.00]


Epoch #165: test_reward: 460.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #166: 1025it [00:02, 367.17it/s, env_step=169984, len=28, n/ep=2, n/st=64, player_1/loss=208.344, player_2/loss=69.116, rew=814.00]


Epoch #166: test_reward: 648.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #167: 1025it [00:02, 369.11it/s, env_step=171008, len=23, n/ep=2, n/st=64, player_1/loss=202.768, player_2/loss=67.539, rew=604.00]


Epoch #167: test_reward: 754.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #168: 1025it [00:02, 368.71it/s, env_step=172032, len=32, n/ep=3, n/st=64, player_1/loss=211.723, player_2/loss=18.024, rew=1092.67]


Epoch #168: test_reward: 1054.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #169: 1025it [00:02, 367.48it/s, env_step=173056, len=33, n/ep=2, n/st=64, player_1/loss=259.455, player_2/loss=12.512, rew=1120.00]


Epoch #169: test_reward: 378.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #170: 1025it [00:02, 367.12it/s, env_step=174080, len=20, n/ep=3, n/st=64, player_2/loss=65.196, rew=496.67]


Epoch #170: test_reward: 810.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #171: 1025it [00:02, 366.97it/s, env_step=175104, len=13, n/ep=5, n/st=64, player_1/loss=263.808, player_2/loss=89.775, rew=206.40]


Epoch #171: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #172: 1025it [00:02, 367.87it/s, env_step=176128, len=15, n/ep=4, n/st=64, player_1/loss=319.792, player_2/loss=87.381, rew=255.00]


Epoch #172: test_reward: 270.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #173: 1025it [00:02, 368.67it/s, env_step=177152, len=24, n/ep=2, n/st=64, player_1/loss=295.352, player_2/loss=153.808, rew=647.00]


Epoch #173: test_reward: 378.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #174: 1025it [00:02, 365.93it/s, env_step=178176, len=17, n/ep=3, n/st=64, player_1/loss=143.368, player_2/loss=205.945, rew=312.67]


Epoch #174: test_reward: 928.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #175: 1025it [00:02, 366.67it/s, env_step=179200, len=14, n/ep=5, n/st=64, player_1/loss=155.633, player_2/loss=283.919, rew=208.80]


Epoch #175: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #176: 1025it [00:02, 369.55it/s, env_step=180224, len=15, n/ep=4, n/st=64, player_1/loss=242.721, player_2/loss=334.353, rew=240.00]


Epoch #176: test_reward: 504.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #177: 1025it [00:02, 370.45it/s, env_step=181248, len=15, n/ep=4, n/st=64, player_1/loss=248.576, rew=257.00]


Epoch #177: test_reward: 270.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #178: 1025it [00:02, 367.42it/s, env_step=182272, len=17, n/ep=3, n/st=64, player_1/loss=212.255, player_2/loss=313.889, rew=328.67]


Epoch #178: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #179: 1025it [00:02, 367.51it/s, env_step=183296, len=16, n/ep=4, n/st=64, player_1/loss=205.241, player_2/loss=197.820, rew=280.50]


Epoch #179: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #180: 1025it [00:02, 366.61it/s, env_step=184320, len=14, n/ep=5, n/st=64, player_1/loss=215.657, player_2/loss=119.626, rew=234.40]


Epoch #180: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #181: 1025it [00:02, 366.83it/s, env_step=185344, len=19, n/ep=4, n/st=64, player_1/loss=267.986, player_2/loss=209.083, rew=416.50]


Epoch #181: test_reward: 460.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #182: 1025it [00:02, 370.32it/s, env_step=186368, len=21, n/ep=4, n/st=64, player_1/loss=231.943, player_2/loss=248.229, rew=548.00]


Epoch #182: test_reward: 418.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #183: 1025it [00:02, 361.60it/s, env_step=187392, len=27, n/ep=3, n/st=64, player_1/loss=240.020, player_2/loss=328.564, rew=772.00]


Epoch #183: test_reward: 1120.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #184: 1025it [00:02, 364.09it/s, env_step=188416, len=21, n/ep=4, n/st=64, player_1/loss=299.636, player_2/loss=290.541, rew=493.00]


Epoch #184: test_reward: 504.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #185: 1025it [00:02, 369.38it/s, env_step=189440, len=14, n/ep=5, n/st=64, player_1/loss=252.042, player_2/loss=236.008, rew=220.40]


Epoch #185: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #186: 1025it [00:02, 367.88it/s, env_step=190464, len=16, n/ep=4, n/st=64, player_1/loss=196.844, player_2/loss=168.428, rew=293.50]


Epoch #186: test_reward: 990.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #187: 1025it [00:02, 369.64it/s, env_step=191488, len=20, n/ep=3, n/st=64, player_1/loss=222.555, player_2/loss=265.909, rew=418.67]


Epoch #187: test_reward: 378.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #188: 1025it [00:02, 369.65it/s, env_step=192512, len=16, n/ep=4, n/st=64, player_1/loss=272.603, player_2/loss=375.309, rew=288.00]


Epoch #188: test_reward: 1054.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #189: 1025it [00:02, 367.19it/s, env_step=193536, len=10, n/ep=6, n/st=64, player_1/loss=313.660, player_2/loss=312.148, rew=112.33]


Epoch #189: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #190: 1025it [00:02, 368.11it/s, env_step=194560, len=8, n/ep=8, n/st=64, player_1/loss=330.118, player_2/loss=352.242, rew=87.25]


Epoch #190: test_reward: 208.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #191: 1025it [00:02, 366.94it/s, env_step=195584, len=10, n/ep=7, n/st=64, player_1/loss=157.766, player_2/loss=207.202, rew=113.43]


Epoch #191: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #192: 1025it [00:02, 365.49it/s, env_step=196608, len=18, n/ep=3, n/st=64, player_1/loss=106.970, player_2/loss=162.344, rew=353.33]


Epoch #192: test_reward: 418.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #193: 1025it [00:02, 363.43it/s, env_step=197632, len=10, n/ep=6, n/st=64, player_1/loss=66.917, player_2/loss=304.154, rew=122.33]


Epoch #193: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #194: 1025it [00:02, 365.93it/s, env_step=198656, len=13, n/ep=4, n/st=64, player_1/loss=131.278, player_2/loss=324.200, rew=195.00]


Epoch #194: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #195: 1025it [00:02, 366.99it/s, env_step=199680, len=9, n/ep=7, n/st=64, player_1/loss=112.657, player_2/loss=263.337, rew=102.86]


Epoch #195: test_reward: 70.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #196: 1025it [00:02, 366.86it/s, env_step=200704, len=8, n/ep=8, n/st=64, player_1/loss=57.609, player_2/loss=318.243, rew=71.00]


Epoch #196: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #197: 1025it [00:02, 369.29it/s, env_step=201728, len=9, n/ep=7, n/st=64, player_1/loss=54.698, player_2/loss=338.069, rew=104.86]


Epoch #197: test_reward: 70.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #198: 1025it [00:02, 366.24it/s, env_step=202752, len=9, n/ep=7, n/st=64, player_1/loss=40.681, player_2/loss=314.632, rew=110.86]


Epoch #198: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #199: 1025it [00:02, 367.63it/s, env_step=203776, len=9, n/ep=7, n/st=64, player_1/loss=136.958, player_2/loss=284.010, rew=96.57]


Epoch #199: test_reward: 108.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #200: 1025it [00:02, 368.57it/s, env_step=204800, len=9, n/ep=6, n/st=64, player_1/loss=170.464, player_2/loss=236.070, rew=99.67]


Epoch #200: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #201: 1025it [00:02, 367.86it/s, env_step=205824, len=15, n/ep=4, n/st=64, player_1/loss=95.100, player_2/loss=232.652, rew=263.50]


Epoch #201: test_reward: 208.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #202: 1025it [00:02, 356.26it/s, env_step=206848, len=9, n/ep=7, n/st=64, player_1/loss=101.530, player_2/loss=274.903, rew=91.43]


Epoch #202: test_reward: 460.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #203: 1025it [00:02, 356.60it/s, env_step=207872, len=22, n/ep=3, n/st=64, player_1/loss=164.011, player_2/loss=252.228, rew=538.67]


Epoch #203: test_reward: 418.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #204: 1025it [00:02, 355.25it/s, env_step=208896, len=11, n/ep=6, n/st=64, player_1/loss=172.834, player_2/loss=291.329, rew=153.67]


Epoch #204: test_reward: 70.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #205: 1025it [00:02, 350.16it/s, env_step=209920, len=8, n/ep=7, n/st=64, player_1/loss=149.731, player_2/loss=266.087, rew=83.14]


Epoch #205: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #206: 1025it [00:02, 354.21it/s, env_step=210944, len=11, n/ep=6, n/st=64, player_1/loss=58.637, player_2/loss=186.148, rew=142.33]


Epoch #206: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #207: 1025it [00:02, 354.63it/s, env_step=211968, len=8, n/ep=7, n/st=64, player_1/loss=57.998, player_2/loss=186.463, rew=86.00]


Epoch #207: test_reward: 70.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #208: 1025it [00:02, 342.00it/s, env_step=212992, len=8, n/ep=7, n/st=64, player_2/loss=178.923, rew=81.71]


Epoch #208: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #209: 1025it [00:02, 354.88it/s, env_step=214016, len=13, n/ep=5, n/st=64, player_1/loss=113.569, player_2/loss=300.422, rew=186.80]


Epoch #209: test_reward: 270.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #210: 1025it [00:02, 355.32it/s, env_step=215040, len=15, n/ep=4, n/st=64, player_1/loss=98.076, player_2/loss=294.493, rew=241.50]


Epoch #210: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #211: 1025it [00:02, 354.78it/s, env_step=216064, len=14, n/ep=4, n/st=64, player_1/loss=104.279, player_2/loss=269.080, rew=230.50]


Epoch #211: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #212: 1025it [00:02, 355.79it/s, env_step=217088, len=11, n/ep=7, n/st=64, player_1/loss=138.923, player_2/loss=161.935, rew=168.00]


Epoch #212: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #213: 1025it [00:02, 355.57it/s, env_step=218112, len=17, n/ep=4, n/st=64, player_1/loss=104.752, player_2/loss=139.501, rew=336.50]


Epoch #213: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #214: 1025it [00:02, 354.89it/s, env_step=219136, len=12, n/ep=5, n/st=64, player_1/loss=66.793, player_2/loss=147.070, rew=178.80]


Epoch #214: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #215: 1025it [00:02, 357.61it/s, env_step=220160, len=9, n/ep=8, n/st=64, player_1/loss=65.168, player_2/loss=97.219, rew=91.25]


Epoch #215: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #216: 1025it [00:02, 355.91it/s, env_step=221184, len=13, n/ep=5, n/st=64, player_1/loss=167.518, player_2/loss=115.601, rew=200.40]


Epoch #216: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #217: 1025it [00:02, 355.04it/s, env_step=222208, len=8, n/ep=6, n/st=64, player_1/loss=163.958, player_2/loss=64.554, rew=89.00]


Epoch #217: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #218: 1025it [00:02, 356.18it/s, env_step=223232, len=8, n/ep=7, n/st=64, player_1/loss=69.133, player_2/loss=91.129, rew=77.43]


Epoch #218: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #219: 1025it [00:02, 354.09it/s, env_step=224256, len=9, n/ep=6, n/st=64, player_1/loss=78.116, player_2/loss=117.411, rew=89.33]


Epoch #219: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #220: 1025it [00:02, 355.94it/s, env_step=225280, len=7, n/ep=9, n/st=64, player_1/loss=80.992, player_2/loss=111.033, rew=66.89]


Epoch #220: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #221: 1025it [00:02, 355.17it/s, env_step=226304, len=9, n/ep=7, n/st=64, player_1/loss=62.564, player_2/loss=97.113, rew=104.29]


Epoch #221: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #222: 1025it [00:02, 355.47it/s, env_step=227328, len=8, n/ep=8, n/st=64, player_1/loss=48.767, player_2/loss=86.768, rew=81.00]


Epoch #222: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #223: 1025it [00:02, 355.96it/s, env_step=228352, len=13, n/ep=5, n/st=64, player_1/loss=68.343, rew=201.20]


Epoch #223: test_reward: 154.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #224: 1025it [00:02, 357.11it/s, env_step=229376, len=14, n/ep=4, n/st=64, player_1/loss=72.524, player_2/loss=152.017, rew=232.50]


Epoch #224: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #225: 1025it [00:02, 384.20it/s, env_step=230400, len=9, n/ep=6, n/st=64, player_1/loss=88.337, player_2/loss=196.018, rew=109.00]


Epoch #225: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #226: 1025it [00:02, 418.71it/s, env_step=231424, len=16, n/ep=4, n/st=64, player_1/loss=96.849, player_2/loss=134.081, rew=317.00]


Epoch #226: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #227: 1025it [00:02, 403.33it/s, env_step=232448, len=15, n/ep=4, n/st=64, player_1/loss=78.470, player_2/loss=128.942, rew=330.00]


Epoch #227: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #228: 1025it [00:02, 390.26it/s, env_step=233472, len=8, n/ep=7, n/st=64, player_1/loss=128.819, player_2/loss=118.017, rew=82.29]


Epoch #228: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #229: 1025it [00:02, 375.14it/s, env_step=234496, len=8, n/ep=7, n/st=64, player_1/loss=116.021, player_2/loss=96.498, rew=84.29]


Epoch #229: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #230: 1025it [00:02, 368.40it/s, env_step=235520, len=20, n/ep=3, n/st=64, player_1/loss=63.367, player_2/loss=131.599, rew=446.67]


Epoch #230: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #231: 1025it [00:02, 369.64it/s, env_step=236544, len=16, n/ep=3, n/st=64, player_1/loss=84.825, player_2/loss=190.909, rew=278.00]


Epoch #231: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #232: 1025it [00:02, 367.87it/s, env_step=237568, len=9, n/ep=7, n/st=64, player_2/loss=269.781, rew=100.29]


Epoch #232: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #233: 1025it [00:02, 361.95it/s, env_step=238592, len=14, n/ep=5, n/st=64, player_1/loss=79.277, player_2/loss=164.184, rew=234.80]


Epoch #233: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #234: 1025it [00:02, 355.94it/s, env_step=239616, len=12, n/ep=6, n/st=64, player_1/loss=96.645, player_2/loss=43.803, rew=156.00]


Epoch #234: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #235: 1025it [00:02, 356.84it/s, env_step=240640, len=22, n/ep=3, n/st=64, player_1/loss=99.791, player_2/loss=147.610, rew=508.67]


Epoch #235: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #236: 1025it [00:02, 355.98it/s, env_step=241664, len=17, n/ep=3, n/st=64, player_1/loss=70.734, player_2/loss=262.024, rew=308.67]


Epoch #236: test_reward: 270.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #237: 1025it [00:02, 355.79it/s, env_step=242688, len=21, n/ep=3, n/st=64, player_1/loss=78.945, player_2/loss=241.361, rew=475.33]


Epoch #237: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #238: 1025it [00:02, 354.98it/s, env_step=243712, len=14, n/ep=5, n/st=64, player_1/loss=97.173, player_2/loss=210.008, rew=227.20]


Epoch #238: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #239: 1025it [00:02, 355.66it/s, env_step=244736, len=21, n/ep=3, n/st=64, player_1/loss=167.551, player_2/loss=152.417, rew=478.67]


Epoch #239: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #240: 1025it [00:02, 354.99it/s, env_step=245760, len=27, n/ep=2, n/st=64, player_1/loss=148.826, rew=755.00]


Epoch #240: test_reward: 270.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #241: 1025it [00:02, 354.25it/s, env_step=246784, len=16, n/ep=4, n/st=64, player_1/loss=97.207, player_2/loss=128.038, rew=330.50]


Epoch #241: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #242: 1025it [00:02, 355.46it/s, env_step=247808, len=8, n/ep=7, n/st=64, player_1/loss=67.984, player_2/loss=76.573, rew=86.57]


Epoch #242: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #243: 1025it [00:02, 355.77it/s, env_step=248832, len=17, n/ep=4, n/st=64, player_1/loss=53.989, player_2/loss=75.937, rew=314.50]


Epoch #243: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #244: 1025it [00:02, 356.49it/s, env_step=249856, len=22, n/ep=3, n/st=64, player_1/loss=70.092, rew=504.67]


Epoch #244: test_reward: 418.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #245: 1025it [00:02, 356.11it/s, env_step=250880, len=11, n/ep=6, n/st=64, player_1/loss=57.701, player_2/loss=25.331, rew=138.33]


Epoch #245: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #246: 1025it [00:02, 355.55it/s, env_step=251904, len=8, n/ep=7, n/st=64, player_1/loss=63.837, player_2/loss=78.249, rew=87.71]


Epoch #246: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #247: 1025it [00:02, 354.50it/s, env_step=252928, len=18, n/ep=4, n/st=64, player_1/loss=67.222, player_2/loss=95.697, rew=397.00]


Epoch #247: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #248: 1025it [00:02, 355.37it/s, env_step=253952, len=10, n/ep=6, n/st=64, player_1/loss=70.164, player_2/loss=107.881, rew=134.33]


Epoch #248: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #249: 1025it [00:02, 354.30it/s, env_step=254976, len=9, n/ep=7, n/st=64, player_1/loss=56.453, player_2/loss=75.942, rew=104.29]


Epoch #249: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #250: 1025it [00:02, 356.61it/s, env_step=256000, len=8, n/ep=9, n/st=64, player_1/loss=50.583, player_2/loss=119.862, rew=74.67]


Epoch #250: test_reward: 208.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #251: 1025it [00:02, 353.32it/s, env_step=257024, len=12, n/ep=5, n/st=64, player_1/loss=68.638, player_2/loss=119.918, rew=198.00]


Epoch #251: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #252: 1025it [00:02, 355.84it/s, env_step=258048, len=19, n/ep=3, n/st=64, player_1/loss=92.296, player_2/loss=175.637, rew=407.33]


Epoch #252: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #253: 1025it [00:02, 355.15it/s, env_step=259072, len=26, n/ep=3, n/st=64, player_1/loss=181.517, player_2/loss=192.502, rew=708.67]


Epoch #253: test_reward: 598.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #254: 1025it [00:02, 353.94it/s, env_step=260096, len=12, n/ep=5, n/st=64, player_1/loss=160.902, player_2/loss=164.241, rew=158.00]


Epoch #254: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #255: 1025it [00:02, 353.71it/s, env_step=261120, len=9, n/ep=7, n/st=64, player_1/loss=78.664, player_2/loss=118.661, rew=93.71]


Epoch #255: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #256: 1025it [00:02, 354.46it/s, env_step=262144, len=14, n/ep=4, n/st=64, player_2/loss=154.940, rew=240.00]


Epoch #256: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #257: 1025it [00:02, 355.47it/s, env_step=263168, len=11, n/ep=6, n/st=64, player_1/loss=59.906, player_2/loss=263.424, rew=139.00]


Epoch #257: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #258: 1025it [00:02, 355.80it/s, env_step=264192, len=13, n/ep=4, n/st=64, player_2/loss=201.132, rew=193.50]


Epoch #258: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #259: 1025it [00:02, 354.50it/s, env_step=265216, len=20, n/ep=3, n/st=64, player_1/loss=39.791, player_2/loss=193.107, rew=446.67]


Epoch #259: test_reward: 550.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #260: 1025it [00:02, 354.29it/s, env_step=266240, len=13, n/ep=5, n/st=64, player_1/loss=34.868, player_2/loss=213.823, rew=197.60]


Epoch #260: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #261: 1025it [00:02, 355.29it/s, env_step=267264, len=10, n/ep=6, n/st=64, player_1/loss=18.334, player_2/loss=136.558, rew=153.67]


Epoch #261: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #262: 1025it [00:02, 355.40it/s, env_step=268288, len=15, n/ep=4, n/st=64, player_1/loss=27.750, player_2/loss=151.952, rew=255.50]


Epoch #262: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #263: 1025it [00:02, 354.08it/s, env_step=269312, len=8, n/ep=7, n/st=64, player_1/loss=25.609, player_2/loss=122.569, rew=85.71]


Epoch #263: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #264: 1025it [00:02, 354.40it/s, env_step=270336, len=8, n/ep=8, n/st=64, player_1/loss=47.834, player_2/loss=98.552, rew=77.25]


Epoch #264: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #265: 1025it [00:02, 356.04it/s, env_step=271360, len=10, n/ep=6, n/st=64, player_1/loss=83.702, player_2/loss=92.529, rew=129.33]


Epoch #265: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #266: 1025it [00:02, 354.82it/s, env_step=272384, len=8, n/ep=7, n/st=64, player_1/loss=55.568, player_2/loss=68.500, rew=84.57]


Epoch #266: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #267: 1025it [00:02, 406.62it/s, env_step=273408, len=12, n/ep=6, n/st=64, player_1/loss=20.881, player_2/loss=91.444, rew=157.67]


Epoch #267: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #268: 1025it [00:02, 412.98it/s, env_step=274432, len=16, n/ep=3, n/st=64, player_1/loss=37.874, player_2/loss=148.749, rew=410.67]


Epoch #268: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #269: 1025it [00:02, 395.70it/s, env_step=275456, len=12, n/ep=6, n/st=64, player_1/loss=68.078, player_2/loss=120.122, rew=175.33]


Epoch #269: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #270: 1025it [00:02, 383.22it/s, env_step=276480, len=11, n/ep=6, n/st=64, player_1/loss=62.802, player_2/loss=60.848, rew=139.33]


Epoch #270: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #271: 1025it [00:02, 370.60it/s, env_step=277504, len=8, n/ep=6, n/st=64, player_1/loss=63.623, player_2/loss=53.949, rew=83.67]


Epoch #271: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #272: 1025it [00:02, 363.33it/s, env_step=278528, len=13, n/ep=6, n/st=64, player_1/loss=66.924, player_2/loss=53.046, rew=231.00]


Epoch #272: test_reward: 550.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #273: 1025it [00:02, 367.08it/s, env_step=279552, len=9, n/ep=5, n/st=64, player_1/loss=70.343, player_2/loss=67.376, rew=100.40]


Epoch #273: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #274: 1025it [00:02, 365.15it/s, env_step=280576, len=8, n/ep=7, n/st=64, player_1/loss=58.234, player_2/loss=40.906, rew=90.00]


Epoch #274: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #275: 1025it [00:02, 365.96it/s, env_step=281600, len=15, n/ep=5, n/st=64, player_1/loss=59.501, player_2/loss=153.536, rew=245.20]


Epoch #275: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #276: 1025it [00:02, 368.09it/s, env_step=282624, len=17, n/ep=4, n/st=64, player_1/loss=75.909, player_2/loss=188.695, rew=318.00]


Epoch #276: test_reward: 550.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #277: 1025it [00:02, 366.98it/s, env_step=283648, len=16, n/ep=4, n/st=64, player_1/loss=89.395, player_2/loss=145.477, rew=272.00]


Epoch #277: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #278: 1025it [00:02, 365.40it/s, env_step=284672, len=10, n/ep=6, n/st=64, player_1/loss=62.430, player_2/loss=105.224, rew=120.00]


Epoch #278: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #279: 1025it [00:02, 366.46it/s, env_step=285696, len=16, n/ep=3, n/st=64, player_1/loss=45.001, player_2/loss=93.293, rew=284.00]


Epoch #279: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #280: 1025it [00:02, 366.06it/s, env_step=286720, len=14, n/ep=4, n/st=64, player_1/loss=134.348, player_2/loss=85.820, rew=208.50]


Epoch #280: test_reward: 340.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #281: 1025it [00:02, 365.36it/s, env_step=287744, len=18, n/ep=4, n/st=64, player_1/loss=122.191, player_2/loss=116.568, rew=349.50]


Epoch #281: test_reward: 418.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #282: 1025it [00:02, 366.94it/s, env_step=288768, len=15, n/ep=4, n/st=64, player_1/loss=134.989, player_2/loss=144.692, rew=282.50]


Epoch #282: test_reward: 270.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #283: 1025it [00:02, 365.91it/s, env_step=289792, len=10, n/ep=6, n/st=64, player_1/loss=163.800, player_2/loss=98.342, rew=133.00]


Epoch #283: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #284: 1025it [00:02, 365.77it/s, env_step=290816, len=12, n/ep=6, n/st=64, player_1/loss=73.541, player_2/loss=84.570, rew=169.33]


Epoch #284: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #285: 1025it [00:02, 365.06it/s, env_step=291840, len=23, n/ep=3, n/st=64, player_1/loss=34.319, player_2/loss=144.247, rew=556.00]


Epoch #285: test_reward: 550.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #286: 1025it [00:02, 365.44it/s, env_step=292864, len=12, n/ep=6, n/st=64, player_1/loss=65.902, player_2/loss=156.297, rew=166.33]


Epoch #286: test_reward: 208.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #287: 1025it [00:02, 359.49it/s, env_step=293888, len=8, n/ep=8, n/st=64, player_1/loss=90.951, player_2/loss=149.176, rew=72.00]


Epoch #287: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #288: 1025it [00:02, 355.67it/s, env_step=294912, len=12, n/ep=5, n/st=64, player_1/loss=73.377, player_2/loss=134.969, rew=191.60]


Epoch #288: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #289: 1025it [00:02, 354.02it/s, env_step=295936, len=10, n/ep=6, n/st=64, player_1/loss=100.299, player_2/loss=102.303, rew=121.33]


Epoch #289: test_reward: 154.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #290: 1025it [00:02, 353.78it/s, env_step=296960, len=14, n/ep=4, n/st=64, player_1/loss=79.762, player_2/loss=126.015, rew=225.00]


Epoch #290: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #291: 1025it [00:02, 354.47it/s, env_step=297984, len=9, n/ep=6, n/st=64, player_1/loss=66.535, player_2/loss=108.314, rew=103.67]


Epoch #291: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #292: 1025it [00:02, 354.13it/s, env_step=299008, len=9, n/ep=7, n/st=64, player_1/loss=63.059, player_2/loss=75.394, rew=111.43]


Epoch #292: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #293: 1025it [00:02, 354.19it/s, env_step=300032, len=11, n/ep=6, n/st=64, player_1/loss=61.017, player_2/loss=87.508, rew=140.67]


Epoch #293: test_reward: 108.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #294: 1025it [00:02, 356.48it/s, env_step=301056, len=10, n/ep=5, n/st=64, player_1/loss=58.394, player_2/loss=133.073, rew=122.00]


Epoch #294: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #295: 1025it [00:02, 353.76it/s, env_step=302080, len=9, n/ep=7, n/st=64, player_1/loss=47.900, player_2/loss=130.322, rew=90.29]


Epoch #295: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #296: 1025it [00:02, 355.49it/s, env_step=303104, len=8, n/ep=8, n/st=64, player_1/loss=82.589, rew=73.50]


Epoch #296: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #297: 1025it [00:02, 355.41it/s, env_step=304128, len=8, n/ep=8, n/st=64, player_1/loss=70.958, player_2/loss=89.858, rew=84.25]


Epoch #297: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #298: 1025it [00:02, 352.32it/s, env_step=305152, len=10, n/ep=6, n/st=64, player_1/loss=63.365, player_2/loss=75.001, rew=121.33]


Epoch #298: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #299: 1025it [00:02, 353.67it/s, env_step=306176, len=12, n/ep=5, n/st=64, player_1/loss=37.421, player_2/loss=82.725, rew=168.00]


Epoch #299: test_reward: 108.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #300: 1025it [00:02, 355.60it/s, env_step=307200, len=10, n/ep=6, n/st=64, player_1/loss=32.065, player_2/loss=77.358, rew=137.00]


Epoch #300: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #301: 1025it [00:02, 353.91it/s, env_step=308224, len=12, n/ep=5, n/st=64, player_1/loss=31.410, player_2/loss=71.425, rew=179.60]


Epoch #301: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #302: 1025it [00:02, 353.72it/s, env_step=309248, len=17, n/ep=3, n/st=64, player_1/loss=34.694, player_2/loss=55.379, rew=328.67]


Epoch #302: test_reward: 378.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #303: 1025it [00:02, 354.97it/s, env_step=310272, len=9, n/ep=7, n/st=64, player_2/loss=51.649, rew=96.57]


Epoch #303: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #304: 1025it [00:02, 345.44it/s, env_step=311296, len=8, n/ep=7, n/st=64, player_1/loss=61.135, player_2/loss=49.629, rew=88.86]


Epoch #304: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #305: 1025it [00:02, 351.39it/s, env_step=312320, len=12, n/ep=5, n/st=64, player_1/loss=55.707, player_2/loss=40.475, rew=178.00]


Epoch #305: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #306: 1025it [00:02, 364.52it/s, env_step=313344, len=8, n/ep=7, n/st=64, player_1/loss=38.622, player_2/loss=37.300, rew=80.57]


Epoch #306: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #307: 1025it [00:02, 351.09it/s, env_step=314368, len=10, n/ep=6, n/st=64, player_1/loss=56.606, player_2/loss=49.062, rew=121.67]


Epoch #307: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #308: 1025it [00:02, 351.71it/s, env_step=315392, len=9, n/ep=7, n/st=64, player_1/loss=76.189, player_2/loss=84.603, rew=108.29]


Epoch #308: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #309: 1025it [00:02, 352.08it/s, env_step=316416, len=8, n/ep=7, n/st=64, player_1/loss=70.747, player_2/loss=100.146, rew=80.57]


Epoch #309: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #310: 1025it [00:02, 350.60it/s, env_step=317440, len=14, n/ep=4, n/st=64, player_1/loss=180.045, player_2/loss=122.112, rew=215.50]


Epoch #310: test_reward: 208.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #311: 1025it [00:02, 365.12it/s, env_step=318464, len=14, n/ep=5, n/st=64, player_1/loss=190.290, player_2/loss=196.594, rew=243.20]


Epoch #311: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #312: 1025it [00:02, 365.38it/s, env_step=319488, len=10, n/ep=7, n/st=64, player_1/loss=194.651, player_2/loss=203.245, rew=131.43]


Epoch #312: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #313: 1025it [00:02, 365.05it/s, env_step=320512, len=21, n/ep=3, n/st=64, player_1/loss=223.142, player_2/loss=198.016, rew=512.67]


Epoch #313: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #314: 1025it [00:02, 366.25it/s, env_step=321536, len=15, n/ep=4, n/st=64, player_1/loss=167.256, player_2/loss=121.324, rew=263.00]


Epoch #314: test_reward: 378.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #315: 1025it [00:02, 365.59it/s, env_step=322560, len=14, n/ep=5, n/st=64, player_1/loss=178.801, player_2/loss=121.755, rew=228.00]


Epoch #315: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #316: 1025it [00:02, 365.29it/s, env_step=323584, len=9, n/ep=7, n/st=64, player_1/loss=141.483, rew=105.14]


Epoch #316: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #317: 1025it [00:02, 367.50it/s, env_step=324608, len=12, n/ep=5, n/st=64, player_1/loss=75.408, player_2/loss=147.379, rew=194.80]


Epoch #317: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #318: 1025it [00:02, 368.31it/s, env_step=325632, len=8, n/ep=8, n/st=64, player_1/loss=57.508, player_2/loss=141.537, rew=77.50]


Epoch #318: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #319: 1025it [00:02, 367.49it/s, env_step=326656, len=9, n/ep=5, n/st=64, player_1/loss=71.078, player_2/loss=145.021, rew=102.00]


Epoch #319: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #320: 1025it [00:02, 365.00it/s, env_step=327680, len=10, n/ep=7, n/st=64, player_1/loss=68.943, player_2/loss=96.472, rew=112.29]


Epoch #320: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #321: 1025it [00:02, 366.81it/s, env_step=328704, len=12, n/ep=6, n/st=64, player_1/loss=61.763, player_2/loss=113.633, rew=181.00]


Epoch #321: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #322: 1025it [00:02, 362.91it/s, env_step=329728, len=9, n/ep=7, n/st=64, player_1/loss=53.974, player_2/loss=93.619, rew=100.00]


Epoch #322: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #323: 1025it [00:02, 364.57it/s, env_step=330752, len=13, n/ep=5, n/st=64, player_1/loss=45.415, player_2/loss=112.530, rew=195.20]


Epoch #323: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #324: 1025it [00:02, 359.69it/s, env_step=331776, len=10, n/ep=7, n/st=64, player_1/loss=101.531, player_2/loss=87.148, rew=132.57]


Epoch #324: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #325: 1025it [00:02, 363.47it/s, env_step=332800, len=13, n/ep=6, n/st=64, player_1/loss=113.546, player_2/loss=89.256, rew=221.00]


Epoch #325: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #326: 1025it [00:02, 363.21it/s, env_step=333824, len=10, n/ep=6, n/st=64, player_1/loss=91.031, player_2/loss=43.314, rew=135.33]


Epoch #326: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #327: 1025it [00:02, 366.67it/s, env_step=334848, len=8, n/ep=7, n/st=64, player_1/loss=70.001, player_2/loss=86.779, rew=78.29]


Epoch #327: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #328: 1025it [00:02, 366.65it/s, env_step=335872, len=18, n/ep=3, n/st=64, player_1/loss=71.664, player_2/loss=115.143, rew=360.67]


Epoch #328: test_reward: 270.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #329: 1025it [00:02, 366.16it/s, env_step=336896, len=16, n/ep=4, n/st=64, player_1/loss=113.468, player_2/loss=166.060, rew=271.00]


Epoch #329: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #330: 1025it [00:02, 365.27it/s, env_step=337920, len=8, n/ep=8, n/st=64, player_1/loss=132.933, player_2/loss=138.824, rew=71.00]


Epoch #330: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #331: 1025it [00:02, 366.63it/s, env_step=338944, len=15, n/ep=3, n/st=64, player_1/loss=108.271, player_2/loss=136.671, rew=258.67]


Epoch #331: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #332: 1025it [00:02, 365.39it/s, env_step=339968, len=10, n/ep=6, n/st=64, player_1/loss=147.114, player_2/loss=186.994, rew=120.00]


Epoch #332: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #333: 1025it [00:02, 365.71it/s, env_step=340992, len=9, n/ep=7, n/st=64, player_1/loss=124.111, player_2/loss=166.580, rew=109.43]


Epoch #333: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #334: 1025it [00:02, 367.14it/s, env_step=342016, len=11, n/ep=6, n/st=64, player_1/loss=70.423, player_2/loss=76.835, rew=161.00]


Epoch #334: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #335: 1025it [00:02, 365.21it/s, env_step=343040, len=13, n/ep=5, n/st=64, player_1/loss=59.886, player_2/loss=89.449, rew=196.80]


Epoch #335: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #336: 1025it [00:02, 365.33it/s, env_step=344064, len=9, n/ep=7, n/st=64, player_1/loss=71.678, player_2/loss=65.357, rew=116.29]


Epoch #336: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #337: 1025it [00:02, 365.16it/s, env_step=345088, len=10, n/ep=6, n/st=64, player_1/loss=114.651, player_2/loss=28.330, rew=125.67]


Epoch #337: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #338: 1025it [00:02, 366.23it/s, env_step=346112, len=11, n/ep=6, n/st=64, player_1/loss=124.209, player_2/loss=29.898, rew=175.33]


Epoch #338: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #339: 1025it [00:02, 368.76it/s, env_step=347136, len=10, n/ep=6, n/st=64, player_1/loss=101.977, player_2/loss=55.031, rew=153.00]


Epoch #339: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #340: 1025it [00:02, 364.38it/s, env_step=348160, len=9, n/ep=5, n/st=64, player_1/loss=67.637, player_2/loss=112.856, rew=89.60]


Epoch #340: test_reward: 154.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #341: 1025it [00:02, 365.75it/s, env_step=349184, len=9, n/ep=6, n/st=64, player_1/loss=71.167, player_2/loss=82.631, rew=97.33]


Epoch #341: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #342: 1025it [00:02, 366.64it/s, env_step=350208, len=11, n/ep=5, n/st=64, player_1/loss=85.204, player_2/loss=12.892, rew=150.00]


Epoch #342: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #343: 1025it [00:02, 365.88it/s, env_step=351232, len=16, n/ep=4, n/st=64, player_1/loss=60.858, player_2/loss=81.396, rew=318.00]


Epoch #343: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #344: 1025it [00:02, 345.10it/s, env_step=352256, len=8, n/ep=8, n/st=64, player_1/loss=41.631, player_2/loss=132.503, rew=84.75]


Epoch #344: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #345: 1025it [00:02, 392.81it/s, env_step=353280, len=10, n/ep=6, n/st=64, player_1/loss=41.612, player_2/loss=131.296, rew=129.33]


Epoch #345: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #346: 1025it [00:02, 377.57it/s, env_step=354304, len=10, n/ep=6, n/st=64, player_1/loss=65.386, player_2/loss=42.555, rew=128.33]


Epoch #346: test_reward: 108.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #347: 1025it [00:02, 395.52it/s, env_step=355328, len=13, n/ep=5, n/st=64, player_1/loss=123.159, player_2/loss=94.761, rew=196.40]


Epoch #347: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #348: 1025it [00:02, 411.32it/s, env_step=356352, len=18, n/ep=4, n/st=64, player_1/loss=250.770, player_2/loss=159.325, rew=375.00]


Epoch #348: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #349: 1025it [00:02, 395.17it/s, env_step=357376, len=12, n/ep=5, n/st=64, player_1/loss=214.835, player_2/loss=78.818, rew=170.80]


Epoch #349: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #350: 1025it [00:02, 429.74it/s, env_step=358400, len=12, n/ep=6, n/st=64, player_1/loss=66.008, player_2/loss=33.533, rew=210.67]


Epoch #350: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #351: 1025it [00:02, 387.01it/s, env_step=359424, len=8, n/ep=8, n/st=64, player_1/loss=54.035, player_2/loss=61.434, rew=72.00]


Epoch #351: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #352: 1025it [00:02, 400.19it/s, env_step=360448, len=8, n/ep=8, n/st=64, player_1/loss=67.745, player_2/loss=72.914, rew=89.75]


Epoch #352: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #353: 1025it [00:02, 410.69it/s, env_step=361472, len=8, n/ep=8, n/st=64, player_1/loss=64.710, player_2/loss=93.840, rew=87.75]


Epoch #353: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #354: 1025it [00:02, 397.95it/s, env_step=362496, len=11, n/ep=5, n/st=64, player_1/loss=45.271, player_2/loss=61.326, rew=150.40]


Epoch #354: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #355: 1025it [00:02, 419.43it/s, env_step=363520, len=9, n/ep=7, n/st=64, player_1/loss=40.808, player_2/loss=45.857, rew=109.71]


Epoch #355: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #356: 1025it [00:02, 372.84it/s, env_step=364544, len=13, n/ep=5, n/st=64, player_1/loss=52.450, player_2/loss=35.284, rew=196.80]


Epoch #356: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #357: 1025it [00:02, 415.50it/s, env_step=365568, len=17, n/ep=4, n/st=64, player_1/loss=79.950, player_2/loss=39.821, rew=325.00]


Epoch #357: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #358: 1025it [00:02, 357.54it/s, env_step=366592, len=15, n/ep=4, n/st=64, player_1/loss=157.146, player_2/loss=117.701, rew=264.50]


Epoch #358: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #359: 1025it [00:02, 410.31it/s, env_step=367616, len=14, n/ep=4, n/st=64, player_1/loss=211.469, player_2/loss=168.128, rew=231.50]


Epoch #359: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #360: 1025it [00:02, 429.35it/s, env_step=368640, len=20, n/ep=4, n/st=64, player_1/loss=142.387, player_2/loss=72.122, rew=477.50]


Epoch #360: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #361: 1025it [00:02, 377.85it/s, env_step=369664, len=14, n/ep=4, n/st=64, player_1/loss=117.257, player_2/loss=145.891, rew=247.00]


Epoch #361: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #362: 1025it [00:02, 431.35it/s, env_step=370688, len=8, n/ep=9, n/st=64, player_1/loss=169.453, player_2/loss=242.763, rew=71.56]


Epoch #362: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #363: 1025it [00:02, 443.18it/s, env_step=371712, len=9, n/ep=6, n/st=64, player_1/loss=92.709, player_2/loss=212.559, rew=104.67]


Epoch #363: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #364: 1025it [00:02, 442.06it/s, env_step=372736, len=9, n/ep=6, n/st=64, player_1/loss=51.143, player_2/loss=63.815, rew=101.00]


Epoch #364: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #365: 1025it [00:02, 450.54it/s, env_step=373760, len=8, n/ep=9, n/st=64, player_1/loss=93.034, player_2/loss=79.582, rew=72.67]


Epoch #365: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #366: 1025it [00:02, 457.23it/s, env_step=374784, len=10, n/ep=6, n/st=64, player_1/loss=88.748, player_2/loss=128.267, rew=146.67]


Epoch #366: test_reward: 154.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #367: 1025it [00:02, 456.97it/s, env_step=375808, len=10, n/ep=6, n/st=64, player_1/loss=57.401, player_2/loss=158.340, rew=122.33]


Epoch #367: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #368: 1025it [00:02, 455.21it/s, env_step=376832, len=11, n/ep=6, n/st=64, player_1/loss=49.962, player_2/loss=133.255, rew=139.67]


Epoch #368: test_reward: 208.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #369: 1025it [00:02, 455.58it/s, env_step=377856, len=13, n/ep=3, n/st=64, player_1/loss=62.077, player_2/loss=61.653, rew=198.67]


Epoch #369: test_reward: 1054.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #370: 1025it [00:02, 365.63it/s, env_step=378880, len=14, n/ep=5, n/st=64, player_1/loss=54.641, player_2/loss=53.471, rew=234.00]


Epoch #370: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #371: 1025it [00:03, 338.86it/s, env_step=379904, len=13, n/ep=5, n/st=64, player_1/loss=15.144, player_2/loss=128.120, rew=192.80]


Epoch #371: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #372: 1025it [00:02, 407.27it/s, env_step=380928, len=8, n/ep=8, n/st=64, player_1/loss=57.475, player_2/loss=186.967, rew=76.25]


Epoch #372: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #373: 1025it [00:02, 422.13it/s, env_step=381952, len=13, n/ep=5, n/st=64, player_1/loss=85.277, player_2/loss=141.697, rew=211.20]


Epoch #373: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #374: 1025it [00:02, 427.71it/s, env_step=382976, len=14, n/ep=5, n/st=64, player_1/loss=86.631, player_2/loss=86.075, rew=255.60]


Epoch #374: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #375: 1025it [00:02, 372.30it/s, env_step=384000, len=16, n/ep=4, n/st=64, player_1/loss=110.530, player_2/loss=88.903, rew=309.00]


Epoch #375: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #376: 1025it [00:02, 398.41it/s, env_step=385024, len=11, n/ep=5, n/st=64, player_1/loss=108.692, player_2/loss=68.430, rew=143.20]


Epoch #376: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #377: 1025it [00:02, 379.45it/s, env_step=386048, len=27, n/ep=2, n/st=64, player_1/loss=91.222, player_2/loss=96.595, rew=763.00]


Epoch #377: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #378: 1025it [00:02, 369.50it/s, env_step=387072, len=8, n/ep=8, n/st=64, player_1/loss=66.538, player_2/loss=120.235, rew=81.00]


Epoch #378: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #379: 1025it [00:02, 375.41it/s, env_step=388096, len=16, n/ep=4, n/st=64, player_1/loss=63.437, player_2/loss=87.797, rew=283.00]


Epoch #379: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #380: 1025it [00:02, 368.14it/s, env_step=389120, len=10, n/ep=6, n/st=64, player_1/loss=70.335, player_2/loss=131.923, rew=125.33]


Epoch #380: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #381: 1025it [00:02, 377.31it/s, env_step=390144, len=14, n/ep=5, n/st=64, player_1/loss=75.174, player_2/loss=118.547, rew=224.00]


Epoch #381: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #382: 1025it [00:02, 415.08it/s, env_step=391168, len=12, n/ep=5, n/st=64, player_1/loss=69.756, player_2/loss=75.813, rew=180.80]


Epoch #382: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #383: 1025it [00:02, 379.64it/s, env_step=392192, len=7, n/ep=8, n/st=64, player_1/loss=59.964, rew=66.75]


Epoch #383: test_reward: 154.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #384: 1025it [00:02, 432.59it/s, env_step=393216, len=14, n/ep=4, n/st=64, player_1/loss=95.660, player_2/loss=46.497, rew=220.00]


Epoch #384: test_reward: 208.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #385: 1025it [00:02, 428.99it/s, env_step=394240, len=15, n/ep=4, n/st=64, player_1/loss=168.329, player_2/loss=98.592, rew=245.50]


Epoch #385: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #386: 1025it [00:02, 435.82it/s, env_step=395264, len=17, n/ep=4, n/st=64, player_1/loss=177.058, player_2/loss=83.395, rew=344.50]


Epoch #386: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #387: 1025it [00:02, 443.23it/s, env_step=396288, len=16, n/ep=4, n/st=64, player_1/loss=153.802, player_2/loss=48.290, rew=283.00]


Epoch #387: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #388: 1025it [00:02, 442.70it/s, env_step=397312, len=19, n/ep=4, n/st=64, player_1/loss=229.600, player_2/loss=145.134, rew=383.00]


Epoch #388: test_reward: 460.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #389: 1025it [00:02, 443.28it/s, env_step=398336, len=21, n/ep=2, n/st=64, player_1/loss=118.353, player_2/loss=243.628, rew=461.00]


Epoch #389: test_reward: 504.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #390: 1025it [00:02, 425.26it/s, env_step=399360, len=15, n/ep=4, n/st=64, player_1/loss=88.747, player_2/loss=155.887, rew=250.50]


Epoch #390: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #391: 1025it [00:02, 411.95it/s, env_step=400384, len=8, n/ep=7, n/st=64, player_1/loss=65.108, player_2/loss=94.308, rew=80.57]


Epoch #391: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #392: 1025it [00:02, 426.51it/s, env_step=401408, len=14, n/ep=4, n/st=64, player_1/loss=60.438, player_2/loss=88.755, rew=217.00]


Epoch #392: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #393: 1025it [00:02, 413.77it/s, env_step=402432, len=15, n/ep=5, n/st=64, player_1/loss=55.790, player_2/loss=90.073, rew=254.80]


Epoch #393: test_reward: 130.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #394: 1025it [00:02, 367.11it/s, env_step=403456, len=8, n/ep=7, n/st=64, player_1/loss=105.003, player_2/loss=108.967, rew=87.71]


Epoch #394: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #395: 1025it [00:02, 389.27it/s, env_step=404480, len=11, n/ep=6, n/st=64, player_1/loss=120.916, player_2/loss=116.818, rew=155.67]


Epoch #395: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #396: 1025it [00:02, 350.18it/s, env_step=405504, len=14, n/ep=4, n/st=64, player_1/loss=65.379, player_2/loss=53.693, rew=211.50]


Epoch #396: test_reward: 154.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #397: 1025it [00:02, 405.27it/s, env_step=406528, len=11, n/ep=5, n/st=64, player_1/loss=84.530, player_2/loss=89.600, rew=145.60]


Epoch #397: test_reward: 154.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #398: 1025it [00:02, 392.55it/s, env_step=407552, len=8, n/ep=8, n/st=64, player_1/loss=90.266, player_2/loss=124.721, rew=89.75]


Epoch #398: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #399: 1025it [00:02, 387.36it/s, env_step=408576, len=8, n/ep=8, n/st=64, player_1/loss=73.966, player_2/loss=98.883, rew=71.00]


Epoch #399: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #400: 1025it [00:03, 332.79it/s, env_step=409600, len=14, n/ep=4, n/st=64, player_1/loss=47.933, player_2/loss=45.823, rew=221.00]


Epoch #400: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #401: 1025it [00:02, 371.23it/s, env_step=410624, len=7, n/ep=7, n/st=64, player_1/loss=19.172, player_2/loss=44.775, rew=69.71]


Epoch #401: test_reward: 54.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #402: 1025it [00:02, 385.32it/s, env_step=411648, len=11, n/ep=5, n/st=64, player_1/loss=14.865, player_2/loss=60.074, rew=153.20]


Epoch #402: test_reward: 340.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #403: 1025it [00:02, 345.35it/s, env_step=412672, len=9, n/ep=7, n/st=64, player_1/loss=13.955, player_2/loss=104.403, rew=90.29]


Epoch #403: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #404: 1025it [00:02, 382.30it/s, env_step=413696, len=8, n/ep=7, n/st=64, player_1/loss=23.684, player_2/loss=96.190, rew=80.86]


Epoch #404: test_reward: 88.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #405: 1025it [00:02, 376.41it/s, env_step=414720, len=17, n/ep=4, n/st=64, player_1/loss=121.055, player_2/loss=152.129, rew=333.50]


Epoch #405: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #406: 1025it [00:02, 363.08it/s, env_step=415744, len=16, n/ep=4, n/st=64, player_1/loss=217.923, player_2/loss=92.998, rew=300.50]


Epoch #406: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #407: 1025it [00:02, 368.03it/s, env_step=416768, len=16, n/ep=4, n/st=64, player_1/loss=169.543, player_2/loss=85.443, rew=285.00]


Epoch #407: test_reward: 238.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #408: 1025it [00:02, 398.34it/s, env_step=417792, len=25, n/ep=2, n/st=64, player_1/loss=186.095, player_2/loss=76.311, rew=676.00]


Epoch #408: test_reward: 598.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #409: 1025it [00:02, 391.35it/s, env_step=418816, len=20, n/ep=3, n/st=64, player_1/loss=252.140, player_2/loss=43.228, rew=476.67]


Epoch #409: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #410: 1025it [00:02, 371.04it/s, env_step=419840, len=17, n/ep=4, n/st=64, player_1/loss=274.877, player_2/loss=67.086, rew=306.00]


Epoch #410: test_reward: 304.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #411: 1025it [00:02, 375.03it/s, env_step=420864, len=22, n/ep=3, n/st=64, player_1/loss=256.486, player_2/loss=141.216, rew=546.00]


Epoch #411: test_reward: 460.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #412: 1025it [00:02, 387.36it/s, env_step=421888, len=27, n/ep=3, n/st=64, player_1/loss=311.485, rew=830.00]


Epoch #412: test_reward: 460.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #413: 1025it [00:02, 396.95it/s, env_step=422912, len=25, n/ep=3, n/st=64, player_1/loss=437.610, player_2/loss=317.271, rew=722.67]


Epoch #413: test_reward: 1120.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #414: 1025it [00:02, 399.74it/s, env_step=423936, len=12, n/ep=5, n/st=64, player_1/loss=283.960, player_2/loss=282.504, rew=171.20]


Epoch #414: test_reward: 180.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #415: 1025it [00:02, 399.12it/s, env_step=424960, len=16, n/ep=4, n/st=64, player_1/loss=58.406, player_2/loss=231.317, rew=293.00]


Epoch #415: test_reward: 598.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #416: 1025it [00:02, 398.03it/s, env_step=425984, len=25, n/ep=2, n/st=64, player_1/loss=90.656, player_2/loss=121.471, rew=680.00]


Epoch #416: test_reward: 504.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #417: 1025it [00:02, 385.03it/s, env_step=427008, len=20, n/ep=3, n/st=64, player_1/loss=158.980, player_2/loss=95.862, rew=454.00]


Epoch #417: test_reward: 504.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #418: 1025it [00:02, 380.46it/s, env_step=428032, len=15, n/ep=5, n/st=64, player_1/loss=184.760, player_2/loss=104.794, rew=274.40]


Epoch #418: test_reward: 208.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #419: 1025it [00:02, 381.59it/s, env_step=429056, len=22, n/ep=3, n/st=64, player_1/loss=175.983, player_2/loss=201.690, rew=504.00]


Epoch #419: test_reward: 378.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #420: 1025it [00:02, 379.19it/s, env_step=430080, len=21, n/ep=3, n/st=64, player_1/loss=89.459, player_2/loss=243.274, rew=462.00]


Epoch #420: test_reward: 598.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #421: 1025it [00:02, 373.66it/s, env_step=431104, len=31, n/ep=2, n/st=64, player_1/loss=31.073, player_2/loss=202.196, rew=1034.00]


Epoch #421: test_reward: 928.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #422: 1025it [00:02, 380.88it/s, env_step=432128, len=23, n/ep=2, n/st=64, player_1/loss=79.154, player_2/loss=248.222, rew=554.00]


Epoch #422: test_reward: 460.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #423: 1025it [00:02, 380.18it/s, env_step=433152, len=33, n/ep=2, n/st=64, player_1/loss=264.676, player_2/loss=260.820, rew=1154.00]


Epoch #423: test_reward: 648.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #424: 1025it [00:02, 379.19it/s, env_step=434176, len=12, n/ep=5, n/st=64, player_1/loss=320.965, player_2/loss=281.149, rew=177.20]


Epoch #424: test_reward: 70.000000 ± 0.000000, best_reward: 1188.000000 ± 0.000000 in #153


Epoch #425: 1025it [00:02, 377.79it/s, env_step=435200, len=13, n/ep=5, n/st=64, player_1/loss=369.366, player_2/loss=278.844, rew=202.40]


Epoch #425: test_reward: 1330.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #426: 1025it [00:02, 380.74it/s, env_step=436224, len=19, n/ep=3, n/st=64, player_1/loss=358.916, player_2/loss=140.680, rew=407.33]


Epoch #426: test_reward: 550.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #427: 1025it [00:02, 380.60it/s, env_step=437248, len=18, n/ep=3, n/st=64, player_1/loss=204.056, player_2/loss=174.390, rew=380.67]


Epoch #427: test_reward: 378.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #428: 1025it [00:02, 377.52it/s, env_step=438272, len=20, n/ep=3, n/st=64, player_1/loss=265.446, player_2/loss=120.670, rew=432.67]


Epoch #428: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #429: 1025it [00:02, 382.59it/s, env_step=439296, len=31, n/ep=2, n/st=64, player_1/loss=318.390, rew=1022.00]


Epoch #429: test_reward: 928.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #430: 1025it [00:02, 381.16it/s, env_step=440320, len=15, n/ep=3, n/st=64, player_1/loss=267.153, player_2/loss=56.155, rew=270.00]


Epoch #430: test_reward: 88.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #431: 1025it [00:02, 380.46it/s, env_step=441344, len=22, n/ep=3, n/st=64, player_1/loss=357.580, player_2/loss=57.358, rew=554.00]


Epoch #431: test_reward: 340.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #432: 1025it [00:02, 376.68it/s, env_step=442368, len=24, n/ep=3, n/st=64, player_1/loss=266.202, player_2/loss=251.294, rew=674.67]


Epoch #432: test_reward: 1054.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #433: 1025it [00:02, 380.60it/s, env_step=443392, len=19, n/ep=4, n/st=64, player_1/loss=110.093, player_2/loss=290.716, rew=406.50]


Epoch #433: test_reward: 180.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #434: 1025it [00:02, 378.49it/s, env_step=444416, len=13, n/ep=5, n/st=64, player_1/loss=173.949, player_2/loss=182.293, rew=194.40]


Epoch #434: test_reward: 180.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #435: 1025it [00:02, 379.75it/s, env_step=445440, len=13, n/ep=5, n/st=64, player_1/loss=249.883, player_2/loss=105.836, rew=187.20]


Epoch #435: test_reward: 208.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #436: 1025it [00:02, 378.94it/s, env_step=446464, len=21, n/ep=3, n/st=64, player_1/loss=308.323, player_2/loss=105.478, rew=478.67]


Epoch #436: test_reward: 810.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #437: 1025it [00:02, 381.59it/s, env_step=447488, len=33, n/ep=2, n/st=64, player_1/loss=344.155, player_2/loss=109.847, rew=1241.00]


Epoch #437: test_reward: 598.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #438: 1025it [00:02, 377.52it/s, env_step=448512, len=13, n/ep=6, n/st=64, player_1/loss=272.679, player_2/loss=187.822, rew=267.00]


Epoch #438: test_reward: 88.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #439: 1025it [00:02, 382.01it/s, env_step=449536, len=28, n/ep=2, n/st=64, player_1/loss=319.956, player_2/loss=125.787, rew=814.00]


Epoch #439: test_reward: 1054.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #440: 1025it [00:02, 379.61it/s, env_step=450560, len=26, n/ep=2, n/st=64, player_1/loss=296.765, player_2/loss=251.005, rew=736.00]


Epoch #440: test_reward: 810.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #441: 1025it [00:02, 380.03it/s, env_step=451584, len=27, n/ep=3, n/st=64, player_1/loss=215.556, player_2/loss=377.693, rew=814.67]


Epoch #441: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #442: 1025it [00:02, 373.80it/s, env_step=452608, len=17, n/ep=3, n/st=64, player_1/loss=207.179, player_2/loss=211.571, rew=328.67]


Epoch #442: test_reward: 54.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #443: 1025it [00:02, 376.96it/s, env_step=453632, len=12, n/ep=5, n/st=64, player_1/loss=285.719, player_2/loss=136.429, rew=190.80]


Epoch #443: test_reward: 154.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #444: 1025it [00:02, 379.61it/s, env_step=454656, len=10, n/ep=8, n/st=64, player_1/loss=234.620, rew=174.25]


Epoch #444: test_reward: 108.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #445: 1025it [00:02, 378.49it/s, env_step=455680, len=15, n/ep=4, n/st=64, player_1/loss=194.197, player_2/loss=139.785, rew=273.50]


Epoch #445: test_reward: 504.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #446: 1025it [00:02, 379.05it/s, env_step=456704, len=32, n/ep=2, n/st=64, player_1/loss=275.487, player_2/loss=207.378, rew=1058.00]


Epoch #446: test_reward: 550.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #447: 1025it [00:02, 378.49it/s, env_step=457728, len=24, n/ep=2, n/st=64, player_1/loss=335.632, player_2/loss=193.587, rew=599.00]


Epoch #447: test_reward: 460.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #448: 1025it [00:02, 381.45it/s, env_step=458752, len=15, n/ep=4, n/st=64, player_1/loss=314.601, player_2/loss=252.863, rew=240.00]


Epoch #448: test_reward: 304.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #449: 1025it [00:02, 382.87it/s, env_step=459776, len=14, n/ep=5, n/st=64, player_1/loss=248.056, player_2/loss=349.106, rew=232.80]


Epoch #449: test_reward: 304.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #450: 1025it [00:02, 381.02it/s, env_step=460800, len=17, n/ep=3, n/st=64, player_1/loss=101.781, player_2/loss=251.522, rew=337.33]


Epoch #450: test_reward: 238.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #451: 1025it [00:02, 379.75it/s, env_step=461824, len=22, n/ep=3, n/st=64, player_1/loss=109.417, player_2/loss=311.954, rew=598.67]


Epoch #451: test_reward: 180.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #452: 1025it [00:02, 380.60it/s, env_step=462848, len=16, n/ep=4, n/st=64, player_1/loss=227.751, player_2/loss=256.726, rew=295.50]


Epoch #452: test_reward: 238.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #453: 1025it [00:02, 376.96it/s, env_step=463872, len=21, n/ep=3, n/st=64, player_1/loss=280.059, player_2/loss=107.711, rew=477.33]


Epoch #453: test_reward: 460.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #454: 1025it [00:02, 382.02it/s, env_step=464896, len=23, n/ep=3, n/st=64, player_1/loss=240.652, player_2/loss=44.804, rew=611.33]


Epoch #454: test_reward: 700.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #455: 1025it [00:02, 378.91it/s, env_step=465920, len=23, n/ep=2, n/st=64, player_1/loss=205.760, player_2/loss=112.021, rew=550.00]


Epoch #455: test_reward: 460.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #456: 1025it [00:02, 378.21it/s, env_step=466944, len=31, n/ep=2, n/st=64, player_1/loss=205.219, player_2/loss=147.961, rew=1094.00]


Epoch #456: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #457: 1025it [00:02, 383.59it/s, env_step=467968, len=13, n/ep=4, n/st=64, player_1/loss=164.602, player_2/loss=145.581, rew=204.50]


Epoch #457: test_reward: 180.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #458: 1025it [00:02, 379.89it/s, env_step=468992, len=14, n/ep=5, n/st=64, player_1/loss=159.497, player_2/loss=115.275, rew=228.00]


Epoch #458: test_reward: 238.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #459: 1025it [00:02, 380.03it/s, env_step=470016, len=19, n/ep=4, n/st=64, player_1/loss=375.382, player_2/loss=123.702, rew=388.00]


Epoch #459: test_reward: 700.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #460: 1025it [00:02, 375.72it/s, env_step=471040, len=27, n/ep=2, n/st=64, player_1/loss=444.124, player_2/loss=339.274, rew=755.00]


Epoch #460: test_reward: 754.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #461: 1025it [00:02, 381.73it/s, env_step=472064, len=18, n/ep=4, n/st=64, player_1/loss=436.812, player_2/loss=373.006, rew=361.00]


Epoch #461: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #462: 1025it [00:02, 380.88it/s, env_step=473088, len=25, n/ep=2, n/st=64, player_2/loss=169.927, rew=649.00]


Epoch #462: test_reward: 1258.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #463: 1025it [00:02, 378.35it/s, env_step=474112, len=33, n/ep=2, n/st=64, player_1/loss=383.295, player_2/loss=322.694, rew=1145.00]


Epoch #463: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #464: 1025it [00:02, 380.03it/s, env_step=475136, len=18, n/ep=4, n/st=64, player_1/loss=472.117, player_2/loss=285.962, rew=346.50]


Epoch #464: test_reward: 238.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #465: 1025it [00:02, 380.46it/s, env_step=476160, len=17, n/ep=4, n/st=64, player_1/loss=374.717, player_2/loss=249.444, rew=326.00]


Epoch #465: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #466: 1025it [00:02, 377.52it/s, env_step=477184, len=16, n/ep=4, n/st=64, player_1/loss=242.267, player_2/loss=233.552, rew=293.50]


Epoch #466: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #467: 1025it [00:02, 379.47it/s, env_step=478208, len=25, n/ep=2, n/st=64, player_1/loss=163.954, player_2/loss=234.929, rew=697.00]


Epoch #467: test_reward: 990.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #468: 1025it [00:02, 380.46it/s, env_step=479232, len=13, n/ep=5, n/st=64, player_1/loss=251.247, player_2/loss=229.174, rew=184.40]


Epoch #468: test_reward: 180.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #469: 1025it [00:02, 380.60it/s, env_step=480256, len=21, n/ep=3, n/st=64, player_1/loss=246.395, rew=478.67]


Epoch #469: test_reward: 378.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #470: 1025it [00:02, 383.30it/s, env_step=481280, len=21, n/ep=3, n/st=64, player_1/loss=410.246, player_2/loss=397.227, rew=550.67]


Epoch #470: test_reward: 928.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #471: 1025it [00:02, 381.59it/s, env_step=482304, len=28, n/ep=2, n/st=64, player_1/loss=387.944, player_2/loss=225.660, rew=859.00]


Epoch #471: test_reward: 810.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #472: 1025it [00:02, 383.73it/s, env_step=483328, len=22, n/ep=3, n/st=64, player_1/loss=134.206, player_2/loss=152.205, rew=530.67]


Epoch #472: test_reward: 648.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #473: 1025it [00:02, 380.18it/s, env_step=484352, len=28, n/ep=3, n/st=64, player_1/loss=225.998, player_2/loss=163.278, rew=868.67]


Epoch #473: test_reward: 460.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #474: 1025it [00:02, 381.45it/s, env_step=485376, len=23, n/ep=3, n/st=64, player_1/loss=175.050, player_2/loss=165.955, rew=550.67]


Epoch #474: test_reward: 1330.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #475: 1025it [00:02, 380.88it/s, env_step=486400, len=21, n/ep=3, n/st=64, player_1/loss=213.552, player_2/loss=199.470, rew=506.00]


Epoch #475: test_reward: 754.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #476: 1025it [00:02, 376.96it/s, env_step=487424, len=28, n/ep=3, n/st=64, player_1/loss=523.200, player_2/loss=250.757, rew=855.33]


Epoch #476: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #477: 1025it [00:02, 380.88it/s, env_step=488448, len=34, n/ep=2, n/st=64, player_1/loss=538.537, player_2/loss=170.295, rew=1189.00]


Epoch #477: test_reward: 1188.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #478: 1025it [00:02, 377.93it/s, env_step=489472, len=18, n/ep=3, n/st=64, player_1/loss=508.049, player_2/loss=176.883, rew=352.67]


Epoch #478: test_reward: 378.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #479: 1025it [00:02, 376.27it/s, env_step=490496, len=19, n/ep=3, n/st=64, player_1/loss=749.516, player_2/loss=425.603, rew=406.00]


Epoch #479: test_reward: 378.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #480: 1025it [00:02, 378.77it/s, env_step=491520, len=21, n/ep=3, n/st=64, player_1/loss=404.002, player_2/loss=433.213, rew=476.00]


Epoch #480: test_reward: 460.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #481: 1025it [00:02, 379.61it/s, env_step=492544, len=29, n/ep=2, n/st=64, player_1/loss=130.141, player_2/loss=377.832, rew=949.00]


Epoch #481: test_reward: 1188.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #482: 1025it [00:02, 379.75it/s, env_step=493568, len=19, n/ep=3, n/st=64, player_1/loss=310.208, player_2/loss=238.829, rew=391.33]


Epoch #482: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #483: 1025it [00:02, 381.45it/s, env_step=494592, len=24, n/ep=3, n/st=64, player_1/loss=403.978, player_2/loss=129.233, rew=602.67]


Epoch #483: test_reward: 648.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #484: 1025it [00:02, 381.16it/s, env_step=495616, len=22, n/ep=3, n/st=64, player_1/loss=634.330, player_2/loss=109.025, rew=647.33]


Epoch #484: test_reward: 378.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #485: 1025it [00:02, 381.73it/s, env_step=496640, len=18, n/ep=3, n/st=64, player_1/loss=603.425, player_2/loss=147.490, rew=344.67]


Epoch #485: test_reward: 754.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #486: 1025it [00:02, 376.68it/s, env_step=497664, len=27, n/ep=2, n/st=64, player_1/loss=334.162, player_2/loss=243.033, rew=779.00]


Epoch #486: test_reward: 598.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #487: 1025it [00:02, 375.30it/s, env_step=498688, len=28, n/ep=3, n/st=64, player_1/loss=361.586, player_2/loss=283.212, rew=812.67]


Epoch #487: test_reward: 54.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #488: 1025it [00:02, 377.51it/s, env_step=499712, len=27, n/ep=3, n/st=64, player_1/loss=243.134, player_2/loss=312.159, rew=764.67]


Epoch #488: test_reward: 460.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #489: 1025it [00:02, 379.75it/s, env_step=500736, len=28, n/ep=2, n/st=64, player_1/loss=267.466, player_2/loss=228.758, rew=891.00]


Epoch #489: test_reward: 270.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #490: 1025it [00:02, 381.59it/s, env_step=501760, len=22, n/ep=3, n/st=64, player_1/loss=341.399, player_2/loss=120.869, rew=562.67]


Epoch #490: test_reward: 700.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #491: 1025it [00:02, 378.35it/s, env_step=502784, len=23, n/ep=3, n/st=64, player_1/loss=430.014, player_2/loss=227.755, rew=600.67]


Epoch #491: test_reward: 1054.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #492: 1025it [00:02, 382.87it/s, env_step=503808, len=29, n/ep=3, n/st=64, player_1/loss=501.884, player_2/loss=317.199, rew=970.67]


Epoch #492: test_reward: 1054.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #493: 1025it [00:02, 381.31it/s, env_step=504832, len=21, n/ep=3, n/st=64, player_1/loss=459.380, player_2/loss=268.997, rew=478.67]


Epoch #493: test_reward: 418.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #494: 1025it [00:02, 380.46it/s, env_step=505856, len=32, n/ep=2, n/st=64, player_1/loss=403.749, player_2/loss=376.332, rew=1058.00]


Epoch #494: test_reward: 810.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #495: 1025it [00:02, 380.88it/s, env_step=506880, len=29, n/ep=2, n/st=64, player_1/loss=314.386, player_2/loss=470.680, rew=869.00]


Epoch #495: test_reward: 928.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #496: 1025it [00:02, 380.60it/s, env_step=507904, len=34, n/ep=2, n/st=64, player_1/loss=512.841, player_2/loss=258.155, rew=1235.00]


Epoch #496: test_reward: 990.000000 ± 0.000000, best_reward: 1330.000000 ± 0.000000 in #425


Epoch #497: 1025it [00:02, 379.33it/s, env_step=508928, len=36, n/ep=2, n/st=64, player_1/loss=548.486, player_2/loss=170.481, rew=1369.00]


Epoch #497: test_reward: 1480.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #498: 1025it [00:02, 380.60it/s, env_step=509952, len=31, n/ep=2, n/st=64, player_1/loss=632.045, player_2/loss=89.388, rew=994.00]


Epoch #498: test_reward: 810.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #499: 1025it [00:02, 375.44it/s, env_step=510976, len=18, n/ep=3, n/st=64, player_2/loss=124.668, rew=369.33]


Epoch #499: test_reward: 504.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #500: 1025it [00:02, 383.45it/s, env_step=512000, len=21, n/ep=3, n/st=64, player_1/loss=264.124, player_2/loss=244.571, rew=508.00]


Epoch #500: test_reward: 504.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #501: 1025it [00:02, 381.73it/s, env_step=513024, len=33, n/ep=2, n/st=64, player_1/loss=417.707, player_2/loss=318.551, rew=1184.00]


Epoch #501: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #502: 1025it [00:02, 379.61it/s, env_step=514048, len=31, n/ep=2, n/st=64, player_1/loss=470.570, player_2/loss=438.666, rew=991.00]


Epoch #502: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #503: 1025it [00:02, 382.30it/s, env_step=515072, len=32, n/ep=2, n/st=64, player_1/loss=475.300, player_2/loss=356.078, rew=1054.00]


Epoch #503: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #504: 1025it [00:02, 379.89it/s, env_step=516096, len=21, n/ep=3, n/st=64, player_1/loss=290.741, player_2/loss=312.050, rew=476.00]


Epoch #504: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #505: 1025it [00:02, 375.58it/s, env_step=517120, len=21, n/ep=3, n/st=64, player_2/loss=513.478, rew=460.67]


Epoch #505: test_reward: 504.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #506: 1025it [00:02, 378.63it/s, env_step=518144, len=29, n/ep=2, n/st=64, player_1/loss=323.162, player_2/loss=275.675, rew=868.00]


Epoch #506: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #507: 1025it [00:02, 381.59it/s, env_step=519168, len=25, n/ep=2, n/st=64, player_1/loss=565.340, player_2/loss=279.695, rew=676.00]


Epoch #507: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #508: 1025it [00:02, 380.32it/s, env_step=520192, len=27, n/ep=2, n/st=64, player_1/loss=624.038, player_2/loss=417.995, rew=758.00]


Epoch #508: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #509: 1025it [00:02, 377.93it/s, env_step=521216, len=23, n/ep=3, n/st=64, player_1/loss=431.178, player_2/loss=246.854, rew=625.33]


Epoch #509: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #510: 1025it [00:02, 380.18it/s, env_step=522240, len=27, n/ep=3, n/st=64, player_1/loss=306.912, player_2/loss=236.438, rew=784.00]


Epoch #510: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #511: 1025it [00:02, 381.45it/s, env_step=523264, len=23, n/ep=3, n/st=64, player_1/loss=278.727, player_2/loss=333.872, rew=580.00]


Epoch #511: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #512: 1025it [00:02, 380.03it/s, env_step=524288, len=20, n/ep=3, n/st=64, player_1/loss=241.149, player_2/loss=195.069, rew=475.33]


Epoch #512: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #513: 1025it [00:02, 382.30it/s, env_step=525312, len=25, n/ep=2, n/st=64, player_1/loss=323.317, player_2/loss=96.285, rew=697.00]


Epoch #513: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #514: 1025it [00:02, 382.44it/s, env_step=526336, len=23, n/ep=2, n/st=64, player_1/loss=403.824, player_2/loss=282.366, rew=580.00]


Epoch #514: test_reward: 1480.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #515: 1025it [00:02, 384.88it/s, env_step=527360, len=26, n/ep=2, n/st=64, player_1/loss=379.286, player_2/loss=361.910, rew=757.00]


Epoch #515: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #516: 1025it [00:02, 377.24it/s, env_step=528384, len=22, n/ep=3, n/st=64, player_1/loss=220.934, player_2/loss=385.693, rew=572.67]


Epoch #516: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #517: 1025it [00:02, 382.02it/s, env_step=529408, len=19, n/ep=3, n/st=64, player_1/loss=126.738, player_2/loss=378.059, rew=392.00]


Epoch #517: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #518: 1025it [00:02, 380.03it/s, env_step=530432, len=17, n/ep=4, n/st=64, player_1/loss=326.398, player_2/loss=417.657, rew=332.00]


Epoch #518: test_reward: 504.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #519: 1025it [00:02, 379.47it/s, env_step=531456, len=27, n/ep=2, n/st=64, player_1/loss=507.206, player_2/loss=617.411, rew=782.00]


Epoch #519: test_reward: 550.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #520: 1025it [00:02, 379.05it/s, env_step=532480, len=28, n/ep=2, n/st=64, player_1/loss=425.865, player_2/loss=505.240, rew=839.00]


Epoch #520: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #521: 1025it [00:02, 380.18it/s, env_step=533504, len=25, n/ep=3, n/st=64, player_1/loss=487.131, player_2/loss=352.173, rew=666.67]


Epoch #521: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #522: 1025it [00:02, 380.88it/s, env_step=534528, len=25, n/ep=3, n/st=64, player_1/loss=457.111, player_2/loss=250.352, rew=682.67]


Epoch #522: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #523: 1025it [00:02, 381.45it/s, env_step=535552, len=27, n/ep=2, n/st=64, player_1/loss=310.452, player_2/loss=108.184, rew=755.00]


Epoch #523: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #524: 1025it [00:02, 380.03it/s, env_step=536576, len=25, n/ep=2, n/st=64, player_1/loss=126.026, player_2/loss=110.202, rew=676.00]


Epoch #524: test_reward: 810.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #525: 1025it [00:02, 381.73it/s, env_step=537600, len=28, n/ep=2, n/st=64, player_1/loss=302.511, player_2/loss=97.177, rew=839.00]


Epoch #525: test_reward: 990.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #526: 1025it [00:02, 381.59it/s, env_step=538624, len=32, n/ep=1, n/st=64, player_1/loss=495.105, player_2/loss=268.737, rew=1054.00]


Epoch #526: test_reward: 1480.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #527: 1025it [00:02, 379.89it/s, env_step=539648, len=19, n/ep=4, n/st=64, player_1/loss=251.082, player_2/loss=447.763, rew=428.50]


Epoch #527: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #528: 1025it [00:02, 381.87it/s, env_step=540672, len=19, n/ep=3, n/st=64, player_1/loss=153.515, player_2/loss=297.449, rew=432.67]


Epoch #528: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #529: 1025it [00:02, 378.07it/s, env_step=541696, len=28, n/ep=2, n/st=64, player_1/loss=119.885, player_2/loss=54.396, rew=846.00]


Epoch #529: test_reward: 928.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #530: 1025it [00:02, 362.05it/s, env_step=542720, len=25, n/ep=2, n/st=64, player_1/loss=251.526, player_2/loss=282.355, rew=652.00]


Epoch #530: test_reward: 810.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #531: 1025it [00:02, 379.47it/s, env_step=543744, len=32, n/ep=2, n/st=64, player_1/loss=381.366, player_2/loss=382.112, rew=1058.00]


Epoch #531: test_reward: 1480.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #532: 1025it [00:02, 379.75it/s, env_step=544768, len=34, n/ep=2, n/st=64, player_1/loss=256.080, player_2/loss=422.800, rew=1189.00]


Epoch #532: test_reward: 1258.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #533: 1025it [00:02, 379.05it/s, env_step=545792, len=26, n/ep=2, n/st=64, player_1/loss=245.382, player_2/loss=712.661, rew=700.00]


Epoch #533: test_reward: 504.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #534: 1025it [00:02, 350.90it/s, env_step=546816, len=27, n/ep=2, n/st=64, player_1/loss=296.347, player_2/loss=471.675, rew=803.00]


Epoch #534: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #535: 1025it [00:02, 354.79it/s, env_step=547840, len=24, n/ep=3, n/st=64, player_1/loss=671.010, player_2/loss=443.586, rew=624.00]


Epoch #535: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #536: 1025it [00:02, 369.76it/s, env_step=548864, len=32, n/ep=2, n/st=64, player_1/loss=851.060, player_2/loss=226.629, rew=1055.00]


Epoch #536: test_reward: 1188.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #537: 1025it [00:02, 372.98it/s, env_step=549888, len=29, n/ep=2, n/st=64, player_1/loss=364.784, player_2/loss=429.412, rew=884.00]


Epoch #537: test_reward: 504.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #538: 1025it [00:02, 372.17it/s, env_step=550912, len=38, n/ep=2, n/st=64, player_1/loss=430.493, player_2/loss=530.151, rew=1511.00]


Epoch #538: test_reward: 270.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #539: 1025it [00:02, 368.82it/s, env_step=551936, len=15, n/ep=4, n/st=64, player_1/loss=307.232, player_2/loss=567.701, rew=239.00]


Epoch #539: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #540: 1025it [00:02, 369.62it/s, env_step=552960, len=16, n/ep=4, n/st=64, player_1/loss=260.978, player_2/loss=589.105, rew=292.00]


Epoch #540: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #541: 1025it [00:02, 368.16it/s, env_step=553984, len=33, n/ep=2, n/st=64, player_1/loss=310.211, player_2/loss=510.276, rew=1120.00]


Epoch #541: test_reward: 1188.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #542: 1025it [00:02, 371.77it/s, env_step=555008, len=21, n/ep=3, n/st=64, player_1/loss=142.814, player_2/loss=352.556, rew=460.67]


Epoch #542: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #543: 1025it [00:02, 370.16it/s, env_step=556032, len=20, n/ep=3, n/st=64, player_1/loss=188.178, player_2/loss=495.249, rew=447.33]


Epoch #543: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #544: 1025it [00:02, 370.02it/s, env_step=557056, len=21, n/ep=3, n/st=64, player_1/loss=314.866, player_2/loss=574.472, rew=460.67]


Epoch #544: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #545: 1025it [00:02, 371.63it/s, env_step=558080, len=22, n/ep=2, n/st=64, player_1/loss=646.861, player_2/loss=224.003, rew=583.00]


Epoch #545: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #546: 1025it [00:02, 376.13it/s, env_step=559104, len=18, n/ep=3, n/st=64, player_1/loss=835.016, player_2/loss=170.676, rew=516.00]


Epoch #546: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #547: 1025it [00:02, 371.09it/s, env_step=560128, len=36, n/ep=2, n/st=64, player_1/loss=484.256, player_2/loss=431.094, rew=1387.00]


Epoch #547: test_reward: 1404.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #548: 1025it [00:02, 373.80it/s, env_step=561152, len=32, n/ep=2, n/st=64, player_1/loss=158.894, player_2/loss=454.169, rew=1129.00]


Epoch #548: test_reward: 1188.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #549: 1025it [00:02, 374.07it/s, env_step=562176, len=18, n/ep=4, n/st=64, player_1/loss=172.472, player_2/loss=334.127, rew=374.50]


Epoch #549: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #550: 1025it [00:02, 372.04it/s, env_step=563200, len=36, n/ep=1, n/st=64, player_1/loss=360.134, player_2/loss=328.903, rew=1330.00]


Epoch #550: test_reward: 1258.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #551: 1025it [00:02, 374.07it/s, env_step=564224, len=34, n/ep=2, n/st=64, player_1/loss=574.742, player_2/loss=208.061, rew=1204.00]


Epoch #551: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #552: 1025it [00:02, 369.62it/s, env_step=565248, len=20, n/ep=2, n/st=64, player_1/loss=638.901, player_2/loss=534.086, rew=441.00]


Epoch #552: test_reward: 810.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #553: 1025it [00:02, 372.85it/s, env_step=566272, len=20, n/ep=3, n/st=64, player_1/loss=461.477, player_2/loss=575.090, rew=422.67]


Epoch #553: test_reward: 270.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #554: 1025it [00:02, 375.17it/s, env_step=567296, len=21, n/ep=3, n/st=64, player_1/loss=263.866, player_2/loss=196.619, rew=494.67]


Epoch #554: test_reward: 550.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #555: 1025it [00:02, 375.72it/s, env_step=568320, len=28, n/ep=2, n/st=64, player_1/loss=436.228, player_2/loss=322.089, rew=910.00]


Epoch #555: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #556: 1025it [00:02, 378.21it/s, env_step=569344, len=22, n/ep=3, n/st=64, player_1/loss=459.413, player_2/loss=231.406, rew=537.33]


Epoch #556: test_reward: 990.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #557: 1025it [00:02, 377.51it/s, env_step=570368, len=29, n/ep=3, n/st=64, player_1/loss=310.609, player_2/loss=287.647, rew=921.33]


Epoch #557: test_reward: 1480.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #558: 1025it [00:02, 375.86it/s, env_step=571392, len=25, n/ep=2, n/st=64, player_1/loss=97.544, player_2/loss=379.034, rew=684.00]


Epoch #558: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #559: 1025it [00:02, 377.24it/s, env_step=572416, len=20, n/ep=3, n/st=64, player_1/loss=320.829, player_2/loss=428.536, rew=447.33]


Epoch #559: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #560: 1025it [00:02, 377.65it/s, env_step=573440, len=19, n/ep=3, n/st=64, player_1/loss=246.799, player_2/loss=332.474, rew=422.00]


Epoch #560: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #561: 1025it [00:02, 373.66it/s, env_step=574464, len=26, n/ep=2, n/st=64, player_1/loss=148.996, player_2/loss=389.479, rew=701.00]


Epoch #561: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #562: 1025it [00:02, 374.89it/s, env_step=575488, len=34, n/ep=2, n/st=64, player_1/loss=150.196, player_2/loss=669.692, rew=1188.00]


Epoch #562: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #563: 1025it [00:02, 374.76it/s, env_step=576512, len=32, n/ep=2, n/st=64, player_1/loss=90.457, player_2/loss=514.551, rew=1117.00]


Epoch #563: test_reward: 1188.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #564: 1025it [00:02, 375.58it/s, env_step=577536, len=42, n/ep=1, n/st=64, player_1/loss=205.200, player_2/loss=401.126, rew=1834.00]


Epoch #564: test_reward: 990.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #565: 1025it [00:02, 376.55it/s, env_step=578560, len=28, n/ep=2, n/st=64, player_1/loss=467.217, player_2/loss=470.234, rew=839.00]


Epoch #565: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #566: 1025it [00:02, 377.52it/s, env_step=579584, len=30, n/ep=2, n/st=64, player_1/loss=579.040, player_2/loss=415.725, rew=977.00]


Epoch #566: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #567: 1025it [00:02, 375.72it/s, env_step=580608, len=25, n/ep=3, n/st=64, player_1/loss=357.945, player_2/loss=403.617, rew=666.67]


Epoch #567: test_reward: 810.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #568: 1025it [00:02, 375.72it/s, env_step=581632, len=22, n/ep=3, n/st=64, player_1/loss=577.400, player_2/loss=404.902, rew=642.67]


Epoch #568: test_reward: 1404.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #569: 1025it [00:02, 376.82it/s, env_step=582656, len=36, n/ep=2, n/st=64, player_1/loss=741.057, player_2/loss=212.804, rew=1355.00]


Epoch #569: test_reward: 928.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #570: 1025it [00:02, 379.33it/s, env_step=583680, len=29, n/ep=3, n/st=64, player_1/loss=645.049, player_2/loss=191.592, rew=874.00]


Epoch #570: test_reward: 868.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #571: 1025it [00:02, 376.41it/s, env_step=584704, len=30, n/ep=2, n/st=64, player_1/loss=297.187, player_2/loss=426.454, rew=928.00]


Epoch #571: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #572: 1025it [00:02, 373.94it/s, env_step=585728, len=35, n/ep=2, n/st=64, player_1/loss=327.788, player_2/loss=485.796, rew=1294.00]


Epoch #572: test_reward: 1120.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #573: 1025it [00:02, 373.25it/s, env_step=586752, len=14, n/ep=4, n/st=64, player_1/loss=512.053, player_2/loss=429.601, rew=223.50]


Epoch #573: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #574: 1025it [00:02, 378.21it/s, env_step=587776, len=16, n/ep=4, n/st=64, player_1/loss=343.023, player_2/loss=536.566, rew=282.00]


Epoch #574: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #575: 1025it [00:02, 375.86it/s, env_step=588800, len=20, n/ep=3, n/st=64, player_1/loss=449.030, player_2/loss=611.095, rew=432.67]


Epoch #575: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #576: 1025it [00:02, 377.65it/s, env_step=589824, len=21, n/ep=3, n/st=64, player_1/loss=553.883, player_2/loss=296.395, rew=490.67]


Epoch #576: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #577: 1025it [00:02, 379.19it/s, env_step=590848, len=22, n/ep=2, n/st=64, player_1/loss=248.070, player_2/loss=223.103, rew=520.00]


Epoch #577: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #578: 1025it [00:02, 374.76it/s, env_step=591872, len=15, n/ep=3, n/st=64, player_1/loss=274.716, player_2/loss=201.245, rew=252.67]


Epoch #578: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #579: 1025it [00:02, 376.96it/s, env_step=592896, len=23, n/ep=3, n/st=64, player_1/loss=244.773, player_2/loss=57.742, rew=600.00]


Epoch #579: test_reward: 810.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #580: 1025it [00:02, 380.74it/s, env_step=593920, len=31, n/ep=2, n/st=64, player_1/loss=432.921, player_2/loss=129.317, rew=1064.00]


Epoch #580: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #581: 1025it [00:02, 376.41it/s, env_step=594944, len=31, n/ep=2, n/st=64, player_1/loss=484.304, player_2/loss=220.357, rew=1064.00]


Epoch #581: test_reward: 1480.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #582: 1025it [00:02, 377.38it/s, env_step=595968, len=15, n/ep=5, n/st=64, player_1/loss=372.177, player_2/loss=223.286, rew=262.40]


Epoch #582: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #583: 1025it [00:02, 378.07it/s, env_step=596992, len=19, n/ep=5, n/st=64, player_1/loss=305.694, player_2/loss=166.655, rew=466.40]


Epoch #583: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #584: 1025it [00:02, 377.93it/s, env_step=598016, len=14, n/ep=4, n/st=64, player_1/loss=327.691, player_2/loss=355.397, rew=209.00]


Epoch #584: test_reward: 88.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #585: 1025it [00:02, 380.32it/s, env_step=599040, len=28, n/ep=2, n/st=64, player_1/loss=412.281, player_2/loss=548.196, rew=911.00]


Epoch #585: test_reward: 1480.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #586: 1025it [00:02, 379.19it/s, env_step=600064, len=15, n/ep=4, n/st=64, player_1/loss=275.457, player_2/loss=363.199, rew=261.00]


Epoch #586: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #587: 1025it [00:02, 374.21it/s, env_step=601088, len=23, n/ep=3, n/st=64, player_1/loss=163.886, player_2/loss=525.534, rew=620.67]


Epoch #587: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #588: 1025it [00:02, 370.69it/s, env_step=602112, len=27, n/ep=2, n/st=64, player_1/loss=227.554, player_2/loss=454.995, rew=784.00]


Epoch #588: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #589: 1025it [00:02, 374.48it/s, env_step=603136, len=36, n/ep=2, n/st=64, player_1/loss=249.618, player_2/loss=245.347, rew=1373.00]


Epoch #589: test_reward: 1258.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #590: 1025it [00:02, 375.99it/s, env_step=604160, len=29, n/ep=2, n/st=64, player_1/loss=151.123, player_2/loss=266.585, rew=954.00]


Epoch #590: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #591: 1025it [00:02, 377.52it/s, env_step=605184, len=33, n/ep=2, n/st=64, player_1/loss=197.002, player_2/loss=151.446, rew=1136.00]


Epoch #591: test_reward: 1188.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #592: 1025it [00:02, 377.65it/s, env_step=606208, len=20, n/ep=3, n/st=64, player_1/loss=414.238, player_2/loss=58.805, rew=452.67]


Epoch #592: test_reward: 504.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #593: 1025it [00:02, 375.72it/s, env_step=607232, len=20, n/ep=3, n/st=64, player_1/loss=443.848, player_2/loss=60.925, rew=448.67]


Epoch #593: test_reward: 550.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #594: 1025it [00:02, 366.98it/s, env_step=608256, len=23, n/ep=3, n/st=64, player_1/loss=255.339, player_2/loss=267.786, rew=591.33]


Epoch #594: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #595: 1025it [00:02, 378.63it/s, env_step=609280, len=26, n/ep=2, n/st=64, player_1/loss=310.861, player_2/loss=454.953, rew=729.00]


Epoch #595: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #596: 1025it [00:02, 374.89it/s, env_step=610304, len=18, n/ep=4, n/st=64, player_1/loss=243.383, rew=369.00]


Epoch #596: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #597: 1025it [00:02, 378.35it/s, env_step=611328, len=16, n/ep=4, n/st=64, player_1/loss=128.203, player_2/loss=606.180, rew=274.50]


Epoch #597: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #598: 1025it [00:02, 376.13it/s, env_step=612352, len=36, n/ep=2, n/st=64, player_1/loss=172.035, player_2/loss=593.603, rew=1379.00]


Epoch #598: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #599: 1025it [00:02, 376.27it/s, env_step=613376, len=23, n/ep=3, n/st=64, player_1/loss=194.180, player_2/loss=826.581, rew=586.00]


Epoch #599: test_reward: 810.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #600: 1025it [00:02, 379.61it/s, env_step=614400, len=17, n/ep=5, n/st=64, player_1/loss=456.729, player_2/loss=661.394, rew=330.00]


Epoch #600: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #601: 1025it [00:02, 376.27it/s, env_step=615424, len=16, n/ep=4, n/st=64, player_1/loss=505.805, player_2/loss=337.945, rew=293.00]


Epoch #601: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #602: 1025it [00:02, 377.10it/s, env_step=616448, len=16, n/ep=3, n/st=64, player_1/loss=297.907, player_2/loss=305.767, rew=294.00]


Epoch #602: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #603: 1025it [00:02, 374.07it/s, env_step=617472, len=34, n/ep=2, n/st=64, player_1/loss=177.017, player_2/loss=576.387, rew=1192.00]


Epoch #603: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #604: 1025it [00:02, 377.79it/s, env_step=618496, len=22, n/ep=2, n/st=64, player_1/loss=133.979, player_2/loss=494.555, rew=659.00]


Epoch #604: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #605: 1025it [00:02, 379.47it/s, env_step=619520, len=16, n/ep=3, n/st=64, player_1/loss=152.899, rew=307.33]


Epoch #605: test_reward: 238.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #606: 1025it [00:02, 377.10it/s, env_step=620544, len=15, n/ep=4, n/st=64, player_1/loss=253.755, player_2/loss=389.739, rew=262.50]


Epoch #606: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #607: 1025it [00:02, 382.02it/s, env_step=621568, len=16, n/ep=3, n/st=64, player_1/loss=191.346, player_2/loss=446.191, rew=330.67]


Epoch #607: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #608: 1025it [00:02, 376.27it/s, env_step=622592, len=28, n/ep=3, n/st=64, player_1/loss=362.605, player_2/loss=251.158, rew=828.00]


Epoch #608: test_reward: 990.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #609: 1025it [00:02, 378.07it/s, env_step=623616, len=29, n/ep=2, n/st=64, player_1/loss=324.372, player_2/loss=359.964, rew=900.00]


Epoch #609: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #610: 1025it [00:02, 376.27it/s, env_step=624640, len=31, n/ep=3, n/st=64, player_1/loss=481.868, player_2/loss=533.449, rew=990.67]


Epoch #610: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #611: 1025it [00:02, 377.10it/s, env_step=625664, len=33, n/ep=2, n/st=64, player_1/loss=515.766, player_2/loss=527.864, rew=1124.00]


Epoch #611: test_reward: 990.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #612: 1025it [00:02, 379.05it/s, env_step=626688, len=30, n/ep=2, n/st=64, player_1/loss=168.201, player_2/loss=597.481, rew=928.00]


Epoch #612: test_reward: 990.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #613: 1025it [00:02, 377.79it/s, env_step=627712, len=15, n/ep=5, n/st=64, player_1/loss=105.806, player_2/loss=686.553, rew=246.80]


Epoch #613: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #614: 1025it [00:02, 376.68it/s, env_step=628736, len=18, n/ep=4, n/st=64, player_1/loss=217.046, player_2/loss=471.072, rew=356.00]


Epoch #614: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #615: 1025it [00:02, 373.94it/s, env_step=629760, len=17, n/ep=4, n/st=64, player_1/loss=317.471, player_2/loss=242.652, rew=319.50]


Epoch #615: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #616: 1025it [00:02, 375.86it/s, env_step=630784, len=16, n/ep=4, n/st=64, player_1/loss=413.141, player_2/loss=271.969, rew=297.50]


Epoch #616: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #617: 1025it [00:02, 376.41it/s, env_step=631808, len=21, n/ep=3, n/st=64, player_1/loss=332.629, player_2/loss=139.279, rew=462.67]


Epoch #617: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #618: 1025it [00:02, 375.03it/s, env_step=632832, len=18, n/ep=4, n/st=64, player_1/loss=265.053, player_2/loss=132.362, rew=369.50]


Epoch #618: test_reward: 270.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #619: 1025it [00:02, 375.58it/s, env_step=633856, len=23, n/ep=3, n/st=64, player_1/loss=282.451, player_2/loss=179.881, rew=582.00]


Epoch #619: test_reward: 550.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #620: 1025it [00:02, 376.13it/s, env_step=634880, len=27, n/ep=2, n/st=64, player_1/loss=466.356, player_2/loss=144.722, rew=770.00]


Epoch #620: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #621: 1025it [00:02, 375.99it/s, env_step=635904, len=29, n/ep=2, n/st=64, player_1/loss=462.966, player_2/loss=98.733, rew=872.00]


Epoch #621: test_reward: 990.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #622: 1025it [00:02, 365.15it/s, env_step=636928, len=27, n/ep=2, n/st=64, player_1/loss=280.610, player_2/loss=78.144, rew=754.00]


Epoch #622: test_reward: 810.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #623: 1025it [00:02, 376.27it/s, env_step=637952, len=23, n/ep=2, n/st=64, player_1/loss=159.566, player_2/loss=167.832, rew=576.00]


Epoch #623: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #624: 1025it [00:02, 373.80it/s, env_step=638976, len=23, n/ep=2, n/st=64, player_1/loss=232.194, player_2/loss=250.176, rew=574.00]


Epoch #624: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #625: 1025it [00:02, 370.29it/s, env_step=640000, len=28, n/ep=2, n/st=64, player_1/loss=399.659, player_2/loss=271.751, rew=839.00]


Epoch #625: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #626: 1025it [00:02, 379.05it/s, env_step=641024, len=28, n/ep=2, n/st=64, player_1/loss=388.001, player_2/loss=416.828, rew=811.00]


Epoch #626: test_reward: 928.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #627: 1025it [00:02, 381.59it/s, env_step=642048, len=26, n/ep=3, n/st=64, player_1/loss=335.093, player_2/loss=416.519, rew=702.00]


Epoch #627: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #628: 1025it [00:02, 379.47it/s, env_step=643072, len=26, n/ep=2, n/st=64, player_1/loss=275.588, player_2/loss=353.075, rew=727.00]


Epoch #628: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #629: 1025it [00:02, 383.30it/s, env_step=644096, len=26, n/ep=3, n/st=64, player_1/loss=257.479, player_2/loss=353.055, rew=737.33]


Epoch #629: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #630: 1025it [00:02, 381.45it/s, env_step=645120, len=15, n/ep=4, n/st=64, player_1/loss=457.185, player_2/loss=253.786, rew=294.50]


Epoch #630: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #631: 1025it [00:02, 378.77it/s, env_step=646144, len=32, n/ep=2, n/st=64, player_1/loss=443.219, rew=1054.00]


Epoch #631: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #632: 1025it [00:02, 379.05it/s, env_step=647168, len=34, n/ep=2, n/st=64, player_1/loss=296.975, player_2/loss=177.421, rew=1188.00]


Epoch #632: test_reward: 1330.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #633: 1025it [00:02, 376.69it/s, env_step=648192, len=25, n/ep=2, n/st=64, player_1/loss=544.747, player_2/loss=178.263, rew=648.00]


Epoch #633: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #634: 1025it [00:02, 379.19it/s, env_step=649216, len=30, n/ep=2, n/st=64, player_1/loss=706.229, player_2/loss=291.323, rew=971.00]


Epoch #634: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #635: 1025it [00:02, 378.07it/s, env_step=650240, len=36, n/ep=2, n/st=64, player_1/loss=355.937, player_2/loss=563.480, rew=1334.00]


Epoch #635: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #636: 1025it [00:02, 380.74it/s, env_step=651264, len=34, n/ep=2, n/st=64, player_2/loss=635.060, rew=1223.00]


Epoch #636: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #637: 1025it [00:02, 381.02it/s, env_step=652288, len=24, n/ep=3, n/st=64, player_1/loss=128.550, player_2/loss=447.208, rew=712.00]


Epoch #637: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #638: 1025it [00:02, 374.07it/s, env_step=653312, len=21, n/ep=3, n/st=64, player_1/loss=213.398, player_2/loss=464.463, rew=509.33]


Epoch #638: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #639: 1025it [00:02, 380.74it/s, env_step=654336, len=27, n/ep=2, n/st=64, player_1/loss=288.663, player_2/loss=460.242, rew=782.00]


Epoch #639: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #640: 1025it [00:02, 377.38it/s, env_step=655360, len=21, n/ep=3, n/st=64, player_1/loss=246.349, player_2/loss=510.269, rew=478.00]


Epoch #640: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #641: 1025it [00:02, 380.18it/s, env_step=656384, len=23, n/ep=3, n/st=64, player_1/loss=450.262, player_2/loss=451.114, rew=583.33]


Epoch #641: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #642: 1025it [00:02, 380.18it/s, env_step=657408, len=35, n/ep=2, n/st=64, player_1/loss=324.470, player_2/loss=442.230, rew=1296.00]


Epoch #642: test_reward: 1188.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #643: 1025it [00:02, 377.24it/s, env_step=658432, len=18, n/ep=3, n/st=64, player_1/loss=55.016, player_2/loss=196.096, rew=358.67]


Epoch #643: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #644: 1025it [00:02, 379.75it/s, env_step=659456, len=24, n/ep=2, n/st=64, player_1/loss=123.914, player_2/loss=262.472, rew=665.00]


Epoch #644: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #645: 1025it [00:02, 379.19it/s, env_step=660480, len=8, n/ep=8, n/st=64, player_1/loss=225.805, player_2/loss=331.212, rew=85.25]


Epoch #645: test_reward: 130.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #646: 1025it [00:02, 378.07it/s, env_step=661504, len=17, n/ep=4, n/st=64, player_1/loss=193.392, player_2/loss=314.031, rew=309.00]


Epoch #646: test_reward: 270.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #647: 1025it [00:02, 377.79it/s, env_step=662528, len=11, n/ep=5, n/st=64, player_1/loss=236.773, player_2/loss=224.168, rew=158.80]


Epoch #647: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #648: 1025it [00:02, 380.60it/s, env_step=663552, len=11, n/ep=6, n/st=64, player_1/loss=187.429, player_2/loss=33.927, rew=139.67]


Epoch #648: test_reward: 130.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #649: 1025it [00:02, 377.93it/s, env_step=664576, len=7, n/ep=8, n/st=64, player_1/loss=208.529, player_2/loss=135.770, rew=69.25]


Epoch #649: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #650: 1025it [00:02, 379.33it/s, env_step=665600, len=15, n/ep=5, n/st=64, player_1/loss=191.266, player_2/loss=182.500, rew=319.60]


Epoch #650: test_reward: 88.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #651: 1025it [00:02, 378.77it/s, env_step=666624, len=12, n/ep=5, n/st=64, player_1/loss=138.305, player_2/loss=73.902, rew=197.60]


Epoch #651: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #652: 1025it [00:02, 377.65it/s, env_step=667648, len=13, n/ep=4, n/st=64, player_1/loss=169.768, player_2/loss=86.615, rew=222.00]


Epoch #652: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #653: 1025it [00:02, 375.58it/s, env_step=668672, len=18, n/ep=3, n/st=64, player_1/loss=173.514, player_2/loss=79.479, rew=356.67]


Epoch #653: test_reward: 270.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #654: 1025it [00:02, 378.63it/s, env_step=669696, len=11, n/ep=5, n/st=64, player_1/loss=130.915, player_2/loss=23.632, rew=172.80]


Epoch #654: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #655: 1025it [00:02, 371.09it/s, env_step=670720, len=12, n/ep=6, n/st=64, player_1/loss=93.703, player_2/loss=200.485, rew=180.67]


Epoch #655: test_reward: 70.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #656: 1025it [00:02, 379.47it/s, env_step=671744, len=13, n/ep=4, n/st=64, player_1/loss=56.296, player_2/loss=413.082, rew=200.00]


Epoch #656: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #657: 1025it [00:02, 377.65it/s, env_step=672768, len=20, n/ep=3, n/st=64, player_1/loss=80.082, player_2/loss=252.560, rew=432.67]


Epoch #657: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #658: 1025it [00:02, 377.65it/s, env_step=673792, len=21, n/ep=2, n/st=64, player_1/loss=209.517, player_2/loss=291.487, rew=460.00]


Epoch #658: test_reward: 88.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #659: 1025it [00:02, 375.03it/s, env_step=674816, len=20, n/ep=3, n/st=64, player_1/loss=216.683, player_2/loss=239.477, rew=433.33]


Epoch #659: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #660: 1025it [00:02, 377.93it/s, env_step=675840, len=24, n/ep=3, n/st=64, player_1/loss=175.135, player_2/loss=298.794, rew=658.67]


Epoch #660: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #661: 1025it [00:02, 379.05it/s, env_step=676864, len=16, n/ep=3, n/st=64, player_1/loss=316.175, player_2/loss=311.212, rew=280.67]


Epoch #661: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #662: 1025it [00:02, 377.93it/s, env_step=677888, len=24, n/ep=3, n/st=64, player_1/loss=239.330, player_2/loss=198.483, rew=616.00]


Epoch #662: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #663: 1025it [00:02, 364.89it/s, env_step=678912, len=23, n/ep=2, n/st=64, player_1/loss=204.743, player_2/loss=120.369, rew=575.00]


Epoch #663: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #664: 1025it [00:02, 375.99it/s, env_step=679936, len=19, n/ep=4, n/st=64, player_1/loss=284.114, player_2/loss=153.175, rew=392.50]


Epoch #664: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #665: 1025it [00:02, 379.19it/s, env_step=680960, len=21, n/ep=3, n/st=64, player_1/loss=169.091, player_2/loss=303.208, rew=464.67]


Epoch #665: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #666: 1025it [00:02, 378.63it/s, env_step=681984, len=20, n/ep=3, n/st=64, player_1/loss=30.400, player_2/loss=302.615, rew=448.67]


Epoch #666: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #667: 1025it [00:02, 379.33it/s, env_step=683008, len=28, n/ep=3, n/st=64, player_1/loss=122.502, player_2/loss=189.547, rew=849.33]


Epoch #667: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #668: 1025it [00:02, 378.63it/s, env_step=684032, len=24, n/ep=2, n/st=64, player_1/loss=301.568, player_2/loss=176.820, rew=599.00]


Epoch #668: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #669: 1025it [00:02, 379.75it/s, env_step=685056, len=18, n/ep=3, n/st=64, player_1/loss=402.595, player_2/loss=105.519, rew=362.00]


Epoch #669: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #670: 1025it [00:02, 379.89it/s, env_step=686080, len=18, n/ep=3, n/st=64, player_1/loss=177.060, player_2/loss=244.296, rew=373.33]


Epoch #670: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #671: 1025it [00:02, 379.47it/s, env_step=687104, len=22, n/ep=2, n/st=64, player_1/loss=162.188, player_2/loss=357.080, rew=539.00]


Epoch #671: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #672: 1025it [00:02, 378.49it/s, env_step=688128, len=31, n/ep=2, n/st=64, player_1/loss=277.514, player_2/loss=189.094, rew=994.00]


Epoch #672: test_reward: 868.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #673: 1025it [00:02, 375.03it/s, env_step=689152, len=37, n/ep=2, n/st=64, player_1/loss=218.365, player_2/loss=109.590, rew=1477.00]


Epoch #673: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #674: 1025it [00:02, 377.52it/s, env_step=690176, len=23, n/ep=2, n/st=64, player_1/loss=174.347, player_2/loss=340.251, rew=576.00]


Epoch #674: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #675: 1025it [00:02, 372.85it/s, env_step=691200, len=16, n/ep=3, n/st=64, player_1/loss=219.133, player_2/loss=325.040, rew=293.33]


Epoch #675: test_reward: 810.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #676: 1025it [00:02, 379.75it/s, env_step=692224, len=15, n/ep=5, n/st=64, player_1/loss=211.999, player_2/loss=120.879, rew=256.00]


Epoch #676: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #677: 1025it [00:02, 373.94it/s, env_step=693248, len=15, n/ep=4, n/st=64, player_1/loss=137.555, player_2/loss=70.273, rew=247.00]


Epoch #677: test_reward: 238.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #678: 1025it [00:02, 382.73it/s, env_step=694272, len=16, n/ep=4, n/st=64, player_1/loss=90.952, player_2/loss=98.337, rew=274.00]


Epoch #678: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #679: 1025it [00:02, 379.75it/s, env_step=695296, len=15, n/ep=4, n/st=64, player_1/loss=72.259, player_2/loss=231.877, rew=251.50]


Epoch #679: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #680: 1025it [00:02, 379.05it/s, env_step=696320, len=21, n/ep=4, n/st=64, player_1/loss=183.392, player_2/loss=199.959, rew=586.00]


Epoch #680: test_reward: 238.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #681: 1025it [00:02, 377.93it/s, env_step=697344, len=21, n/ep=3, n/st=64, player_1/loss=231.274, player_2/loss=128.801, rew=532.00]


Epoch #681: test_reward: 238.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #682: 1025it [00:02, 376.82it/s, env_step=698368, len=16, n/ep=4, n/st=64, player_1/loss=200.968, player_2/loss=124.514, rew=282.00]


Epoch #682: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #683: 1025it [00:02, 378.63it/s, env_step=699392, len=17, n/ep=3, n/st=64, player_1/loss=122.204, player_2/loss=144.152, rew=329.33]


Epoch #683: test_reward: 304.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #684: 1025it [00:02, 378.91it/s, env_step=700416, len=21, n/ep=3, n/st=64, player_1/loss=100.457, player_2/loss=103.219, rew=494.67]


Epoch #684: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #685: 1025it [00:02, 377.65it/s, env_step=701440, len=29, n/ep=3, n/st=64, player_1/loss=121.241, player_2/loss=146.904, rew=965.33]


Epoch #685: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #686: 1025it [00:02, 370.69it/s, env_step=702464, len=22, n/ep=4, n/st=64, player_1/loss=319.734, player_2/loss=102.234, rew=579.50]


Epoch #686: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #687: 1025it [00:02, 377.38it/s, env_step=703488, len=25, n/ep=2, n/st=64, player_1/loss=472.538, player_2/loss=46.829, rew=676.00]


Epoch #687: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #688: 1025it [00:02, 378.77it/s, env_step=704512, len=23, n/ep=3, n/st=64, player_1/loss=373.627, player_2/loss=169.126, rew=566.00]


Epoch #688: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #689: 1025it [00:02, 380.88it/s, env_step=705536, len=19, n/ep=3, n/st=64, player_1/loss=236.765, player_2/loss=283.544, rew=394.67]


Epoch #689: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #690: 1025it [00:02, 378.91it/s, env_step=706560, len=17, n/ep=4, n/st=64, player_1/loss=235.357, player_2/loss=311.771, rew=326.50]


Epoch #690: test_reward: 238.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #691: 1025it [00:02, 379.05it/s, env_step=707584, len=15, n/ep=4, n/st=64, player_1/loss=226.327, player_2/loss=261.722, rew=240.00]


Epoch #691: test_reward: 238.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #692: 1025it [00:02, 378.77it/s, env_step=708608, len=17, n/ep=3, n/st=64, player_1/loss=116.176, player_2/loss=75.329, rew=369.33]


Epoch #692: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #693: 1025it [00:02, 379.61it/s, env_step=709632, len=32, n/ep=2, n/st=64, player_1/loss=222.536, player_2/loss=260.454, rew=1093.00]


Epoch #693: test_reward: 1404.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #694: 1025it [00:02, 379.33it/s, env_step=710656, len=20, n/ep=3, n/st=64, player_1/loss=459.445, player_2/loss=218.939, rew=432.00]


Epoch #694: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #695: 1025it [00:02, 378.35it/s, env_step=711680, len=23, n/ep=3, n/st=64, player_1/loss=357.575, player_2/loss=78.492, rew=603.33]


Epoch #695: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #696: 1025it [00:02, 380.03it/s, env_step=712704, len=33, n/ep=2, n/st=64, player_1/loss=187.244, player_2/loss=231.379, rew=1121.00]


Epoch #696: test_reward: 1120.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #697: 1025it [00:02, 378.63it/s, env_step=713728, len=14, n/ep=4, n/st=64, player_1/loss=340.042, player_2/loss=346.123, rew=209.50]


Epoch #697: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #698: 1025it [00:02, 381.02it/s, env_step=714752, len=35, n/ep=1, n/st=64, player_1/loss=335.277, player_2/loss=282.294, rew=1258.00]


Epoch #698: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #699: 1025it [00:02, 376.96it/s, env_step=715776, len=11, n/ep=6, n/st=64, player_1/loss=125.086, player_2/loss=208.477, rew=143.00]


Epoch #699: test_reward: 130.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #700: 1025it [00:02, 378.63it/s, env_step=716800, len=20, n/ep=3, n/st=64, player_1/loss=97.694, player_2/loss=131.965, rew=428.67]


Epoch #700: test_reward: 238.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #701: 1025it [00:02, 377.38it/s, env_step=717824, len=13, n/ep=5, n/st=64, player_1/loss=176.589, player_2/loss=153.576, rew=206.40]


Epoch #701: test_reward: 238.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #702: 1025it [00:02, 373.66it/s, env_step=718848, len=21, n/ep=3, n/st=64, player_1/loss=211.488, player_2/loss=150.373, rew=542.67]


Epoch #702: test_reward: 928.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #703: 1025it [00:02, 378.21it/s, env_step=719872, len=29, n/ep=2, n/st=64, player_1/loss=96.333, player_2/loss=202.535, rew=932.00]


Epoch #703: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #704: 1025it [00:02, 377.79it/s, env_step=720896, len=16, n/ep=5, n/st=64, player_1/loss=200.189, player_2/loss=135.433, rew=281.20]


Epoch #704: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #705: 1025it [00:02, 380.17it/s, env_step=721920, len=15, n/ep=4, n/st=64, player_1/loss=299.933, player_2/loss=105.218, rew=246.50]


Epoch #705: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #706: 1025it [00:02, 377.79it/s, env_step=722944, len=27, n/ep=2, n/st=64, player_1/loss=310.232, player_2/loss=119.684, rew=898.00]


Epoch #706: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #707: 1025it [00:02, 374.89it/s, env_step=723968, len=16, n/ep=5, n/st=64, player_1/loss=382.304, player_2/loss=225.015, rew=296.00]


Epoch #707: test_reward: 154.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #708: 1025it [00:02, 378.77it/s, env_step=724992, len=16, n/ep=4, n/st=64, player_1/loss=435.455, player_2/loss=136.602, rew=289.00]


Epoch #708: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #709: 1025it [00:02, 379.61it/s, env_step=726016, len=22, n/ep=3, n/st=64, player_1/loss=456.852, player_2/loss=72.502, rew=537.33]


Epoch #709: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #710: 1025it [00:02, 378.91it/s, env_step=727040, len=22, n/ep=3, n/st=64, player_1/loss=428.817, player_2/loss=167.282, rew=520.00]


Epoch #710: test_reward: 550.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #711: 1025it [00:02, 380.74it/s, env_step=728064, len=22, n/ep=3, n/st=64, player_1/loss=126.210, player_2/loss=248.896, rew=536.00]


Epoch #711: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #712: 1025it [00:02, 379.75it/s, env_step=729088, len=22, n/ep=3, n/st=64, player_1/loss=111.142, player_2/loss=146.463, rew=526.00]


Epoch #712: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #713: 1025it [00:02, 380.18it/s, env_step=730112, len=21, n/ep=3, n/st=64, player_1/loss=200.396, player_2/loss=48.261, rew=481.33]


Epoch #713: test_reward: 418.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #714: 1025it [00:02, 378.49it/s, env_step=731136, len=28, n/ep=2, n/st=64, player_1/loss=275.883, player_2/loss=87.372, rew=811.00]


Epoch #714: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #715: 1025it [00:02, 377.65it/s, env_step=732160, len=24, n/ep=3, n/st=64, player_1/loss=278.152, player_2/loss=75.164, rew=707.33]


Epoch #715: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #716: 1025it [00:02, 377.24it/s, env_step=733184, len=20, n/ep=3, n/st=64, player_1/loss=241.052, player_2/loss=98.977, rew=512.67]


Epoch #716: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #717: 1025it [00:02, 379.05it/s, env_step=734208, len=26, n/ep=2, n/st=64, player_1/loss=182.388, player_2/loss=98.339, rew=704.00]


Epoch #717: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #718: 1025it [00:02, 376.55it/s, env_step=735232, len=33, n/ep=2, n/st=64, player_1/loss=333.178, player_2/loss=192.622, rew=1136.00]


Epoch #718: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #719: 1025it [00:02, 379.19it/s, env_step=736256, len=29, n/ep=2, n/st=64, player_1/loss=331.042, player_2/loss=306.913, rew=869.00]


Epoch #719: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #720: 1025it [00:02, 378.77it/s, env_step=737280, len=27, n/ep=2, n/st=64, player_1/loss=266.459, player_2/loss=352.689, rew=788.00]


Epoch #720: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #721: 1025it [00:02, 376.55it/s, env_step=738304, len=40, n/ep=1, n/st=64, player_1/loss=241.972, player_2/loss=93.071, rew=1638.00]


Epoch #721: test_reward: 1054.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #722: 1025it [00:02, 379.19it/s, env_step=739328, len=9, n/ep=7, n/st=64, player_1/loss=235.859, player_2/loss=99.766, rew=96.00]


Epoch #722: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #723: 1025it [00:02, 381.31it/s, env_step=740352, len=22, n/ep=3, n/st=64, player_1/loss=211.710, player_2/loss=104.417, rew=504.67]


Epoch #723: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #724: 1025it [00:02, 378.35it/s, env_step=741376, len=22, n/ep=2, n/st=64, player_1/loss=214.228, player_2/loss=92.787, rew=529.00]


Epoch #724: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #725: 1025it [00:02, 376.82it/s, env_step=742400, len=11, n/ep=6, n/st=64, player_1/loss=308.026, player_2/loss=96.119, rew=237.33]


Epoch #725: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #726: 1025it [00:02, 374.35it/s, env_step=743424, len=15, n/ep=5, n/st=64, player_1/loss=247.446, player_2/loss=134.900, rew=242.80]


Epoch #726: test_reward: 270.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #727: 1025it [00:02, 380.03it/s, env_step=744448, len=16, n/ep=4, n/st=64, player_1/loss=150.160, player_2/loss=254.093, rew=284.50]


Epoch #727: test_reward: 154.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #728: 1025it [00:02, 379.33it/s, env_step=745472, len=13, n/ep=4, n/st=64, player_1/loss=246.373, player_2/loss=255.811, rew=215.50]


Epoch #728: test_reward: 304.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #729: 1025it [00:02, 381.45it/s, env_step=746496, len=28, n/ep=2, n/st=64, player_1/loss=216.556, player_2/loss=202.436, rew=819.00]


Epoch #729: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #730: 1025it [00:02, 376.41it/s, env_step=747520, len=22, n/ep=3, n/st=64, player_1/loss=384.622, player_2/loss=96.873, rew=504.67]


Epoch #730: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #731: 1025it [00:02, 378.63it/s, env_step=748544, len=11, n/ep=5, n/st=64, player_1/loss=365.374, player_2/loss=111.765, rew=166.40]


Epoch #731: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #732: 1025it [00:02, 380.31it/s, env_step=749568, len=18, n/ep=3, n/st=64, player_1/loss=97.167, player_2/loss=153.099, rew=379.33]


Epoch #732: test_reward: 340.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #733: 1025it [00:02, 380.03it/s, env_step=750592, len=11, n/ep=7, n/st=64, player_1/loss=234.727, player_2/loss=285.383, rew=185.71]


Epoch #733: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #734: 1025it [00:02, 377.79it/s, env_step=751616, len=22, n/ep=3, n/st=64, player_1/loss=264.425, player_2/loss=331.455, rew=538.67]


Epoch #734: test_reward: 154.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #735: 1025it [00:02, 376.96it/s, env_step=752640, len=8, n/ep=8, n/st=64, player_1/loss=118.618, player_2/loss=332.922, rew=73.25]


Epoch #735: test_reward: 70.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #736: 1025it [00:02, 380.32it/s, env_step=753664, len=8, n/ep=8, n/st=64, player_1/loss=58.675, player_2/loss=348.407, rew=73.50]


Epoch #736: test_reward: 208.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #737: 1025it [00:02, 377.79it/s, env_step=754688, len=8, n/ep=8, n/st=64, player_1/loss=43.181, player_2/loss=339.629, rew=87.75]


Epoch #737: test_reward: 130.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #738: 1025it [00:02, 379.19it/s, env_step=755712, len=18, n/ep=3, n/st=64, player_1/loss=61.569, player_2/loss=160.332, rew=370.00]


Epoch #738: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #739: 1025it [00:02, 379.61it/s, env_step=756736, len=32, n/ep=2, n/st=64, player_1/loss=111.982, player_2/loss=255.922, rew=1054.00]


Epoch #739: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #740: 1025it [00:02, 375.72it/s, env_step=757760, len=17, n/ep=4, n/st=64, player_1/loss=114.987, player_2/loss=412.442, rew=333.50]


Epoch #740: test_reward: 238.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #741: 1025it [00:02, 379.61it/s, env_step=758784, len=20, n/ep=3, n/st=64, player_1/loss=98.492, player_2/loss=399.514, rew=422.67]


Epoch #741: test_reward: 304.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #742: 1025it [00:02, 380.88it/s, env_step=759808, len=10, n/ep=6, n/st=64, player_1/loss=240.053, player_2/loss=162.993, rew=167.67]


Epoch #742: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #743: 1025it [00:02, 377.10it/s, env_step=760832, len=26, n/ep=2, n/st=64, player_1/loss=320.503, player_2/loss=143.984, rew=729.00]


Epoch #743: test_reward: 154.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #744: 1025it [00:02, 378.35it/s, env_step=761856, len=20, n/ep=4, n/st=64, player_1/loss=461.793, player_2/loss=496.641, rew=501.50]


Epoch #744: test_reward: 54.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #745: 1025it [00:02, 374.76it/s, env_step=762880, len=18, n/ep=3, n/st=64, player_1/loss=428.092, player_2/loss=483.326, rew=366.00]


Epoch #745: test_reward: 598.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #746: 1025it [00:02, 370.96it/s, env_step=763904, len=21, n/ep=3, n/st=64, player_1/loss=326.457, player_2/loss=310.150, rew=462.67]


Epoch #746: test_reward: 378.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #747: 1025it [00:02, 371.90it/s, env_step=764928, len=17, n/ep=4, n/st=64, player_1/loss=381.269, player_2/loss=350.044, rew=389.00]


Epoch #747: test_reward: 990.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #748: 1025it [00:02, 378.91it/s, env_step=765952, len=30, n/ep=3, n/st=64, player_1/loss=567.806, player_2/loss=465.096, rew=975.33]


Epoch #748: test_reward: 270.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #749: 1025it [00:02, 377.52it/s, env_step=766976, len=15, n/ep=4, n/st=64, player_1/loss=524.294, player_2/loss=325.774, rew=241.50]


Epoch #749: test_reward: 1188.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #750: 1025it [00:02, 378.21it/s, env_step=768000, len=14, n/ep=4, n/st=64, player_1/loss=173.260, player_2/loss=288.422, rew=223.50]


Epoch #750: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #751: 1025it [00:02, 379.89it/s, env_step=769024, len=20, n/ep=4, n/st=64, player_1/loss=264.717, player_2/loss=452.684, rew=538.00]


Epoch #751: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #752: 1025it [00:02, 378.91it/s, env_step=770048, len=36, n/ep=2, n/st=64, player_1/loss=462.132, player_2/loss=431.671, rew=1339.00]


Epoch #752: test_reward: 1404.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #753: 1025it [00:02, 380.03it/s, env_step=771072, len=23, n/ep=2, n/st=64, player_1/loss=508.417, player_2/loss=284.966, rew=586.00]


Epoch #753: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #754: 1025it [00:02, 380.46it/s, env_step=772096, len=30, n/ep=2, n/st=64, player_1/loss=286.830, player_2/loss=189.448, rew=961.00]


Epoch #754: test_reward: 504.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #755: 1025it [00:02, 379.61it/s, env_step=773120, len=25, n/ep=3, n/st=64, player_1/loss=278.430, player_2/loss=280.810, rew=702.67]


Epoch #755: test_reward: 700.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #756: 1025it [00:02, 378.35it/s, env_step=774144, len=18, n/ep=3, n/st=64, player_1/loss=463.957, player_2/loss=247.860, rew=414.67]


Epoch #756: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #757: 1025it [00:02, 376.96it/s, env_step=775168, len=16, n/ep=4, n/st=64, player_1/loss=194.544, player_2/loss=341.507, rew=270.50]


Epoch #757: test_reward: 130.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #758: 1025it [00:02, 380.60it/s, env_step=776192, len=19, n/ep=2, n/st=64, player_1/loss=408.344, player_2/loss=517.292, rew=400.00]


Epoch #758: test_reward: 648.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #759: 1025it [00:02, 380.46it/s, env_step=777216, len=15, n/ep=4, n/st=64, player_1/loss=399.540, player_2/loss=417.674, rew=238.00]


Epoch #759: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #760: 1025it [00:02, 377.79it/s, env_step=778240, len=11, n/ep=5, n/st=64, player_1/loss=68.865, player_2/loss=201.951, rew=147.20]


Epoch #760: test_reward: 270.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #761: 1025it [00:02, 380.03it/s, env_step=779264, len=24, n/ep=3, n/st=64, player_1/loss=239.273, player_2/loss=55.621, rew=628.67]


Epoch #761: test_reward: 868.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #762: 1025it [00:02, 379.33it/s, env_step=780288, len=26, n/ep=2, n/st=64, player_1/loss=304.145, player_2/loss=350.563, rew=757.00]


Epoch #762: test_reward: 504.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #763: 1025it [00:02, 376.68it/s, env_step=781312, len=24, n/ep=2, n/st=64, player_1/loss=207.580, player_2/loss=472.754, rew=607.00]


Epoch #763: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #764: 1025it [00:02, 377.24it/s, env_step=782336, len=20, n/ep=3, n/st=64, player_1/loss=172.469, player_2/loss=167.342, rew=420.67]


Epoch #764: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #765: 1025it [00:02, 376.55it/s, env_step=783360, len=19, n/ep=3, n/st=64, player_1/loss=257.743, player_2/loss=236.051, rew=392.67]


Epoch #765: test_reward: 460.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #766: 1025it [00:02, 377.52it/s, env_step=784384, len=15, n/ep=4, n/st=64, player_1/loss=278.400, player_2/loss=302.604, rew=259.00]


Epoch #766: test_reward: 180.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #767: 1025it [00:02, 379.61it/s, env_step=785408, len=31, n/ep=2, n/st=64, player_1/loss=152.427, player_2/loss=344.047, rew=1028.00]


Epoch #767: test_reward: 754.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #768: 1025it [00:02, 377.93it/s, env_step=786432, len=23, n/ep=3, n/st=64, player_1/loss=642.479, player_2/loss=171.052, rew=626.00]


Epoch #768: test_reward: 88.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #769: 1025it [00:02, 377.93it/s, env_step=787456, len=19, n/ep=5, n/st=64, player_1/loss=699.744, player_2/loss=239.911, rew=498.40]


Epoch #769: test_reward: 130.000000 ± 0.000000, best_reward: 1480.000000 ± 0.000000 in #497


Epoch #770: 1025it [00:02, 378.77it/s, env_step=788480, len=30, n/ep=2, n/st=64, player_1/loss=333.993, player_2/loss=446.474, rew=961.00]


Epoch #770: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #771: 1025it [00:02, 375.58it/s, env_step=789504, len=18, n/ep=4, n/st=64, player_1/loss=420.310, player_2/loss=423.637, rew=355.50]


Epoch #771: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #772: 1025it [00:02, 378.77it/s, env_step=790528, len=21, n/ep=3, n/st=64, player_1/loss=239.244, player_2/loss=144.291, rew=497.33]


Epoch #772: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #773: 1025it [00:02, 378.77it/s, env_step=791552, len=23, n/ep=4, n/st=64, player_1/loss=118.086, player_2/loss=193.047, rew=659.50]


Epoch #773: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #774: 1025it [00:02, 380.88it/s, env_step=792576, len=36, n/ep=2, n/st=64, player_1/loss=284.480, player_2/loss=251.028, rew=1373.00]


Epoch #774: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #775: 1025it [00:02, 378.35it/s, env_step=793600, len=21, n/ep=3, n/st=64, player_1/loss=464.193, player_2/loss=309.181, rew=506.00]


Epoch #775: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #776: 1025it [00:02, 377.10it/s, env_step=794624, len=26, n/ep=3, n/st=64, player_1/loss=669.407, player_2/loss=458.708, rew=794.00]


Epoch #776: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #777: 1025it [00:02, 378.07it/s, env_step=795648, len=21, n/ep=3, n/st=64, player_1/loss=565.825, player_2/loss=359.113, rew=462.67]


Epoch #777: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #778: 1025it [00:02, 379.75it/s, env_step=796672, len=39, n/ep=2, n/st=64, player_1/loss=275.555, player_2/loss=438.359, rew=1582.00]


Epoch #778: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #779: 1025it [00:02, 377.93it/s, env_step=797696, len=23, n/ep=3, n/st=64, player_1/loss=118.503, player_2/loss=438.654, rew=566.67]


Epoch #779: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #780: 1025it [00:02, 378.91it/s, env_step=798720, len=28, n/ep=3, n/st=64, player_1/loss=195.181, player_2/loss=291.455, rew=862.67]


Epoch #780: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #781: 1025it [00:02, 379.47it/s, env_step=799744, len=20, n/ep=2, n/st=64, player_1/loss=299.147, player_2/loss=299.506, rew=418.00]


Epoch #781: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #782: 1025it [00:02, 379.05it/s, env_step=800768, len=23, n/ep=2, n/st=64, player_1/loss=216.223, player_2/loss=212.981, rew=574.00]


Epoch #782: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #783: 1025it [00:02, 380.03it/s, env_step=801792, len=22, n/ep=3, n/st=64, player_1/loss=417.917, player_2/loss=310.990, rew=508.67]


Epoch #783: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #784: 1025it [00:02, 379.05it/s, env_step=802816, len=18, n/ep=3, n/st=64, player_1/loss=279.647, player_2/loss=551.185, rew=381.33]


Epoch #784: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #785: 1025it [00:02, 380.46it/s, env_step=803840, len=22, n/ep=3, n/st=64, player_1/loss=26.573, player_2/loss=603.676, rew=506.67]


Epoch #785: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #786: 1025it [00:02, 375.72it/s, env_step=804864, len=28, n/ep=2, n/st=64, player_1/loss=133.753, player_2/loss=301.117, rew=839.00]


Epoch #786: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #787: 1025it [00:02, 377.65it/s, env_step=805888, len=25, n/ep=3, n/st=64, player_1/loss=222.293, player_2/loss=160.488, rew=733.33]


Epoch #787: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #788: 1025it [00:02, 381.02it/s, env_step=806912, len=26, n/ep=2, n/st=64, player_1/loss=113.842, player_2/loss=332.860, rew=709.00]


Epoch #788: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #789: 1025it [00:02, 379.47it/s, env_step=807936, len=30, n/ep=2, n/st=64, player_1/loss=336.030, player_2/loss=473.124, rew=953.00]


Epoch #789: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #790: 1025it [00:02, 376.55it/s, env_step=808960, len=31, n/ep=2, n/st=64, player_1/loss=546.044, player_2/loss=282.745, rew=1094.00]


Epoch #790: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #791: 1025it [00:02, 379.89it/s, env_step=809984, len=21, n/ep=2, n/st=64, player_1/loss=385.049, player_2/loss=159.683, rew=524.00]


Epoch #791: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #792: 1025it [00:02, 380.32it/s, env_step=811008, len=26, n/ep=3, n/st=64, player_1/loss=357.691, player_2/loss=134.481, rew=826.00]


Epoch #792: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #793: 1025it [00:02, 377.79it/s, env_step=812032, len=25, n/ep=3, n/st=64, player_1/loss=438.766, player_2/loss=202.854, rew=686.67]


Epoch #793: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #794: 1025it [00:02, 380.32it/s, env_step=813056, len=16, n/ep=4, n/st=64, player_1/loss=698.557, player_2/loss=246.698, rew=279.00]


Epoch #794: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #795: 1025it [00:02, 376.41it/s, env_step=814080, len=27, n/ep=3, n/st=64, player_1/loss=763.976, player_2/loss=399.025, rew=758.67]


Epoch #795: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #796: 1025it [00:02, 380.32it/s, env_step=815104, len=20, n/ep=4, n/st=64, player_1/loss=611.093, player_2/loss=403.809, rew=448.00]


Epoch #796: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #797: 1025it [00:02, 379.33it/s, env_step=816128, len=21, n/ep=3, n/st=64, player_1/loss=253.494, player_2/loss=469.450, rew=490.00]


Epoch #797: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #798: 1025it [00:02, 377.93it/s, env_step=817152, len=20, n/ep=3, n/st=64, player_1/loss=466.384, player_2/loss=396.979, rew=435.33]


Epoch #798: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #799: 1025it [00:02, 381.02it/s, env_step=818176, len=22, n/ep=3, n/st=64, player_1/loss=390.882, player_2/loss=308.638, rew=506.67]


Epoch #799: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #800: 1025it [00:02, 381.31it/s, env_step=819200, len=22, n/ep=3, n/st=64, player_1/loss=242.870, player_2/loss=325.508, rew=538.67]


Epoch #800: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #801: 1025it [00:02, 379.47it/s, env_step=820224, len=24, n/ep=2, n/st=64, player_1/loss=257.430, player_2/loss=438.498, rew=623.00]


Epoch #801: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #802: 1025it [00:02, 380.74it/s, env_step=821248, len=18, n/ep=4, n/st=64, player_1/loss=310.523, player_2/loss=466.849, rew=390.50]


Epoch #802: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #803: 1025it [00:02, 379.61it/s, env_step=822272, len=33, n/ep=2, n/st=64, player_1/loss=240.100, player_2/loss=431.021, rew=1145.00]


Epoch #803: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #804: 1025it [00:02, 377.52it/s, env_step=823296, len=22, n/ep=3, n/st=64, player_1/loss=53.038, player_2/loss=242.233, rew=536.00]


Epoch #804: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #805: 1025it [00:02, 379.05it/s, env_step=824320, len=34, n/ep=2, n/st=64, player_1/loss=238.455, player_2/loss=114.640, rew=1224.00]


Epoch #805: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #806: 1025it [00:02, 381.16it/s, env_step=825344, len=17, n/ep=3, n/st=64, player_1/loss=406.520, player_2/loss=197.871, rew=306.67]


Epoch #806: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #807: 1025it [00:02, 376.96it/s, env_step=826368, len=14, n/ep=3, n/st=64, player_1/loss=323.448, player_2/loss=217.441, rew=218.67]


Epoch #807: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #808: 1025it [00:02, 378.91it/s, env_step=827392, len=18, n/ep=4, n/st=64, player_1/loss=641.806, rew=373.50]


Epoch #808: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #809: 1025it [00:02, 371.63it/s, env_step=828416, len=23, n/ep=3, n/st=64, player_1/loss=852.720, player_2/loss=215.513, rew=582.67]


Epoch #809: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #810: 1025it [00:02, 372.44it/s, env_step=829440, len=28, n/ep=3, n/st=64, player_1/loss=775.585, player_2/loss=173.848, rew=842.67]


Epoch #810: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #811: 1025it [00:02, 367.11it/s, env_step=830464, len=21, n/ep=3, n/st=64, player_1/loss=538.017, player_2/loss=118.231, rew=478.00]


Epoch #811: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #812: 1025it [00:02, 376.82it/s, env_step=831488, len=23, n/ep=3, n/st=64, player_1/loss=488.435, player_2/loss=164.194, rew=567.33]


Epoch #812: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #813: 1025it [00:02, 374.76it/s, env_step=832512, len=22, n/ep=3, n/st=64, player_1/loss=111.149, player_2/loss=160.193, rew=525.33]


Epoch #813: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #814: 1025it [00:02, 379.33it/s, env_step=833536, len=23, n/ep=3, n/st=64, player_1/loss=225.038, player_2/loss=277.351, rew=596.00]


Epoch #814: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #815: 1025it [00:02, 379.89it/s, env_step=834560, len=20, n/ep=3, n/st=64, player_1/loss=415.456, player_2/loss=269.872, rew=437.33]


Epoch #815: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #816: 1025it [00:02, 380.03it/s, env_step=835584, len=20, n/ep=3, n/st=64, player_1/loss=433.342, player_2/loss=134.662, rew=448.67]


Epoch #816: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #817: 1025it [00:02, 375.44it/s, env_step=836608, len=28, n/ep=2, n/st=64, player_1/loss=460.927, player_2/loss=207.970, rew=819.00]


Epoch #817: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #818: 1025it [00:02, 377.38it/s, env_step=837632, len=18, n/ep=4, n/st=64, player_1/loss=432.362, player_2/loss=228.099, rew=381.50]


Epoch #818: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #819: 1025it [00:02, 380.32it/s, env_step=838656, len=27, n/ep=2, n/st=64, player_1/loss=290.632, player_2/loss=240.620, rew=803.00]


Epoch #819: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #820: 1025it [00:02, 377.79it/s, env_step=839680, len=23, n/ep=3, n/st=64, player_1/loss=129.578, player_2/loss=158.644, rew=585.33]


Epoch #820: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #821: 1025it [00:02, 380.03it/s, env_step=840704, len=23, n/ep=3, n/st=64, player_1/loss=419.952, player_2/loss=140.459, rew=576.67]


Epoch #821: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #822: 1025it [00:02, 374.89it/s, env_step=841728, len=37, n/ep=2, n/st=64, player_1/loss=310.538, player_2/loss=162.554, rew=1405.00]


Epoch #822: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #823: 1025it [00:02, 371.63it/s, env_step=842752, len=23, n/ep=3, n/st=64, player_1/loss=159.708, player_2/loss=268.448, rew=570.00]


Epoch #823: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #824: 1025it [00:02, 380.03it/s, env_step=843776, len=22, n/ep=2, n/st=64, player_1/loss=45.543, player_2/loss=357.772, rew=540.00]


Epoch #824: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #825: 1025it [00:02, 376.96it/s, env_step=844800, len=39, n/ep=2, n/st=64, player_1/loss=40.752, player_2/loss=117.796, rew=1600.00]


Epoch #825: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #826: 1025it [00:02, 373.39it/s, env_step=845824, len=19, n/ep=4, n/st=64, player_1/loss=186.388, player_2/loss=73.571, rew=405.00]


Epoch #826: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #827: 1025it [00:02, 377.52it/s, env_step=846848, len=16, n/ep=4, n/st=64, player_1/loss=327.816, player_2/loss=233.303, rew=271.00]


Epoch #827: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #828: 1025it [00:02, 380.46it/s, env_step=847872, len=23, n/ep=3, n/st=64, player_1/loss=367.879, player_2/loss=464.752, rew=566.00]


Epoch #828: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #829: 1025it [00:02, 370.69it/s, env_step=848896, len=19, n/ep=4, n/st=64, player_1/loss=271.265, player_2/loss=432.536, rew=500.50]


Epoch #829: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #830: 1025it [00:02, 378.91it/s, env_step=849920, len=20, n/ep=4, n/st=64, player_1/loss=161.937, player_2/loss=284.046, rew=450.50]


Epoch #830: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #831: 1025it [00:02, 378.07it/s, env_step=850944, len=21, n/ep=3, n/st=64, player_1/loss=400.832, player_2/loss=222.239, rew=489.33]


Epoch #831: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #832: 1025it [00:02, 380.46it/s, env_step=851968, len=17, n/ep=4, n/st=64, player_1/loss=656.363, player_2/loss=116.484, rew=324.00]


Epoch #832: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #833: 1025it [00:02, 375.30it/s, env_step=852992, len=22, n/ep=3, n/st=64, player_1/loss=366.992, player_2/loss=214.159, rew=504.00]


Epoch #833: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #834: 1025it [00:02, 375.30it/s, env_step=854016, len=20, n/ep=3, n/st=64, player_1/loss=155.558, player_2/loss=329.323, rew=432.67]


Epoch #834: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #835: 1025it [00:02, 374.35it/s, env_step=855040, len=15, n/ep=4, n/st=64, player_1/loss=210.135, player_2/loss=240.278, rew=257.00]


Epoch #835: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #836: 1025it [00:02, 379.89it/s, env_step=856064, len=15, n/ep=5, n/st=64, player_1/loss=488.405, player_2/loss=221.169, rew=245.20]


Epoch #836: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #837: 1025it [00:02, 380.60it/s, env_step=857088, len=17, n/ep=4, n/st=64, player_1/loss=516.565, rew=320.00]


Epoch #837: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #838: 1025it [00:02, 379.75it/s, env_step=858112, len=21, n/ep=3, n/st=64, player_1/loss=247.805, player_2/loss=244.259, rew=516.00]


Epoch #838: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #839: 1025it [00:02, 378.21it/s, env_step=859136, len=21, n/ep=3, n/st=64, player_1/loss=200.234, player_2/loss=217.020, rew=482.67]


Epoch #839: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #840: 1025it [00:02, 378.77it/s, env_step=860160, len=22, n/ep=3, n/st=64, player_1/loss=166.870, player_2/loss=225.061, rew=508.67]


Epoch #840: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #841: 1025it [00:02, 379.47it/s, env_step=861184, len=17, n/ep=4, n/st=64, player_1/loss=202.273, player_2/loss=232.079, rew=352.00]


Epoch #841: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #842: 1025it [00:02, 380.03it/s, env_step=862208, len=24, n/ep=3, n/st=64, player_1/loss=262.128, player_2/loss=225.453, rew=616.00]


Epoch #842: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #843: 1025it [00:02, 380.46it/s, env_step=863232, len=26, n/ep=2, n/st=64, player_1/loss=194.504, player_2/loss=180.260, rew=757.00]


Epoch #843: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #844: 1025it [00:02, 376.13it/s, env_step=864256, len=36, n/ep=2, n/st=64, player_1/loss=397.001, player_2/loss=157.197, rew=1334.00]


Epoch #844: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #845: 1025it [00:02, 379.89it/s, env_step=865280, len=20, n/ep=3, n/st=64, player_1/loss=552.285, player_2/loss=591.693, rew=490.67]


Epoch #845: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #846: 1025it [00:02, 379.05it/s, env_step=866304, len=19, n/ep=3, n/st=64, player_1/loss=394.283, player_2/loss=519.202, rew=428.67]


Epoch #846: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #847: 1025it [00:02, 377.65it/s, env_step=867328, len=20, n/ep=3, n/st=64, player_1/loss=203.320, player_2/loss=288.888, rew=450.67]


Epoch #847: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #848: 1025it [00:02, 379.33it/s, env_step=868352, len=28, n/ep=2, n/st=64, player_1/loss=390.003, player_2/loss=319.955, rew=839.00]


Epoch #848: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #849: 1025it [00:02, 379.19it/s, env_step=869376, len=28, n/ep=2, n/st=64, player_1/loss=385.958, player_2/loss=176.264, rew=851.00]


Epoch #849: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #850: 1025it [00:02, 381.87it/s, env_step=870400, len=42, n/ep=1, n/st=64, player_1/loss=377.218, player_2/loss=256.223, rew=1834.00]


Epoch #850: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #851: 1025it [00:02, 381.02it/s, env_step=871424, len=35, n/ep=2, n/st=64, player_1/loss=371.703, player_2/loss=317.526, rew=1258.00]


Epoch #851: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #852: 1025it [00:02, 378.07it/s, env_step=872448, len=22, n/ep=3, n/st=64, player_1/loss=462.017, player_2/loss=296.959, rew=510.00]


Epoch #852: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #853: 1025it [00:02, 378.35it/s, env_step=873472, len=30, n/ep=2, n/st=64, player_1/loss=292.099, player_2/loss=428.762, rew=937.00]


Epoch #853: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #854: 1025it [00:02, 379.75it/s, env_step=874496, len=29, n/ep=2, n/st=64, player_1/loss=327.978, player_2/loss=440.247, rew=918.00]


Epoch #854: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #855: 1025it [00:02, 369.09it/s, env_step=875520, len=29, n/ep=2, n/st=64, player_1/loss=262.099, player_2/loss=50.494, rew=872.00]


Epoch #855: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #856: 1025it [00:02, 366.98it/s, env_step=876544, len=29, n/ep=2, n/st=64, player_1/loss=328.015, player_2/loss=302.420, rew=868.00]


Epoch #856: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #857: 1025it [00:02, 377.93it/s, env_step=877568, len=27, n/ep=2, n/st=64, player_2/loss=373.160, rew=784.00]


Epoch #857: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #858: 1025it [00:02, 380.32it/s, env_step=878592, len=19, n/ep=3, n/st=64, player_1/loss=227.538, player_2/loss=165.358, rew=472.67]


Epoch #858: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #859: 1025it [00:02, 379.05it/s, env_step=879616, len=23, n/ep=2, n/st=64, player_1/loss=241.797, player_2/loss=150.796, rew=574.00]


Epoch #859: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #860: 1025it [00:02, 377.24it/s, env_step=880640, len=25, n/ep=3, n/st=64, player_1/loss=319.656, player_2/loss=146.762, rew=656.67]


Epoch #860: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #861: 1025it [00:02, 361.16it/s, env_step=881664, len=21, n/ep=2, n/st=64, player_1/loss=667.029, player_2/loss=175.809, rew=592.00]


Epoch #861: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #862: 1025it [00:02, 380.46it/s, env_step=882688, len=19, n/ep=4, n/st=64, player_1/loss=640.201, player_2/loss=261.306, rew=418.50]


Epoch #862: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #863: 1025it [00:02, 375.86it/s, env_step=883712, len=15, n/ep=4, n/st=64, player_1/loss=390.589, player_2/loss=241.204, rew=247.50]


Epoch #863: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #864: 1025it [00:02, 375.44it/s, env_step=884736, len=22, n/ep=3, n/st=64, player_1/loss=470.605, player_2/loss=91.756, rew=537.33]


Epoch #864: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #865: 1025it [00:02, 381.02it/s, env_step=885760, len=26, n/ep=2, n/st=64, player_1/loss=561.945, player_2/loss=224.729, rew=700.00]


Epoch #865: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #866: 1025it [00:02, 377.51it/s, env_step=886784, len=25, n/ep=2, n/st=64, player_1/loss=266.206, player_2/loss=363.555, rew=652.00]


Epoch #866: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #867: 1025it [00:02, 373.53it/s, env_step=887808, len=34, n/ep=2, n/st=64, player_1/loss=156.027, player_2/loss=177.933, rew=1253.00]


Epoch #867: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #868: 1025it [00:02, 381.87it/s, env_step=888832, len=15, n/ep=4, n/st=64, player_1/loss=266.878, player_2/loss=44.571, rew=275.50]


Epoch #868: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #869: 1025it [00:02, 376.96it/s, env_step=889856, len=16, n/ep=3, n/st=64, player_1/loss=284.924, player_2/loss=40.842, rew=284.67]


Epoch #869: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #870: 1025it [00:02, 373.94it/s, env_step=890880, len=17, n/ep=4, n/st=64, player_1/loss=139.015, player_2/loss=144.376, rew=330.50]


Epoch #870: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #871: 1025it [00:02, 377.10it/s, env_step=891904, len=14, n/ep=4, n/st=64, player_1/loss=152.347, player_2/loss=142.185, rew=231.00]


Epoch #871: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #872: 1025it [00:02, 374.62it/s, env_step=892928, len=20, n/ep=3, n/st=64, player_1/loss=245.252, player_2/loss=18.344, rew=422.67]


Epoch #872: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #873: 1025it [00:02, 381.16it/s, env_step=893952, len=18, n/ep=2, n/st=64, player_1/loss=312.488, player_2/loss=228.634, rew=389.00]


Epoch #873: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #874: 1025it [00:02, 378.35it/s, env_step=894976, len=15, n/ep=4, n/st=64, player_1/loss=238.427, player_2/loss=231.889, rew=257.00]


Epoch #874: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #875: 1025it [00:02, 374.89it/s, env_step=896000, len=22, n/ep=3, n/st=64, player_1/loss=145.588, player_2/loss=325.713, rew=510.00]


Epoch #875: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #876: 1025it [00:02, 376.41it/s, env_step=897024, len=23, n/ep=3, n/st=64, player_1/loss=347.188, player_2/loss=247.504, rew=554.67]


Epoch #876: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #877: 1025it [00:02, 376.68it/s, env_step=898048, len=19, n/ep=3, n/st=64, player_1/loss=363.572, player_2/loss=289.984, rew=406.00]


Epoch #877: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #878: 1025it [00:02, 376.82it/s, env_step=899072, len=21, n/ep=3, n/st=64, player_1/loss=174.165, player_2/loss=381.018, rew=493.33]


Epoch #878: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #879: 1025it [00:02, 375.17it/s, env_step=900096, len=24, n/ep=3, n/st=64, player_1/loss=192.216, player_2/loss=285.885, rew=780.67]


Epoch #879: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #880: 1025it [00:02, 377.38it/s, env_step=901120, len=25, n/ep=3, n/st=64, player_1/loss=257.232, player_2/loss=192.244, rew=648.67]


Epoch #880: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #881: 1025it [00:02, 377.93it/s, env_step=902144, len=31, n/ep=2, n/st=64, player_1/loss=385.669, player_2/loss=115.217, rew=1006.00]


Epoch #881: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #882: 1025it [00:02, 379.05it/s, env_step=903168, len=14, n/ep=5, n/st=64, player_1/loss=455.505, player_2/loss=307.906, rew=228.00]


Epoch #882: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #883: 1025it [00:02, 380.88it/s, env_step=904192, len=29, n/ep=2, n/st=64, player_1/loss=563.690, player_2/loss=326.375, rew=904.00]


Epoch #883: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #884: 1025it [00:02, 377.65it/s, env_step=905216, len=26, n/ep=3, n/st=64, player_1/loss=515.029, player_2/loss=294.779, rew=716.67]


Epoch #884: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #885: 1025it [00:02, 379.75it/s, env_step=906240, len=22, n/ep=3, n/st=64, player_1/loss=84.856, player_2/loss=272.550, rew=520.00]


Epoch #885: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #886: 1025it [00:02, 380.74it/s, env_step=907264, len=19, n/ep=3, n/st=64, player_1/loss=70.731, player_2/loss=251.910, rew=391.33]


Epoch #886: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #887: 1025it [00:02, 377.93it/s, env_step=908288, len=26, n/ep=2, n/st=64, player_1/loss=159.095, player_2/loss=195.732, rew=733.00]


Epoch #887: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #888: 1025it [00:02, 378.35it/s, env_step=909312, len=22, n/ep=3, n/st=64, player_1/loss=222.447, player_2/loss=83.481, rew=538.00]


Epoch #888: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #889: 1025it [00:02, 376.27it/s, env_step=910336, len=8, n/ep=9, n/st=64, player_1/loss=171.282, player_2/loss=112.057, rew=72.22]


Epoch #889: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #890: 1025it [00:02, 378.07it/s, env_step=911360, len=8, n/ep=7, n/st=64, player_1/loss=234.335, player_2/loss=313.279, rew=80.57]


Epoch #890: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #891: 1025it [00:02, 376.55it/s, env_step=912384, len=9, n/ep=7, n/st=64, player_1/loss=220.677, player_2/loss=333.143, rew=93.14]


Epoch #891: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #892: 1025it [00:02, 376.55it/s, env_step=913408, len=10, n/ep=6, n/st=64, player_1/loss=47.507, player_2/loss=247.148, rew=112.00]


Epoch #892: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #893: 1025it [00:02, 378.49it/s, env_step=914432, len=17, n/ep=4, n/st=64, player_2/loss=223.404, rew=338.00]


Epoch #893: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #894: 1025it [00:02, 376.54it/s, env_step=915456, len=15, n/ep=4, n/st=64, player_1/loss=79.618, player_2/loss=175.611, rew=244.00]


Epoch #894: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #895: 1025it [00:02, 377.93it/s, env_step=916480, len=22, n/ep=3, n/st=64, player_1/loss=89.049, player_2/loss=227.913, rew=629.33]


Epoch #895: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #896: 1025it [00:02, 380.18it/s, env_step=917504, len=22, n/ep=3, n/st=64, player_1/loss=76.468, player_2/loss=205.658, rew=564.00]


Epoch #896: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #897: 1025it [00:02, 375.86it/s, env_step=918528, len=13, n/ep=5, n/st=64, player_1/loss=102.538, player_2/loss=289.154, rew=264.40]


Epoch #897: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #898: 1025it [00:02, 376.41it/s, env_step=919552, len=10, n/ep=7, n/st=64, player_1/loss=211.305, player_2/loss=417.895, rew=136.86]


Epoch #898: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #899: 1025it [00:02, 375.03it/s, env_step=920576, len=16, n/ep=4, n/st=64, player_1/loss=168.735, player_2/loss=379.585, rew=314.50]


Epoch #899: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #900: 1025it [00:02, 377.38it/s, env_step=921600, len=14, n/ep=5, n/st=64, player_1/loss=178.162, player_2/loss=263.118, rew=250.00]


Epoch #900: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #901: 1025it [00:02, 379.47it/s, env_step=922624, len=23, n/ep=3, n/st=64, player_1/loss=178.868, player_2/loss=140.044, rew=552.67]


Epoch #901: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #902: 1025it [00:02, 379.19it/s, env_step=923648, len=19, n/ep=3, n/st=64, player_1/loss=129.891, player_2/loss=69.514, rew=418.00]


Epoch #902: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #903: 1025it [00:02, 379.61it/s, env_step=924672, len=21, n/ep=3, n/st=64, player_1/loss=126.286, player_2/loss=68.141, rew=462.67]


Epoch #903: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #904: 1025it [00:02, 375.86it/s, env_step=925696, len=21, n/ep=3, n/st=64, player_1/loss=138.069, player_2/loss=138.051, rew=489.33]


Epoch #904: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #905: 1025it [00:02, 379.19it/s, env_step=926720, len=20, n/ep=3, n/st=64, player_1/loss=175.480, player_2/loss=219.146, rew=418.67]


Epoch #905: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #906: 1025it [00:02, 377.79it/s, env_step=927744, len=14, n/ep=4, n/st=64, player_1/loss=156.056, player_2/loss=326.210, rew=212.50]


Epoch #906: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #907: 1025it [00:02, 380.46it/s, env_step=928768, len=25, n/ep=2, n/st=64, player_1/loss=131.335, player_2/loss=398.991, rew=649.00]


Epoch #907: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #908: 1025it [00:02, 373.94it/s, env_step=929792, len=25, n/ep=2, n/st=64, player_1/loss=85.944, player_2/loss=425.841, rew=856.00]


Epoch #908: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #909: 1025it [00:02, 379.47it/s, env_step=930816, len=25, n/ep=3, n/st=64, player_1/loss=48.708, player_2/loss=435.797, rew=684.00]


Epoch #909: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #910: 1025it [00:02, 376.41it/s, env_step=931840, len=27, n/ep=2, n/st=64, player_1/loss=84.897, player_2/loss=430.496, rew=854.00]


Epoch #910: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #911: 1025it [00:02, 376.68it/s, env_step=932864, len=15, n/ep=4, n/st=64, player_1/loss=150.904, player_2/loss=328.256, rew=255.00]


Epoch #911: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #912: 1025it [00:02, 374.76it/s, env_step=933888, len=14, n/ep=4, n/st=64, player_1/loss=165.266, player_2/loss=201.373, rew=234.00]


Epoch #912: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #913: 1025it [00:02, 378.91it/s, env_step=934912, len=14, n/ep=4, n/st=64, player_1/loss=193.492, player_2/loss=202.584, rew=225.50]


Epoch #913: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #914: 1025it [00:02, 376.82it/s, env_step=935936, len=16, n/ep=4, n/st=64, player_1/loss=166.692, player_2/loss=168.099, rew=274.50]


Epoch #914: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #915: 1025it [00:02, 378.35it/s, env_step=936960, len=20, n/ep=3, n/st=64, player_1/loss=99.768, player_2/loss=85.787, rew=436.00]


Epoch #915: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #916: 1025it [00:02, 377.65it/s, env_step=937984, len=24, n/ep=3, n/st=64, player_1/loss=145.402, rew=788.67]


Epoch #916: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #917: 1025it [00:02, 376.27it/s, env_step=939008, len=19, n/ep=4, n/st=64, player_1/loss=428.886, player_2/loss=175.730, rew=440.00]


Epoch #917: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #918: 1025it [00:02, 377.51it/s, env_step=940032, len=16, n/ep=4, n/st=64, player_1/loss=488.206, player_2/loss=94.318, rew=270.50]


Epoch #918: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #919: 1025it [00:02, 376.82it/s, env_step=941056, len=22, n/ep=3, n/st=64, player_1/loss=215.795, player_2/loss=314.474, rew=504.67]


Epoch #919: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #920: 1025it [00:02, 372.31it/s, env_step=942080, len=20, n/ep=3, n/st=64, player_1/loss=79.341, player_2/loss=306.707, rew=420.67]


Epoch #920: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #921: 1025it [00:02, 379.05it/s, env_step=943104, len=26, n/ep=2, n/st=64, player_1/loss=115.951, player_2/loss=82.763, rew=700.00]


Epoch #921: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #922: 1025it [00:02, 378.91it/s, env_step=944128, len=20, n/ep=2, n/st=64, player_1/loss=170.478, player_2/loss=422.675, rew=443.00]


Epoch #922: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #923: 1025it [00:02, 378.35it/s, env_step=945152, len=20, n/ep=3, n/st=64, player_1/loss=111.500, player_2/loss=482.799, rew=434.67]


Epoch #923: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #924: 1025it [00:02, 377.38it/s, env_step=946176, len=13, n/ep=5, n/st=64, player_1/loss=339.916, player_2/loss=204.151, rew=241.60]


Epoch #924: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #925: 1025it [00:02, 376.82it/s, env_step=947200, len=10, n/ep=7, n/st=64, player_1/loss=440.970, player_2/loss=419.690, rew=122.29]


Epoch #925: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #926: 1025it [00:02, 375.72it/s, env_step=948224, len=21, n/ep=3, n/st=64, player_1/loss=255.048, player_2/loss=470.422, rew=462.00]


Epoch #926: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #927: 1025it [00:02, 379.05it/s, env_step=949248, len=15, n/ep=4, n/st=64, player_1/loss=120.897, player_2/loss=517.433, rew=256.00]


Epoch #927: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #928: 1025it [00:02, 376.96it/s, env_step=950272, len=14, n/ep=4, n/st=64, player_1/loss=237.618, player_2/loss=263.990, rew=216.00]


Epoch #928: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #929: 1025it [00:02, 378.35it/s, env_step=951296, len=12, n/ep=4, n/st=64, player_1/loss=253.102, player_2/loss=309.487, rew=167.50]


Epoch #929: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #930: 1025it [00:02, 377.65it/s, env_step=952320, len=8, n/ep=5, n/st=64, player_1/loss=204.002, player_2/loss=416.416, rew=77.20]


Epoch #930: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #931: 1025it [00:02, 380.74it/s, env_step=953344, len=23, n/ep=3, n/st=64, player_1/loss=228.672, player_2/loss=265.563, rew=576.67]


Epoch #931: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #932: 1025it [00:02, 377.10it/s, env_step=954368, len=19, n/ep=4, n/st=64, player_1/loss=143.594, player_2/loss=109.033, rew=428.00]


Epoch #932: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #933: 1025it [00:02, 379.33it/s, env_step=955392, len=22, n/ep=3, n/st=64, player_1/loss=273.282, player_2/loss=220.727, rew=522.00]


Epoch #933: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #934: 1025it [00:02, 377.79it/s, env_step=956416, len=9, n/ep=7, n/st=64, player_1/loss=271.293, player_2/loss=309.817, rew=96.86]


Epoch #934: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #935: 1025it [00:02, 376.68it/s, env_step=957440, len=28, n/ep=2, n/st=64, player_1/loss=144.733, player_2/loss=396.401, rew=811.00]


Epoch #935: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #936: 1025it [00:02, 378.07it/s, env_step=958464, len=21, n/ep=3, n/st=64, player_1/loss=326.036, player_2/loss=349.496, rew=542.67]


Epoch #936: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #937: 1025it [00:02, 377.65it/s, env_step=959488, len=19, n/ep=3, n/st=64, player_1/loss=247.308, player_2/loss=309.082, rew=396.00]


Epoch #937: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #938: 1025it [00:02, 376.96it/s, env_step=960512, len=23, n/ep=2, n/st=64, player_1/loss=178.361, player_2/loss=357.086, rew=574.00]


Epoch #938: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #939: 1025it [00:02, 375.03it/s, env_step=961536, len=26, n/ep=2, n/st=64, player_1/loss=228.743, player_2/loss=460.027, rew=729.00]


Epoch #939: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #940: 1025it [00:02, 378.21it/s, env_step=962560, len=12, n/ep=7, n/st=64, player_1/loss=166.538, player_2/loss=353.601, rew=220.29]


Epoch #940: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #941: 1025it [00:02, 377.37it/s, env_step=963584, len=18, n/ep=3, n/st=64, player_1/loss=273.176, player_2/loss=282.644, rew=437.33]


Epoch #941: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #942: 1025it [00:02, 377.65it/s, env_step=964608, len=15, n/ep=4, n/st=64, player_1/loss=233.421, player_2/loss=423.167, rew=258.50]


Epoch #942: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #943: 1025it [00:02, 375.30it/s, env_step=965632, len=13, n/ep=5, n/st=64, player_1/loss=176.141, player_2/loss=380.068, rew=197.20]


Epoch #943: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #944: 1025it [00:02, 374.35it/s, env_step=966656, len=16, n/ep=4, n/st=64, player_2/loss=152.355, rew=291.50]


Epoch #944: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #945: 1025it [00:02, 378.63it/s, env_step=967680, len=14, n/ep=5, n/st=64, player_1/loss=333.730, player_2/loss=144.419, rew=220.40]


Epoch #945: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #946: 1025it [00:02, 380.60it/s, env_step=968704, len=20, n/ep=3, n/st=64, player_1/loss=306.961, player_2/loss=331.900, rew=473.33]


Epoch #946: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #947: 1025it [00:02, 376.96it/s, env_step=969728, len=20, n/ep=3, n/st=64, player_1/loss=228.937, player_2/loss=249.647, rew=432.00]


Epoch #947: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #948: 1025it [00:02, 374.07it/s, env_step=970752, len=41, n/ep=2, n/st=64, player_1/loss=171.605, player_2/loss=447.268, rew=1736.00]


Epoch #948: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #949: 1025it [00:02, 378.35it/s, env_step=971776, len=22, n/ep=3, n/st=64, player_1/loss=129.705, player_2/loss=374.037, rew=520.00]


Epoch #949: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #950: 1025it [00:02, 379.19it/s, env_step=972800, len=21, n/ep=3, n/st=64, player_1/loss=22.366, player_2/loss=69.003, rew=476.00]


Epoch #950: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #951: 1025it [00:02, 378.77it/s, env_step=973824, len=25, n/ep=3, n/st=64, player_1/loss=394.949, player_2/loss=62.022, rew=730.67]


Epoch #951: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #952: 1025it [00:02, 377.65it/s, env_step=974848, len=27, n/ep=2, n/st=64, player_1/loss=472.421, player_2/loss=447.823, rew=755.00]


Epoch #952: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #953: 1025it [00:02, 378.49it/s, env_step=975872, len=13, n/ep=5, n/st=64, player_1/loss=428.206, player_2/loss=552.941, rew=205.20]


Epoch #953: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #954: 1025it [00:02, 377.24it/s, env_step=976896, len=15, n/ep=5, n/st=64, player_1/loss=173.658, player_2/loss=347.376, rew=262.40]


Epoch #954: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #955: 1025it [00:02, 378.77it/s, env_step=977920, len=18, n/ep=3, n/st=64, player_1/loss=163.997, player_2/loss=406.132, rew=369.33]


Epoch #955: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #956: 1025it [00:02, 376.68it/s, env_step=978944, len=30, n/ep=2, n/st=64, player_1/loss=287.272, player_2/loss=468.480, rew=1009.00]


Epoch #956: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #957: 1025it [00:02, 377.52it/s, env_step=979968, len=34, n/ep=2, n/st=64, player_1/loss=265.555, player_2/loss=445.424, rew=1192.00]


Epoch #957: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #958: 1025it [00:02, 380.60it/s, env_step=980992, len=17, n/ep=4, n/st=64, player_1/loss=226.363, player_2/loss=340.611, rew=352.00]


Epoch #958: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #959: 1025it [00:02, 374.89it/s, env_step=982016, len=16, n/ep=4, n/st=64, player_1/loss=203.630, player_2/loss=193.430, rew=296.50]


Epoch #959: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #960: 1025it [00:02, 376.27it/s, env_step=983040, len=16, n/ep=4, n/st=64, player_1/loss=166.974, player_2/loss=234.103, rew=289.50]


Epoch #960: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #961: 1025it [00:02, 378.49it/s, env_step=984064, len=18, n/ep=3, n/st=64, player_1/loss=246.202, player_2/loss=152.318, rew=382.67]


Epoch #961: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #962: 1025it [00:02, 377.10it/s, env_step=985088, len=15, n/ep=4, n/st=64, player_1/loss=304.329, player_2/loss=185.324, rew=265.50]


Epoch #962: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #963: 1025it [00:02, 374.21it/s, env_step=986112, len=12, n/ep=5, n/st=64, player_1/loss=243.770, player_2/loss=276.338, rew=168.80]


Epoch #963: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #964: 1025it [00:02, 361.03it/s, env_step=987136, len=14, n/ep=4, n/st=64, player_1/loss=213.426, player_2/loss=179.262, rew=231.50]


Epoch #964: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #965: 1025it [00:02, 361.67it/s, env_step=988160, len=14, n/ep=5, n/st=64, player_1/loss=194.089, player_2/loss=249.757, rew=217.60]


Epoch #965: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #966: 1025it [00:02, 375.03it/s, env_step=989184, len=27, n/ep=3, n/st=64, player_1/loss=256.197, player_2/loss=290.235, rew=775.33]


Epoch #966: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #967: 1025it [00:02, 377.52it/s, env_step=990208, len=17, n/ep=3, n/st=64, player_1/loss=294.513, player_2/loss=209.186, rew=328.67]


Epoch #967: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #968: 1025it [00:02, 375.86it/s, env_step=991232, len=20, n/ep=4, n/st=64, player_1/loss=194.378, player_2/loss=168.579, rew=491.50]


Epoch #968: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #969: 1025it [00:02, 375.99it/s, env_step=992256, len=18, n/ep=4, n/st=64, player_1/loss=141.364, player_2/loss=156.960, rew=356.00]


Epoch #969: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #970: 1025it [00:02, 374.89it/s, env_step=993280, len=20, n/ep=2, n/st=64, player_1/loss=304.451, player_2/loss=263.379, rew=422.00]


Epoch #970: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #971: 1025it [00:02, 376.41it/s, env_step=994304, len=26, n/ep=3, n/st=64, player_1/loss=401.449, player_2/loss=254.699, rew=748.67]


Epoch #971: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #972: 1025it [00:02, 377.24it/s, env_step=995328, len=19, n/ep=3, n/st=64, player_1/loss=225.304, player_2/loss=176.563, rew=391.33]


Epoch #972: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #973: 1025it [00:02, 377.24it/s, env_step=996352, len=31, n/ep=2, n/st=64, player_1/loss=119.641, player_2/loss=131.231, rew=991.00]


Epoch #973: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #974: 1025it [00:02, 380.74it/s, env_step=997376, len=14, n/ep=4, n/st=64, player_1/loss=104.217, player_2/loss=166.723, rew=209.00]


Epoch #974: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #975: 1025it [00:02, 378.63it/s, env_step=998400, len=21, n/ep=3, n/st=64, player_1/loss=158.638, rew=527.33]


Epoch #975: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #976: 1025it [00:02, 376.13it/s, env_step=999424, len=28, n/ep=2, n/st=64, player_1/loss=219.486, player_2/loss=202.937, rew=811.00]


Epoch #976: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #977: 1025it [00:02, 377.10it/s, env_step=1000448, len=23, n/ep=3, n/st=64, player_1/loss=151.094, player_2/loss=213.532, rew=583.33]


Epoch #977: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #978: 1025it [00:02, 378.35it/s, env_step=1001472, len=15, n/ep=4, n/st=64, player_1/loss=106.706, player_2/loss=249.751, rew=257.00]


Epoch #978: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #979: 1025it [00:02, 376.96it/s, env_step=1002496, len=16, n/ep=4, n/st=64, player_1/loss=157.179, player_2/loss=208.951, rew=273.00]


Epoch #979: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #980: 1025it [00:02, 375.44it/s, env_step=1003520, len=27, n/ep=3, n/st=64, player_2/loss=189.188, rew=772.67]


Epoch #980: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #981: 1025it [00:02, 375.30it/s, env_step=1004544, len=24, n/ep=3, n/st=64, player_1/loss=223.930, rew=625.33]


Epoch #981: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #982: 1025it [00:02, 376.68it/s, env_step=1005568, len=22, n/ep=3, n/st=64, player_1/loss=54.429, player_2/loss=162.084, rew=536.00]


Epoch #982: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #983: 1025it [00:02, 381.31it/s, env_step=1006592, len=18, n/ep=4, n/st=64, player_1/loss=168.992, player_2/loss=164.951, rew=379.00]


Epoch #983: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #984: 1025it [00:02, 377.79it/s, env_step=1007616, len=30, n/ep=2, n/st=64, player_1/loss=226.978, player_2/loss=96.594, rew=1001.00]


Epoch #984: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #985: 1025it [00:02, 375.72it/s, env_step=1008640, len=28, n/ep=2, n/st=64, player_1/loss=103.621, player_2/loss=55.189, rew=811.00]


Epoch #985: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #986: 1025it [00:02, 377.24it/s, env_step=1009664, len=24, n/ep=2, n/st=64, player_1/loss=167.971, player_2/loss=88.610, rew=625.00]


Epoch #986: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #987: 1025it [00:02, 376.55it/s, env_step=1010688, len=14, n/ep=4, n/st=64, player_1/loss=180.347, player_2/loss=73.548, rew=234.50]


Epoch #987: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #988: 1025it [00:02, 375.99it/s, env_step=1011712, len=18, n/ep=4, n/st=64, player_1/loss=141.129, player_2/loss=196.325, rew=377.00]


Epoch #988: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #989: 1025it [00:02, 377.79it/s, env_step=1012736, len=25, n/ep=3, n/st=64, player_1/loss=304.625, player_2/loss=328.526, rew=660.67]


Epoch #989: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #990: 1025it [00:02, 376.41it/s, env_step=1013760, len=15, n/ep=4, n/st=64, player_1/loss=279.473, player_2/loss=342.010, rew=242.50]


Epoch #990: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #991: 1025it [00:02, 378.77it/s, env_step=1014784, len=25, n/ep=3, n/st=64, player_1/loss=79.917, player_2/loss=150.443, rew=686.00]


Epoch #991: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #992: 1025it [00:02, 375.03it/s, env_step=1015808, len=26, n/ep=2, n/st=64, player_1/loss=116.489, player_2/loss=216.577, rew=747.00]


Epoch #992: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #993: 1025it [00:02, 375.03it/s, env_step=1016832, len=33, n/ep=2, n/st=64, player_1/loss=244.084, player_2/loss=283.338, rew=1156.00]


Epoch #993: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #994: 1025it [00:02, 377.52it/s, env_step=1017856, len=14, n/ep=5, n/st=64, player_1/loss=354.595, player_2/loss=286.870, rew=216.40]


Epoch #994: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #995: 1025it [00:02, 377.93it/s, env_step=1018880, len=17, n/ep=4, n/st=64, player_1/loss=226.533, player_2/loss=222.030, rew=318.00]


Epoch #995: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #996: 1025it [00:02, 376.27it/s, env_step=1019904, len=15, n/ep=5, n/st=64, player_1/loss=136.624, player_2/loss=224.558, rew=241.60]


Epoch #996: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #997: 1025it [00:02, 378.49it/s, env_step=1020928, len=31, n/ep=2, n/st=64, player_1/loss=317.130, player_2/loss=247.015, rew=1006.00]


Epoch #997: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #998: 1025it [00:02, 377.24it/s, env_step=1021952, len=37, n/ep=1, n/st=64, player_1/loss=254.143, player_2/loss=229.280, rew=1404.00]


Epoch #998: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #999: 1025it [00:02, 375.72it/s, env_step=1022976, len=18, n/ep=4, n/st=64, player_1/loss=147.909, player_2/loss=310.487, rew=380.50]


Epoch #999: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1000: 1025it [00:02, 375.17it/s, env_step=1024000, len=9, n/ep=6, n/st=64, player_1/loss=148.571, player_2/loss=367.143, rew=96.33]


Epoch #1000: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1001: 1025it [00:02, 377.52it/s, env_step=1025024, len=23, n/ep=2, n/st=64, player_1/loss=87.850, player_2/loss=237.664, rew=576.00]


Epoch #1001: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1002: 1025it [00:02, 376.68it/s, env_step=1026048, len=7, n/ep=8, n/st=64, player_1/loss=109.757, player_2/loss=189.430, rew=58.25]


Epoch #1002: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1003: 1025it [00:02, 376.68it/s, env_step=1027072, len=12, n/ep=5, n/st=64, player_1/loss=136.834, player_2/loss=295.167, rew=184.80]


Epoch #1003: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1004: 1025it [00:02, 374.21it/s, env_step=1028096, len=7, n/ep=9, n/st=64, player_1/loss=134.031, player_2/loss=388.976, rew=66.22]


Epoch #1004: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1005: 1025it [00:02, 375.72it/s, env_step=1029120, len=16, n/ep=4, n/st=64, player_1/loss=115.806, player_2/loss=408.281, rew=283.00]


Epoch #1005: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1006: 1025it [00:02, 373.66it/s, env_step=1030144, len=20, n/ep=3, n/st=64, player_1/loss=56.842, player_2/loss=318.118, rew=491.33]


Epoch #1006: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1007: 1025it [00:02, 372.71it/s, env_step=1031168, len=8, n/ep=7, n/st=64, player_2/loss=259.140, rew=86.57]


Epoch #1007: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1008: 1025it [00:02, 376.27it/s, env_step=1032192, len=8, n/ep=8, n/st=64, player_1/loss=167.530, player_2/loss=300.797, rew=72.00]


Epoch #1008: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1009: 1025it [00:02, 378.07it/s, env_step=1033216, len=31, n/ep=2, n/st=64, player_1/loss=313.912, player_2/loss=258.503, rew=991.00]


Epoch #1009: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1010: 1025it [00:02, 374.89it/s, env_step=1034240, len=23, n/ep=2, n/st=64, player_1/loss=365.927, player_2/loss=195.853, rew=576.00]


Epoch #1010: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1011: 1025it [00:02, 377.24it/s, env_step=1035264, len=23, n/ep=3, n/st=64, player_1/loss=357.029, player_2/loss=177.763, rew=642.00]


Epoch #1011: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1012: 1025it [00:02, 375.58it/s, env_step=1036288, len=20, n/ep=3, n/st=64, player_1/loss=386.845, player_2/loss=181.181, rew=543.33]


Epoch #1012: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1013: 1025it [00:02, 377.79it/s, env_step=1037312, len=13, n/ep=6, n/st=64, player_1/loss=279.966, player_2/loss=154.046, rew=192.67]


Epoch #1013: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1014: 1025it [00:02, 377.65it/s, env_step=1038336, len=13, n/ep=4, n/st=64, player_1/loss=99.872, player_2/loss=254.891, rew=205.00]


Epoch #1014: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1015: 1025it [00:02, 376.41it/s, env_step=1039360, len=17, n/ep=4, n/st=64, player_1/loss=74.118, player_2/loss=262.913, rew=338.00]


Epoch #1015: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1016: 1025it [00:02, 375.58it/s, env_step=1040384, len=16, n/ep=4, n/st=64, player_1/loss=95.877, player_2/loss=304.754, rew=275.00]


Epoch #1016: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1017: 1025it [00:02, 376.54it/s, env_step=1041408, len=14, n/ep=4, n/st=64, player_1/loss=89.207, player_2/loss=136.468, rew=229.00]


Epoch #1017: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1018: 1025it [00:02, 376.68it/s, env_step=1042432, len=16, n/ep=4, n/st=64, player_1/loss=107.917, player_2/loss=46.895, rew=271.00]


Epoch #1018: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1019: 1025it [00:02, 374.48it/s, env_step=1043456, len=15, n/ep=5, n/st=64, player_1/loss=159.832, player_2/loss=95.378, rew=262.40]


Epoch #1019: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1020: 1025it [00:02, 380.03it/s, env_step=1044480, len=15, n/ep=4, n/st=64, player_1/loss=114.669, player_2/loss=145.328, rew=248.00]


Epoch #1020: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1021: 1025it [00:02, 374.48it/s, env_step=1045504, len=13, n/ep=4, n/st=64, player_1/loss=144.652, player_2/loss=86.868, rew=182.00]


Epoch #1021: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1022: 1025it [00:02, 375.86it/s, env_step=1046528, len=16, n/ep=3, n/st=64, player_1/loss=182.055, player_2/loss=84.928, rew=298.00]


Epoch #1022: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1023: 1025it [00:02, 377.52it/s, env_step=1047552, len=16, n/ep=3, n/st=64, player_1/loss=180.172, player_2/loss=75.033, rew=303.33]


Epoch #1023: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1024: 1025it [00:02, 375.72it/s, env_step=1048576, len=17, n/ep=4, n/st=64, player_1/loss=153.725, player_2/loss=73.046, rew=307.50]


Epoch #1024: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1025: 1025it [00:02, 376.27it/s, env_step=1049600, len=16, n/ep=4, n/st=64, player_1/loss=233.656, player_2/loss=80.069, rew=296.00]


Epoch #1025: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1026: 1025it [00:02, 380.88it/s, env_step=1050624, len=16, n/ep=4, n/st=64, player_1/loss=206.438, player_2/loss=134.019, rew=275.00]


Epoch #1026: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1027: 1025it [00:02, 372.44it/s, env_step=1051648, len=21, n/ep=3, n/st=64, player_1/loss=123.764, player_2/loss=193.356, rew=512.00]


Epoch #1027: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1028: 1025it [00:02, 377.79it/s, env_step=1052672, len=14, n/ep=4, n/st=64, player_1/loss=97.070, player_2/loss=184.788, rew=232.50]


Epoch #1028: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1029: 1025it [00:02, 377.52it/s, env_step=1053696, len=22, n/ep=3, n/st=64, player_1/loss=140.871, player_2/loss=144.521, rew=563.33]


Epoch #1029: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1030: 1025it [00:02, 377.52it/s, env_step=1054720, len=16, n/ep=3, n/st=64, player_1/loss=211.290, player_2/loss=137.327, rew=296.67]


Epoch #1030: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1031: 1025it [00:02, 376.96it/s, env_step=1055744, len=15, n/ep=5, n/st=64, player_1/loss=165.787, player_2/loss=314.825, rew=250.00]


Epoch #1031: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1032: 1025it [00:02, 376.41it/s, env_step=1056768, len=21, n/ep=3, n/st=64, player_1/loss=176.098, player_2/loss=273.581, rew=464.67]


Epoch #1032: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1033: 1025it [00:02, 376.27it/s, env_step=1057792, len=31, n/ep=2, n/st=64, player_1/loss=203.064, player_2/loss=102.151, rew=1052.00]


Epoch #1033: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1034: 1025it [00:02, 378.07it/s, env_step=1058816, len=26, n/ep=2, n/st=64, player_1/loss=163.031, player_2/loss=115.165, rew=700.00]


Epoch #1034: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1035: 1025it [00:02, 376.41it/s, env_step=1059840, len=32, n/ep=2, n/st=64, player_1/loss=264.403, player_2/loss=190.505, rew=1192.00]


Epoch #1035: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1036: 1025it [00:02, 377.38it/s, env_step=1060864, len=27, n/ep=2, n/st=64, player_1/loss=223.005, player_2/loss=232.647, rew=782.00]


Epoch #1036: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1037: 1025it [00:02, 377.52it/s, env_step=1061888, len=19, n/ep=3, n/st=64, player_1/loss=94.092, player_2/loss=225.813, rew=396.67]


Epoch #1037: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1038: 1025it [00:02, 378.07it/s, env_step=1062912, len=17, n/ep=4, n/st=64, player_1/loss=78.090, player_2/loss=148.107, rew=307.50]


Epoch #1038: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1039: 1025it [00:02, 374.48it/s, env_step=1063936, len=15, n/ep=4, n/st=64, player_1/loss=178.867, player_2/loss=123.214, rew=282.00]


Epoch #1039: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1040: 1025it [00:02, 377.65it/s, env_step=1064960, len=21, n/ep=3, n/st=64, player_1/loss=165.469, player_2/loss=93.241, rew=477.33]


Epoch #1040: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1041: 1025it [00:02, 378.63it/s, env_step=1065984, len=28, n/ep=3, n/st=64, player_1/loss=114.422, player_2/loss=216.159, rew=818.00]


Epoch #1041: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1042: 1025it [00:02, 376.13it/s, env_step=1067008, len=22, n/ep=3, n/st=64, player_1/loss=199.206, player_2/loss=266.820, rew=534.67]


Epoch #1042: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1043: 1025it [00:02, 371.50it/s, env_step=1068032, len=19, n/ep=4, n/st=64, player_1/loss=270.690, player_2/loss=262.569, rew=430.50]


Epoch #1043: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1044: 1025it [00:02, 376.41it/s, env_step=1069056, len=23, n/ep=3, n/st=64, player_1/loss=239.619, player_2/loss=185.383, rew=602.00]


Epoch #1044: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1045: 1025it [00:02, 379.19it/s, env_step=1070080, len=23, n/ep=3, n/st=64, player_1/loss=166.976, player_2/loss=95.827, rew=568.67]


Epoch #1045: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1046: 1025it [00:02, 375.03it/s, env_step=1071104, len=22, n/ep=3, n/st=64, player_1/loss=106.201, player_2/loss=77.344, rew=506.00]


Epoch #1046: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1047: 1025it [00:02, 374.89it/s, env_step=1072128, len=23, n/ep=3, n/st=64, player_1/loss=93.944, player_2/loss=73.487, rew=566.00]


Epoch #1047: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1048: 1025it [00:02, 375.86it/s, env_step=1073152, len=8, n/ep=8, n/st=64, player_1/loss=120.594, player_2/loss=83.738, rew=71.00]


Epoch #1048: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1049: 1025it [00:02, 374.89it/s, env_step=1074176, len=8, n/ep=8, n/st=64, player_1/loss=92.121, player_2/loss=171.590, rew=74.00]


Epoch #1049: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1050: 1025it [00:02, 371.09it/s, env_step=1075200, len=21, n/ep=3, n/st=64, player_1/loss=26.812, player_2/loss=354.274, rew=495.33]


Epoch #1050: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1051: 1025it [00:02, 375.58it/s, env_step=1076224, len=20, n/ep=3, n/st=64, player_1/loss=182.633, player_2/loss=368.176, rew=446.67]


Epoch #1051: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1052: 1025it [00:02, 377.65it/s, env_step=1077248, len=16, n/ep=3, n/st=64, player_1/loss=191.323, player_2/loss=349.052, rew=317.33]


Epoch #1052: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1053: 1025it [00:02, 377.24it/s, env_step=1078272, len=21, n/ep=3, n/st=64, player_1/loss=27.279, player_2/loss=232.201, rew=475.33]


Epoch #1053: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1054: 1025it [00:02, 376.41it/s, env_step=1079296, len=21, n/ep=3, n/st=64, player_1/loss=50.863, player_2/loss=210.127, rew=462.67]


Epoch #1054: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1055: 1025it [00:02, 377.93it/s, env_step=1080320, len=28, n/ep=2, n/st=64, player_1/loss=161.093, player_2/loss=314.456, rew=814.00]


Epoch #1055: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1056: 1025it [00:02, 378.63it/s, env_step=1081344, len=26, n/ep=2, n/st=64, player_1/loss=233.381, player_2/loss=249.549, rew=739.00]


Epoch #1056: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1057: 1025it [00:02, 377.52it/s, env_step=1082368, len=25, n/ep=3, n/st=64, player_1/loss=328.625, player_2/loss=263.292, rew=694.00]


Epoch #1057: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1058: 1025it [00:02, 373.94it/s, env_step=1083392, len=19, n/ep=4, n/st=64, player_1/loss=284.992, player_2/loss=253.697, rew=462.50]


Epoch #1058: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1059: 1025it [00:02, 379.33it/s, env_step=1084416, len=27, n/ep=2, n/st=64, player_1/loss=114.228, player_2/loss=144.454, rew=755.00]


Epoch #1059: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1060: 1025it [00:02, 374.07it/s, env_step=1085440, len=28, n/ep=3, n/st=64, player_1/loss=124.862, player_2/loss=143.511, rew=947.33]


Epoch #1060: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1061: 1025it [00:02, 378.21it/s, env_step=1086464, len=30, n/ep=2, n/st=64, player_1/loss=198.995, player_2/loss=281.359, rew=928.00]


Epoch #1061: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1062: 1025it [00:02, 376.27it/s, env_step=1087488, len=8, n/ep=8, n/st=64, player_1/loss=147.069, player_2/loss=272.990, rew=71.00]


Epoch #1062: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1063: 1025it [00:02, 375.30it/s, env_step=1088512, len=15, n/ep=4, n/st=64, player_1/loss=177.071, player_2/loss=367.970, rew=258.00]


Epoch #1063: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1064: 1025it [00:02, 377.10it/s, env_step=1089536, len=31, n/ep=2, n/st=64, player_1/loss=233.269, player_2/loss=324.209, rew=1026.00]


Epoch #1064: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1065: 1025it [00:02, 376.13it/s, env_step=1090560, len=19, n/ep=4, n/st=64, player_1/loss=206.823, player_2/loss=291.501, rew=380.00]


Epoch #1065: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1066: 1025it [00:02, 380.17it/s, env_step=1091584, len=20, n/ep=3, n/st=64, player_1/loss=148.497, player_2/loss=260.093, rew=576.00]


Epoch #1066: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1067: 1025it [00:02, 377.38it/s, env_step=1092608, len=16, n/ep=4, n/st=64, player_1/loss=152.519, player_2/loss=144.481, rew=279.00]


Epoch #1067: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1068: 1025it [00:02, 375.72it/s, env_step=1093632, len=17, n/ep=4, n/st=64, player_1/loss=220.677, player_2/loss=290.886, rew=347.00]


Epoch #1068: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1069: 1025it [00:02, 375.99it/s, env_step=1094656, len=23, n/ep=2, n/st=64, player_1/loss=157.180, player_2/loss=284.791, rew=559.00]


Epoch #1069: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1070: 1025it [00:02, 377.38it/s, env_step=1095680, len=21, n/ep=3, n/st=64, player_1/loss=182.753, rew=490.00]


Epoch #1070: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1071: 1025it [00:02, 377.38it/s, env_step=1096704, len=28, n/ep=2, n/st=64, player_1/loss=136.842, player_2/loss=284.172, rew=869.00]


Epoch #1071: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1072: 1025it [00:02, 349.35it/s, env_step=1097728, len=27, n/ep=2, n/st=64, player_1/loss=193.564, player_2/loss=307.578, rew=782.00]


Epoch #1072: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1073: 1025it [00:02, 377.10it/s, env_step=1098752, len=28, n/ep=2, n/st=64, player_1/loss=204.800, player_2/loss=362.237, rew=851.00]


Epoch #1073: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1074: 1025it [00:02, 367.24it/s, env_step=1099776, len=26, n/ep=2, n/st=64, player_1/loss=140.840, player_2/loss=349.369, rew=729.00]


Epoch #1074: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1075: 1025it [00:02, 368.56it/s, env_step=1100800, len=27, n/ep=2, n/st=64, player_1/loss=99.180, player_2/loss=176.376, rew=784.00]


Epoch #1075: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1076: 1025it [00:02, 373.80it/s, env_step=1101824, len=14, n/ep=4, n/st=64, player_1/loss=166.999, player_2/loss=207.877, rew=237.00]


Epoch #1076: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1077: 1025it [00:02, 366.71it/s, env_step=1102848, len=15, n/ep=4, n/st=64, player_1/loss=183.218, player_2/loss=240.173, rew=240.00]


Epoch #1077: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1078: 1025it [00:02, 376.82it/s, env_step=1103872, len=25, n/ep=3, n/st=64, player_1/loss=77.283, player_2/loss=299.648, rew=696.67]


Epoch #1078: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1079: 1025it [00:02, 376.68it/s, env_step=1104896, len=18, n/ep=3, n/st=64, player_1/loss=106.908, player_2/loss=226.658, rew=368.00]


Epoch #1079: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1080: 1025it [00:02, 375.30it/s, env_step=1105920, len=18, n/ep=4, n/st=64, player_1/loss=103.051, player_2/loss=83.341, rew=382.50]


Epoch #1080: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1081: 1025it [00:02, 378.21it/s, env_step=1106944, len=27, n/ep=2, n/st=64, player_1/loss=149.198, player_2/loss=176.209, rew=782.00]


Epoch #1081: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1082: 1025it [00:02, 378.07it/s, env_step=1107968, len=29, n/ep=2, n/st=64, player_1/loss=172.090, player_2/loss=198.096, rew=877.00]


Epoch #1082: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1083: 1025it [00:02, 377.65it/s, env_step=1108992, len=25, n/ep=3, n/st=64, player_1/loss=99.907, player_2/loss=154.083, rew=686.67]


Epoch #1083: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1084: 1025it [00:02, 375.58it/s, env_step=1110016, len=29, n/ep=2, n/st=64, player_1/loss=151.066, player_2/loss=124.818, rew=893.00]


Epoch #1084: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1085: 1025it [00:02, 380.60it/s, env_step=1111040, len=21, n/ep=3, n/st=64, player_1/loss=180.447, player_2/loss=238.209, rew=462.67]


Epoch #1085: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1086: 1025it [00:02, 378.35it/s, env_step=1112064, len=27, n/ep=2, n/st=64, player_1/loss=239.020, player_2/loss=281.592, rew=802.00]


Epoch #1086: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1087: 1025it [00:02, 378.77it/s, env_step=1113088, len=8, n/ep=7, n/st=64, player_1/loss=220.273, player_2/loss=191.207, rew=85.14]


Epoch #1087: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1088: 1025it [00:02, 374.75it/s, env_step=1114112, len=19, n/ep=3, n/st=64, player_1/loss=261.244, player_2/loss=256.062, rew=386.00]


Epoch #1088: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1089: 1025it [00:02, 378.21it/s, env_step=1115136, len=28, n/ep=3, n/st=64, player_1/loss=327.251, player_2/loss=246.846, rew=877.33]


Epoch #1089: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1090: 1025it [00:02, 377.65it/s, env_step=1116160, len=19, n/ep=4, n/st=64, player_1/loss=442.148, player_2/loss=118.114, rew=456.50]


Epoch #1090: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1091: 1025it [00:02, 378.63it/s, env_step=1117184, len=22, n/ep=3, n/st=64, player_1/loss=258.844, player_2/loss=180.251, rew=520.67]


Epoch #1091: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1092: 1025it [00:02, 374.48it/s, env_step=1118208, len=17, n/ep=3, n/st=64, player_1/loss=204.345, player_2/loss=102.198, rew=336.00]


Epoch #1092: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1093: 1025it [00:02, 377.10it/s, env_step=1119232, len=20, n/ep=4, n/st=64, player_1/loss=213.007, player_2/loss=42.130, rew=461.00]


Epoch #1093: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1094: 1025it [00:02, 373.53it/s, env_step=1120256, len=10, n/ep=6, n/st=64, player_1/loss=355.663, player_2/loss=26.429, rew=154.67]


Epoch #1094: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1095: 1025it [00:02, 374.48it/s, env_step=1121280, len=18, n/ep=3, n/st=64, player_1/loss=533.065, player_2/loss=237.228, rew=354.00]


Epoch #1095: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1096: 1025it [00:02, 378.63it/s, env_step=1122304, len=21, n/ep=3, n/st=64, player_1/loss=351.540, player_2/loss=275.962, rew=476.00]


Epoch #1096: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1097: 1025it [00:02, 376.55it/s, env_step=1123328, len=20, n/ep=3, n/st=64, player_1/loss=335.103, player_2/loss=287.688, rew=418.67]


Epoch #1097: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1098: 1025it [00:02, 373.12it/s, env_step=1124352, len=17, n/ep=4, n/st=64, player_1/loss=449.696, player_2/loss=106.007, rew=357.00]


Epoch #1098: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1099: 1025it [00:02, 376.13it/s, env_step=1125376, len=29, n/ep=2, n/st=64, player_1/loss=201.001, player_2/loss=360.804, rew=949.00]


Epoch #1099: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1100: 1025it [00:02, 376.55it/s, env_step=1126400, len=27, n/ep=2, n/st=64, player_1/loss=134.747, player_2/loss=537.826, rew=758.00]


Epoch #1100: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1101: 1025it [00:02, 376.55it/s, env_step=1127424, len=32, n/ep=2, n/st=64, player_1/loss=116.458, player_2/loss=606.683, rew=1079.00]


Epoch #1101: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1102: 1025it [00:02, 375.99it/s, env_step=1128448, len=32, n/ep=2, n/st=64, player_1/loss=151.104, player_2/loss=507.824, rew=1079.00]


Epoch #1102: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1103: 1025it [00:02, 375.58it/s, env_step=1129472, len=21, n/ep=3, n/st=64, player_1/loss=127.938, player_2/loss=181.770, rew=492.00]


Epoch #1103: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1104: 1025it [00:02, 378.49it/s, env_step=1130496, len=21, n/ep=3, n/st=64, player_1/loss=209.547, player_2/loss=179.641, rew=490.00]


Epoch #1104: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1105: 1025it [00:02, 375.44it/s, env_step=1131520, len=21, n/ep=3, n/st=64, player_1/loss=332.394, player_2/loss=144.638, rew=466.00]


Epoch #1105: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1106: 1025it [00:02, 378.49it/s, env_step=1132544, len=21, n/ep=2, n/st=64, player_1/loss=346.731, player_2/loss=37.870, rew=485.00]


Epoch #1106: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1107: 1025it [00:02, 375.17it/s, env_step=1133568, len=21, n/ep=3, n/st=64, player_2/loss=258.125, rew=510.67]


Epoch #1107: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1108: 1025it [00:02, 377.24it/s, env_step=1134592, len=17, n/ep=3, n/st=64, player_1/loss=410.778, player_2/loss=426.933, rew=336.00]


Epoch #1108: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1109: 1025it [00:02, 374.35it/s, env_step=1135616, len=28, n/ep=2, n/st=64, player_1/loss=676.076, player_2/loss=269.060, rew=859.00]


Epoch #1109: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1110: 1025it [00:02, 376.27it/s, env_step=1136640, len=31, n/ep=3, n/st=64, player_1/loss=332.357, player_2/loss=486.817, rew=1022.67]


Epoch #1110: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1111: 1025it [00:02, 373.39it/s, env_step=1137664, len=34, n/ep=2, n/st=64, player_1/loss=66.383, player_2/loss=576.423, rew=1225.00]


Epoch #1111: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1112: 1025it [00:02, 375.44it/s, env_step=1138688, len=37, n/ep=2, n/st=64, player_1/loss=35.406, player_2/loss=395.635, rew=1477.00]


Epoch #1112: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1113: 1025it [00:02, 377.38it/s, env_step=1139712, len=23, n/ep=2, n/st=64, player_1/loss=94.835, player_2/loss=275.578, rew=576.00]


Epoch #1113: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1114: 1025it [00:02, 377.37it/s, env_step=1140736, len=33, n/ep=2, n/st=64, player_1/loss=203.138, player_2/loss=337.463, rew=1124.00]


Epoch #1114: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1115: 1025it [00:02, 376.96it/s, env_step=1141760, len=35, n/ep=2, n/st=64, player_1/loss=405.774, player_2/loss=174.157, rew=1296.00]


Epoch #1115: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1116: 1025it [00:02, 378.35it/s, env_step=1142784, len=26, n/ep=3, n/st=64, player_1/loss=505.995, player_2/loss=192.093, rew=720.67]


Epoch #1116: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1117: 1025it [00:02, 378.07it/s, env_step=1143808, len=14, n/ep=5, n/st=64, player_1/loss=212.428, player_2/loss=283.714, rew=214.80]


Epoch #1117: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1118: 1025it [00:02, 380.03it/s, env_step=1144832, len=21, n/ep=4, n/st=64, player_1/loss=141.163, player_2/loss=170.484, rew=513.50]


Epoch #1118: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1119: 1025it [00:02, 375.72it/s, env_step=1145856, len=19, n/ep=4, n/st=64, player_1/loss=223.478, player_2/loss=119.324, rew=428.00]


Epoch #1119: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1120: 1025it [00:02, 381.45it/s, env_step=1146880, len=24, n/ep=3, n/st=64, player_1/loss=305.149, player_2/loss=226.769, rew=632.67]


Epoch #1120: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1121: 1025it [00:02, 375.72it/s, env_step=1147904, len=29, n/ep=3, n/st=64, player_1/loss=349.413, player_2/loss=308.715, rew=922.00]


Epoch #1121: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1122: 1025it [00:02, 378.21it/s, env_step=1148928, len=26, n/ep=2, n/st=64, player_1/loss=320.546, player_2/loss=302.903, rew=725.00]


Epoch #1122: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1123: 1025it [00:02, 376.82it/s, env_step=1149952, len=24, n/ep=3, n/st=64, player_1/loss=519.398, player_2/loss=258.309, rew=728.00]


Epoch #1123: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1124: 1025it [00:02, 376.96it/s, env_step=1150976, len=15, n/ep=4, n/st=64, player_1/loss=352.682, player_2/loss=327.395, rew=256.50]


Epoch #1124: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1125: 1025it [00:02, 377.65it/s, env_step=1152000, len=15, n/ep=4, n/st=64, player_1/loss=187.251, player_2/loss=327.379, rew=256.00]


Epoch #1125: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1126: 1025it [00:02, 376.13it/s, env_step=1153024, len=14, n/ep=4, n/st=64, player_1/loss=397.225, player_2/loss=267.953, rew=243.00]


Epoch #1126: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1127: 1025it [00:02, 373.80it/s, env_step=1154048, len=17, n/ep=3, n/st=64, player_1/loss=302.307, player_2/loss=183.318, rew=304.00]


Epoch #1127: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1128: 1025it [00:02, 377.52it/s, env_step=1155072, len=16, n/ep=4, n/st=64, player_1/loss=124.261, player_2/loss=110.563, rew=283.50]


Epoch #1128: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1129: 1025it [00:02, 376.68it/s, env_step=1156096, len=24, n/ep=3, n/st=64, player_1/loss=191.528, player_2/loss=172.736, rew=676.00]


Epoch #1129: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1130: 1025it [00:02, 376.13it/s, env_step=1157120, len=34, n/ep=2, n/st=64, player_1/loss=215.081, player_2/loss=157.872, rew=1197.00]


Epoch #1130: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1131: 1025it [00:02, 377.24it/s, env_step=1158144, len=27, n/ep=2, n/st=64, player_1/loss=256.518, player_2/loss=40.192, rew=755.00]


Epoch #1131: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1132: 1025it [00:02, 376.82it/s, env_step=1159168, len=15, n/ep=4, n/st=64, player_1/loss=190.732, player_2/loss=112.543, rew=248.50]


Epoch #1132: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1133: 1025it [00:02, 378.07it/s, env_step=1160192, len=23, n/ep=2, n/st=64, player_1/loss=299.385, player_2/loss=134.939, rew=566.00]


Epoch #1133: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1134: 1025it [00:02, 376.55it/s, env_step=1161216, len=19, n/ep=3, n/st=64, player_1/loss=333.122, player_2/loss=715.909, rew=386.00]


Epoch #1134: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1135: 1025it [00:02, 375.44it/s, env_step=1162240, len=16, n/ep=5, n/st=64, player_1/loss=134.778, player_2/loss=951.922, rew=303.20]


Epoch #1135: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1136: 1025it [00:02, 377.24it/s, env_step=1163264, len=20, n/ep=4, n/st=64, player_1/loss=130.537, player_2/loss=512.861, rew=455.00]


Epoch #1136: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1137: 1025it [00:02, 375.58it/s, env_step=1164288, len=19, n/ep=4, n/st=64, player_1/loss=189.641, player_2/loss=410.287, rew=424.00]


Epoch #1137: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1138: 1025it [00:02, 374.07it/s, env_step=1165312, len=17, n/ep=4, n/st=64, player_1/loss=253.684, player_2/loss=302.252, rew=321.50]


Epoch #1138: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1139: 1025it [00:02, 379.05it/s, env_step=1166336, len=15, n/ep=4, n/st=64, player_1/loss=206.227, player_2/loss=369.402, rew=247.00]


Epoch #1139: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1140: 1025it [00:02, 376.82it/s, env_step=1167360, len=22, n/ep=3, n/st=64, player_1/loss=130.088, player_2/loss=322.869, rew=508.67]


Epoch #1140: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1141: 1025it [00:02, 375.99it/s, env_step=1168384, len=21, n/ep=2, n/st=64, player_1/loss=168.866, player_2/loss=272.384, rew=482.00]


Epoch #1141: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1142: 1025it [00:02, 375.17it/s, env_step=1169408, len=31, n/ep=2, n/st=64, player_1/loss=135.571, player_2/loss=307.257, rew=990.00]


Epoch #1142: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1143: 1025it [00:02, 377.38it/s, env_step=1170432, len=17, n/ep=4, n/st=64, player_1/loss=129.121, player_2/loss=257.236, rew=351.50]


Epoch #1143: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1144: 1025it [00:02, 375.85it/s, env_step=1171456, len=27, n/ep=3, n/st=64, player_1/loss=143.026, player_2/loss=159.273, rew=810.00]


Epoch #1144: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1145: 1025it [00:02, 376.13it/s, env_step=1172480, len=16, n/ep=3, n/st=64, player_1/loss=360.837, player_2/loss=335.216, rew=278.67]


Epoch #1145: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1146: 1025it [00:02, 376.27it/s, env_step=1173504, len=19, n/ep=4, n/st=64, player_1/loss=354.340, player_2/loss=426.308, rew=396.50]


Epoch #1146: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1147: 1025it [00:02, 379.33it/s, env_step=1174528, len=22, n/ep=3, n/st=64, player_1/loss=32.389, player_2/loss=539.163, rew=590.00]


Epoch #1147: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1148: 1025it [00:02, 375.99it/s, env_step=1175552, len=14, n/ep=5, n/st=64, player_1/loss=224.359, player_2/loss=538.328, rew=227.60]


Epoch #1148: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1149: 1025it [00:02, 375.99it/s, env_step=1176576, len=16, n/ep=4, n/st=64, player_1/loss=361.382, player_2/loss=428.508, rew=273.00]


Epoch #1149: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1150: 1025it [00:02, 377.79it/s, env_step=1177600, len=19, n/ep=4, n/st=64, player_1/loss=214.463, player_2/loss=229.523, rew=436.50]


Epoch #1150: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1151: 1025it [00:02, 375.72it/s, env_step=1178624, len=19, n/ep=3, n/st=64, player_1/loss=185.563, player_2/loss=237.828, rew=386.67]


Epoch #1151: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1152: 1025it [00:02, 372.31it/s, env_step=1179648, len=22, n/ep=2, n/st=64, player_1/loss=266.373, player_2/loss=118.299, rew=539.00]


Epoch #1152: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1153: 1025it [00:02, 375.99it/s, env_step=1180672, len=14, n/ep=5, n/st=64, player_1/loss=141.908, player_2/loss=324.046, rew=234.40]


Epoch #1153: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1154: 1025it [00:02, 375.72it/s, env_step=1181696, len=25, n/ep=3, n/st=64, player_1/loss=177.184, player_2/loss=348.673, rew=686.00]


Epoch #1154: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1155: 1025it [00:02, 375.86it/s, env_step=1182720, len=17, n/ep=5, n/st=64, player_1/loss=291.760, player_2/loss=287.654, rew=354.40]


Epoch #1155: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1156: 1025it [00:02, 371.77it/s, env_step=1183744, len=15, n/ep=4, n/st=64, player_1/loss=322.042, player_2/loss=161.550, rew=240.00]


Epoch #1156: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1157: 1025it [00:02, 377.24it/s, env_step=1184768, len=22, n/ep=4, n/st=64, player_1/loss=248.639, player_2/loss=231.625, rew=537.00]


Epoch #1157: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1158: 1025it [00:02, 377.10it/s, env_step=1185792, len=22, n/ep=3, n/st=64, player_1/loss=125.022, player_2/loss=363.017, rew=548.67]


Epoch #1158: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1159: 1025it [00:02, 376.27it/s, env_step=1186816, len=27, n/ep=1, n/st=64, player_1/loss=166.601, player_2/loss=358.863, rew=754.00]


Epoch #1159: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1160: 1025it [00:02, 374.48it/s, env_step=1187840, len=25, n/ep=3, n/st=64, player_1/loss=122.214, player_2/loss=272.038, rew=700.00]


Epoch #1160: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1161: 1025it [00:02, 377.10it/s, env_step=1188864, len=18, n/ep=3, n/st=64, player_1/loss=144.671, player_2/loss=141.918, rew=344.67]


Epoch #1161: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1162: 1025it [00:02, 378.21it/s, env_step=1189888, len=23, n/ep=2, n/st=64, player_1/loss=304.342, player_2/loss=110.180, rew=550.00]


Epoch #1162: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1163: 1025it [00:02, 376.13it/s, env_step=1190912, len=21, n/ep=3, n/st=64, player_1/loss=388.575, player_2/loss=248.709, rew=525.33]


Epoch #1163: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1164: 1025it [00:02, 376.27it/s, env_step=1191936, len=17, n/ep=4, n/st=64, player_1/loss=430.430, rew=307.00]


Epoch #1164: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1165: 1025it [00:02, 380.18it/s, env_step=1192960, len=15, n/ep=3, n/st=64, player_1/loss=508.625, player_2/loss=46.808, rew=260.00]


Epoch #1165: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1166: 1025it [00:02, 375.72it/s, env_step=1193984, len=19, n/ep=4, n/st=64, player_1/loss=403.942, player_2/loss=203.852, rew=414.00]


Epoch #1166: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1167: 1025it [00:02, 375.17it/s, env_step=1195008, len=28, n/ep=3, n/st=64, player_1/loss=265.954, player_2/loss=325.194, rew=892.00]


Epoch #1167: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1168: 1025it [00:02, 379.05it/s, env_step=1196032, len=19, n/ep=3, n/st=64, player_1/loss=233.889, player_2/loss=303.117, rew=522.67]


Epoch #1168: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1169: 1025it [00:02, 375.30it/s, env_step=1197056, len=30, n/ep=2, n/st=64, player_1/loss=413.620, player_2/loss=272.109, rew=928.00]


Epoch #1169: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1170: 1025it [00:02, 374.76it/s, env_step=1198080, len=17, n/ep=4, n/st=64, player_1/loss=318.415, player_2/loss=219.926, rew=334.00]


Epoch #1170: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1171: 1025it [00:02, 375.86it/s, env_step=1199104, len=29, n/ep=2, n/st=64, player_1/loss=219.978, player_2/loss=219.002, rew=954.00]


Epoch #1171: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1172: 1025it [00:02, 377.93it/s, env_step=1200128, len=21, n/ep=3, n/st=64, player_1/loss=306.543, player_2/loss=425.850, rew=484.00]


Epoch #1172: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1173: 1025it [00:02, 373.26it/s, env_step=1201152, len=22, n/ep=3, n/st=64, player_1/loss=364.290, player_2/loss=501.093, rew=548.00]


Epoch #1173: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1174: 1025it [00:02, 376.13it/s, env_step=1202176, len=13, n/ep=6, n/st=64, player_1/loss=278.509, player_2/loss=360.722, rew=224.00]


Epoch #1174: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1175: 1025it [00:02, 376.13it/s, env_step=1203200, len=14, n/ep=4, n/st=64, player_1/loss=335.331, player_2/loss=246.080, rew=218.00]


Epoch #1175: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1176: 1025it [00:02, 378.49it/s, env_step=1204224, len=14, n/ep=5, n/st=64, player_1/loss=279.969, player_2/loss=226.063, rew=224.80]


Epoch #1176: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1177: 1025it [00:02, 375.99it/s, env_step=1205248, len=18, n/ep=4, n/st=64, player_1/loss=229.880, player_2/loss=306.804, rew=373.50]


Epoch #1177: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1178: 1025it [00:02, 377.65it/s, env_step=1206272, len=18, n/ep=4, n/st=64, player_1/loss=156.735, player_2/loss=227.064, rew=367.00]


Epoch #1178: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1179: 1025it [00:02, 378.21it/s, env_step=1207296, len=16, n/ep=4, n/st=64, player_1/loss=202.972, player_2/loss=196.443, rew=297.50]


Epoch #1179: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1180: 1025it [00:02, 357.76it/s, env_step=1208320, len=16, n/ep=4, n/st=64, player_1/loss=257.842, player_2/loss=124.188, rew=291.50]


Epoch #1180: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1181: 1025it [00:02, 374.89it/s, env_step=1209344, len=15, n/ep=5, n/st=64, player_1/loss=259.510, player_2/loss=30.421, rew=257.60]


Epoch #1181: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1182: 1025it [00:02, 376.82it/s, env_step=1210368, len=15, n/ep=5, n/st=64, player_1/loss=190.288, player_2/loss=30.291, rew=241.20]


Epoch #1182: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1183: 1025it [00:02, 377.10it/s, env_step=1211392, len=21, n/ep=3, n/st=64, player_1/loss=200.277, rew=528.00]


Epoch #1183: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1184: 1025it [00:02, 376.27it/s, env_step=1212416, len=15, n/ep=4, n/st=64, player_1/loss=241.203, player_2/loss=54.108, rew=254.50]


Epoch #1184: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1185: 1025it [00:02, 375.44it/s, env_step=1213440, len=15, n/ep=4, n/st=64, player_1/loss=169.444, player_2/loss=138.523, rew=256.50]


Epoch #1185: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1186: 1025it [00:02, 364.89it/s, env_step=1214464, len=23, n/ep=3, n/st=64, player_1/loss=176.011, player_2/loss=157.005, rew=590.67]


Epoch #1186: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1187: 1025it [00:02, 362.18it/s, env_step=1215488, len=24, n/ep=2, n/st=64, player_1/loss=234.545, player_2/loss=110.642, rew=665.00]


Epoch #1187: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1188: 1025it [00:02, 375.58it/s, env_step=1216512, len=21, n/ep=3, n/st=64, player_1/loss=386.362, player_2/loss=66.679, rew=582.67]


Epoch #1188: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1189: 1025it [00:02, 371.09it/s, env_step=1217536, len=16, n/ep=4, n/st=64, player_1/loss=427.184, player_2/loss=176.062, rew=271.50]


Epoch #1189: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1190: 1025it [00:02, 377.79it/s, env_step=1218560, len=16, n/ep=4, n/st=64, player_1/loss=272.561, player_2/loss=296.775, rew=301.50]


Epoch #1190: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1191: 1025it [00:02, 372.98it/s, env_step=1219584, len=15, n/ep=4, n/st=64, player_1/loss=196.741, player_2/loss=220.884, rew=245.50]


Epoch #1191: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1192: 1025it [00:02, 376.96it/s, env_step=1220608, len=30, n/ep=2, n/st=64, player_1/loss=192.124, player_2/loss=144.818, rew=1015.00]


Epoch #1192: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1193: 1025it [00:02, 381.73it/s, env_step=1221632, len=21, n/ep=3, n/st=64, player_1/loss=238.183, player_2/loss=146.678, rew=470.67]


Epoch #1193: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1194: 1025it [00:02, 374.21it/s, env_step=1222656, len=26, n/ep=3, n/st=64, player_1/loss=390.410, player_2/loss=87.148, rew=719.33]


Epoch #1194: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1195: 1025it [00:02, 377.79it/s, env_step=1223680, len=32, n/ep=2, n/st=64, player_1/loss=354.101, player_2/loss=126.694, rew=1143.00]


Epoch #1195: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1196: 1025it [00:02, 375.17it/s, env_step=1224704, len=24, n/ep=2, n/st=64, player_1/loss=152.965, player_2/loss=155.503, rew=599.00]


Epoch #1196: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1197: 1025it [00:02, 375.17it/s, env_step=1225728, len=13, n/ep=6, n/st=64, player_1/loss=126.545, player_2/loss=423.003, rew=328.33]


Epoch #1197: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1198: 1025it [00:02, 371.76it/s, env_step=1226752, len=17, n/ep=5, n/st=64, player_1/loss=139.822, player_2/loss=539.570, rew=395.60]


Epoch #1198: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1199: 1025it [00:02, 378.07it/s, env_step=1227776, len=27, n/ep=2, n/st=64, player_1/loss=424.255, player_2/loss=537.270, rew=763.00]


Epoch #1199: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1200: 1025it [00:02, 376.55it/s, env_step=1228800, len=18, n/ep=3, n/st=64, player_1/loss=454.277, player_2/loss=405.780, rew=369.33]


Epoch #1200: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1201: 1025it [00:02, 377.52it/s, env_step=1229824, len=29, n/ep=2, n/st=64, player_1/loss=339.713, player_2/loss=373.668, rew=877.00]


Epoch #1201: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1202: 1025it [00:02, 377.10it/s, env_step=1230848, len=33, n/ep=2, n/st=64, player_1/loss=224.568, player_2/loss=484.803, rew=1156.00]


Epoch #1202: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1203: 1025it [00:02, 372.44it/s, env_step=1231872, len=22, n/ep=3, n/st=64, player_1/loss=212.731, player_2/loss=308.450, rew=538.67]


Epoch #1203: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1204: 1025it [00:02, 377.65it/s, env_step=1232896, len=16, n/ep=3, n/st=64, player_1/loss=249.529, player_2/loss=324.539, rew=349.33]


Epoch #1204: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1205: 1025it [00:02, 377.79it/s, env_step=1233920, len=32, n/ep=2, n/st=64, player_1/loss=185.090, player_2/loss=374.766, rew=1079.00]


Epoch #1205: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1206: 1025it [00:02, 376.96it/s, env_step=1234944, len=33, n/ep=2, n/st=64, player_1/loss=224.849, player_2/loss=152.838, rew=1120.00]


Epoch #1206: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1207: 1025it [00:02, 378.07it/s, env_step=1235968, len=31, n/ep=2, n/st=64, player_1/loss=189.237, player_2/loss=270.872, rew=1024.00]


Epoch #1207: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1208: 1025it [00:02, 377.10it/s, env_step=1236992, len=28, n/ep=2, n/st=64, player_1/loss=296.174, player_2/loss=304.905, rew=881.00]


Epoch #1208: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1209: 1025it [00:02, 373.66it/s, env_step=1238016, len=24, n/ep=3, n/st=64, player_1/loss=254.876, player_2/loss=193.954, rew=631.33]


Epoch #1209: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1210: 1025it [00:02, 376.68it/s, env_step=1239040, len=21, n/ep=3, n/st=64, player_1/loss=352.224, player_2/loss=175.494, rew=493.33]


Epoch #1210: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1211: 1025it [00:02, 374.21it/s, env_step=1240064, len=15, n/ep=4, n/st=64, player_1/loss=419.016, player_2/loss=98.511, rew=266.50]


Epoch #1211: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1212: 1025it [00:02, 378.77it/s, env_step=1241088, len=19, n/ep=3, n/st=64, player_1/loss=314.422, player_2/loss=215.732, rew=384.00]


Epoch #1212: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1213: 1025it [00:02, 377.52it/s, env_step=1242112, len=28, n/ep=3, n/st=64, player_1/loss=357.682, player_2/loss=340.909, rew=838.67]


Epoch #1213: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1214: 1025it [00:02, 377.65it/s, env_step=1243136, len=34, n/ep=2, n/st=64, player_1/loss=273.073, player_2/loss=210.553, rew=1189.00]


Epoch #1214: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1215: 1025it [00:02, 376.82it/s, env_step=1244160, len=21, n/ep=3, n/st=64, player_1/loss=367.020, player_2/loss=95.517, rew=565.33]


Epoch #1215: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1216: 1025it [00:02, 378.77it/s, env_step=1245184, len=15, n/ep=4, n/st=64, player_1/loss=350.280, player_2/loss=112.799, rew=264.00]


Epoch #1216: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1217: 1025it [00:02, 375.44it/s, env_step=1246208, len=16, n/ep=4, n/st=64, player_1/loss=125.648, player_2/loss=277.014, rew=286.50]


Epoch #1217: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1218: 1025it [00:02, 376.41it/s, env_step=1247232, len=14, n/ep=5, n/st=64, player_1/loss=119.889, player_2/loss=361.864, rew=227.60]


Epoch #1218: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1219: 1025it [00:02, 377.65it/s, env_step=1248256, len=25, n/ep=2, n/st=64, player_1/loss=181.585, player_2/loss=185.390, rew=674.00]


Epoch #1219: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1220: 1025it [00:02, 378.49it/s, env_step=1249280, len=9, n/ep=6, n/st=64, player_1/loss=233.713, player_2/loss=349.835, rew=110.33]


Epoch #1220: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1221: 1025it [00:02, 376.41it/s, env_step=1250304, len=10, n/ep=6, n/st=64, player_1/loss=218.057, player_2/loss=510.442, rew=121.33]


Epoch #1221: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1222: 1025it [00:02, 373.80it/s, env_step=1251328, len=15, n/ep=4, n/st=64, player_1/loss=224.265, player_2/loss=441.576, rew=238.50]


Epoch #1222: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1223: 1025it [00:02, 373.53it/s, env_step=1252352, len=14, n/ep=5, n/st=64, player_1/loss=177.732, player_2/loss=106.036, rew=380.00]


Epoch #1223: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1224: 1025it [00:02, 379.61it/s, env_step=1253376, len=11, n/ep=6, n/st=64, player_1/loss=83.541, player_2/loss=211.814, rew=163.67]


Epoch #1224: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1225: 1025it [00:02, 372.85it/s, env_step=1254400, len=19, n/ep=3, n/st=64, player_1/loss=181.501, rew=394.67]


Epoch #1225: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1226: 1025it [00:02, 375.17it/s, env_step=1255424, len=25, n/ep=2, n/st=64, player_1/loss=297.624, player_2/loss=75.191, rew=676.00]


Epoch #1226: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1227: 1025it [00:02, 376.27it/s, env_step=1256448, len=21, n/ep=3, n/st=64, player_1/loss=262.224, player_2/loss=92.630, rew=490.67]


Epoch #1227: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1228: 1025it [00:02, 376.82it/s, env_step=1257472, len=19, n/ep=3, n/st=64, player_1/loss=260.952, player_2/loss=172.693, rew=402.67]


Epoch #1228: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1229: 1025it [00:02, 375.86it/s, env_step=1258496, len=24, n/ep=3, n/st=64, player_1/loss=131.421, player_2/loss=310.701, rew=711.33]


Epoch #1229: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1230: 1025it [00:02, 379.19it/s, env_step=1259520, len=17, n/ep=3, n/st=64, player_1/loss=57.998, player_2/loss=300.003, rew=398.00]


Epoch #1230: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1231: 1025it [00:02, 375.85it/s, env_step=1260544, len=16, n/ep=3, n/st=64, player_1/loss=103.428, player_2/loss=128.205, rew=270.67]


Epoch #1231: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1232: 1025it [00:02, 375.58it/s, env_step=1261568, len=20, n/ep=4, n/st=64, player_1/loss=161.683, player_2/loss=242.787, rew=418.50]


Epoch #1232: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1233: 1025it [00:02, 375.86it/s, env_step=1262592, len=15, n/ep=4, n/st=64, player_1/loss=149.720, player_2/loss=261.212, rew=255.00]


Epoch #1233: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1234: 1025it [00:02, 375.58it/s, env_step=1263616, len=21, n/ep=3, n/st=64, player_1/loss=224.252, player_2/loss=279.149, rew=492.67]


Epoch #1234: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1235: 1025it [00:02, 372.98it/s, env_step=1264640, len=18, n/ep=3, n/st=64, player_1/loss=192.429, player_2/loss=223.250, rew=396.00]


Epoch #1235: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1236: 1025it [00:02, 373.66it/s, env_step=1265664, len=21, n/ep=2, n/st=64, player_1/loss=133.252, player_2/loss=149.061, rew=482.00]


Epoch #1236: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1237: 1025it [00:02, 377.79it/s, env_step=1266688, len=20, n/ep=3, n/st=64, player_1/loss=218.798, player_2/loss=161.538, rew=456.67]


Epoch #1237: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1238: 1025it [00:02, 375.44it/s, env_step=1267712, len=15, n/ep=4, n/st=64, player_1/loss=234.500, player_2/loss=168.069, rew=260.50]


Epoch #1238: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1239: 1025it [00:02, 375.03it/s, env_step=1268736, len=11, n/ep=6, n/st=64, player_1/loss=108.249, player_2/loss=262.843, rew=140.00]


Epoch #1239: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1240: 1025it [00:02, 372.58it/s, env_step=1269760, len=7, n/ep=8, n/st=64, player_1/loss=151.544, player_2/loss=228.218, rew=67.75]


Epoch #1240: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1241: 1025it [00:02, 376.27it/s, env_step=1270784, len=15, n/ep=5, n/st=64, player_1/loss=321.351, player_2/loss=126.619, rew=315.60]


Epoch #1241: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1242: 1025it [00:02, 374.21it/s, env_step=1271808, len=23, n/ep=3, n/st=64, player_1/loss=435.780, player_2/loss=248.903, rew=558.67]


Epoch #1242: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1243: 1025it [00:02, 374.89it/s, env_step=1272832, len=15, n/ep=4, n/st=64, player_2/loss=296.496, rew=251.00]


Epoch #1243: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1244: 1025it [00:02, 378.63it/s, env_step=1273856, len=18, n/ep=3, n/st=64, player_1/loss=235.531, player_2/loss=215.634, rew=380.00]


Epoch #1244: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1245: 1025it [00:02, 374.35it/s, env_step=1274880, len=16, n/ep=4, n/st=64, player_2/loss=156.787, rew=272.50]


Epoch #1245: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1246: 1025it [00:02, 376.82it/s, env_step=1275904, len=28, n/ep=3, n/st=64, player_1/loss=223.543, player_2/loss=218.093, rew=812.00]


Epoch #1246: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1247: 1025it [00:02, 378.07it/s, env_step=1276928, len=25, n/ep=3, n/st=64, player_1/loss=238.694, player_2/loss=181.882, rew=686.00]


Epoch #1247: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1248: 1025it [00:02, 377.10it/s, env_step=1277952, len=24, n/ep=3, n/st=64, player_1/loss=309.945, player_2/loss=108.544, rew=638.67]


Epoch #1248: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1249: 1025it [00:02, 377.24it/s, env_step=1278976, len=22, n/ep=2, n/st=64, player_1/loss=344.683, player_2/loss=172.892, rew=505.00]


Epoch #1249: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1250: 1025it [00:02, 375.86it/s, env_step=1280000, len=24, n/ep=3, n/st=64, player_1/loss=515.158, player_2/loss=199.343, rew=642.00]


Epoch #1250: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1251: 1025it [00:02, 376.41it/s, env_step=1281024, len=29, n/ep=2, n/st=64, player_1/loss=533.870, player_2/loss=117.862, rew=900.00]


Epoch #1251: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1252: 1025it [00:02, 375.86it/s, env_step=1282048, len=15, n/ep=4, n/st=64, player_1/loss=269.710, player_2/loss=125.868, rew=264.50]


Epoch #1252: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1253: 1025it [00:02, 375.44it/s, env_step=1283072, len=15, n/ep=4, n/st=64, player_1/loss=296.468, player_2/loss=89.142, rew=263.50]


Epoch #1253: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1254: 1025it [00:02, 376.41it/s, env_step=1284096, len=21, n/ep=3, n/st=64, player_1/loss=258.181, player_2/loss=65.286, rew=495.33]


Epoch #1254: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1255: 1025it [00:02, 377.52it/s, env_step=1285120, len=20, n/ep=3, n/st=64, player_1/loss=306.769, player_2/loss=71.787, rew=433.33]


Epoch #1255: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1256: 1025it [00:02, 377.10it/s, env_step=1286144, len=22, n/ep=3, n/st=64, player_1/loss=289.218, player_2/loss=107.680, rew=549.33]


Epoch #1256: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1257: 1025it [00:02, 375.03it/s, env_step=1287168, len=33, n/ep=2, n/st=64, player_1/loss=191.175, player_2/loss=234.413, rew=1154.00]


Epoch #1257: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1258: 1025it [00:02, 377.38it/s, env_step=1288192, len=33, n/ep=2, n/st=64, player_1/loss=110.893, player_2/loss=201.521, rew=1216.00]


Epoch #1258: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1259: 1025it [00:02, 376.68it/s, env_step=1289216, len=35, n/ep=2, n/st=64, player_1/loss=128.338, player_2/loss=270.106, rew=1267.00]


Epoch #1259: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1260: 1025it [00:02, 376.96it/s, env_step=1290240, len=22, n/ep=3, n/st=64, player_1/loss=133.592, player_2/loss=308.694, rew=549.33]


Epoch #1260: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1261: 1025it [00:02, 376.82it/s, env_step=1291264, len=20, n/ep=3, n/st=64, player_1/loss=286.919, player_2/loss=84.353, rew=447.33]


Epoch #1261: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1262: 1025it [00:02, 374.35it/s, env_step=1292288, len=34, n/ep=2, n/st=64, player_1/loss=354.105, player_2/loss=60.209, rew=1225.00]


Epoch #1262: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1263: 1025it [00:02, 377.52it/s, env_step=1293312, len=21, n/ep=3, n/st=64, player_1/loss=169.883, player_2/loss=57.647, rew=477.33]


Epoch #1263: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1264: 1025it [00:02, 373.39it/s, env_step=1294336, len=26, n/ep=3, n/st=64, player_1/loss=292.655, player_2/loss=70.200, rew=777.33]


Epoch #1264: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1265: 1025it [00:02, 374.62it/s, env_step=1295360, len=27, n/ep=2, n/st=64, player_1/loss=411.903, player_2/loss=145.049, rew=770.00]


Epoch #1265: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1266: 1025it [00:02, 372.44it/s, env_step=1296384, len=17, n/ep=3, n/st=64, player_1/loss=328.897, player_2/loss=111.753, rew=329.33]


Epoch #1266: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1267: 1025it [00:02, 376.13it/s, env_step=1297408, len=32, n/ep=3, n/st=64, player_1/loss=328.501, player_2/loss=240.660, rew=1129.33]


Epoch #1267: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1268: 1025it [00:02, 374.75it/s, env_step=1298432, len=30, n/ep=2, n/st=64, player_1/loss=366.440, player_2/loss=331.095, rew=1049.00]


Epoch #1268: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1269: 1025it [00:02, 373.94it/s, env_step=1299456, len=21, n/ep=3, n/st=64, player_1/loss=467.195, player_2/loss=234.785, rew=462.67]


Epoch #1269: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1270: 1025it [00:02, 377.65it/s, env_step=1300480, len=19, n/ep=3, n/st=64, player_1/loss=440.152, player_2/loss=248.739, rew=378.00]


Epoch #1270: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1271: 1025it [00:02, 377.24it/s, env_step=1301504, len=22, n/ep=3, n/st=64, player_1/loss=270.783, player_2/loss=222.496, rew=520.00]


Epoch #1271: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1272: 1025it [00:02, 375.86it/s, env_step=1302528, len=22, n/ep=2, n/st=64, player_1/loss=233.782, player_2/loss=115.094, rew=625.00]


Epoch #1272: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1273: 1025it [00:02, 377.10it/s, env_step=1303552, len=28, n/ep=3, n/st=64, player_1/loss=211.094, player_2/loss=94.685, rew=851.33]


Epoch #1273: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1274: 1025it [00:02, 379.61it/s, env_step=1304576, len=30, n/ep=2, n/st=64, player_1/loss=421.713, player_2/loss=90.635, rew=929.00]


Epoch #1274: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1275: 1025it [00:02, 373.39it/s, env_step=1305600, len=30, n/ep=2, n/st=64, player_2/loss=223.036, rew=959.00]


Epoch #1275: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1276: 1025it [00:02, 375.03it/s, env_step=1306624, len=23, n/ep=4, n/st=64, player_1/loss=326.970, player_2/loss=325.930, rew=761.00]


Epoch #1276: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1277: 1025it [00:02, 375.99it/s, env_step=1307648, len=29, n/ep=3, n/st=64, player_1/loss=289.672, player_2/loss=255.180, rew=942.67]


Epoch #1277: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1278: 1025it [00:02, 374.89it/s, env_step=1308672, len=25, n/ep=2, n/st=64, player_1/loss=250.064, player_2/loss=177.581, rew=664.00]


Epoch #1278: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1279: 1025it [00:02, 379.19it/s, env_step=1309696, len=22, n/ep=3, n/st=64, player_1/loss=274.725, player_2/loss=215.860, rew=538.00]


Epoch #1279: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1280: 1025it [00:02, 373.12it/s, env_step=1310720, len=21, n/ep=3, n/st=64, player_1/loss=210.963, player_2/loss=167.786, rew=464.67]


Epoch #1280: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1281: 1025it [00:02, 377.10it/s, env_step=1311744, len=30, n/ep=3, n/st=64, player_1/loss=289.174, player_2/loss=264.917, rew=960.67]


Epoch #1281: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1282: 1025it [00:02, 377.10it/s, env_step=1312768, len=30, n/ep=2, n/st=64, player_1/loss=237.287, player_2/loss=320.050, rew=1001.00]


Epoch #1282: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1283: 1025it [00:02, 377.51it/s, env_step=1313792, len=25, n/ep=3, n/st=64, player_1/loss=65.842, player_2/loss=132.453, rew=747.33]


Epoch #1283: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1284: 1025it [00:02, 370.42it/s, env_step=1314816, len=29, n/ep=2, n/st=64, player_1/loss=182.065, player_2/loss=150.697, rew=928.00]


Epoch #1284: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1285: 1025it [00:02, 377.65it/s, env_step=1315840, len=29, n/ep=3, n/st=64, player_1/loss=232.527, rew=977.33]


Epoch #1285: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1286: 1025it [00:02, 374.62it/s, env_step=1316864, len=26, n/ep=2, n/st=64, player_1/loss=70.346, player_2/loss=245.210, rew=704.00]


Epoch #1286: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1287: 1025it [00:02, 379.61it/s, env_step=1317888, len=15, n/ep=4, n/st=64, player_1/loss=275.681, player_2/loss=277.435, rew=258.50]


Epoch #1287: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1288: 1025it [00:02, 367.37it/s, env_step=1318912, len=23, n/ep=3, n/st=64, player_1/loss=312.753, player_2/loss=362.999, rew=552.00]


Epoch #1288: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1289: 1025it [00:02, 375.58it/s, env_step=1319936, len=16, n/ep=4, n/st=64, player_1/loss=238.638, player_2/loss=285.012, rew=271.00]


Epoch #1289: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1290: 1025it [00:02, 374.07it/s, env_step=1320960, len=15, n/ep=4, n/st=64, player_1/loss=209.125, player_2/loss=359.516, rew=251.00]


Epoch #1290: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1291: 1025it [00:02, 374.62it/s, env_step=1321984, len=14, n/ep=4, n/st=64, player_1/loss=248.490, player_2/loss=392.540, rew=228.00]


Epoch #1291: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1292: 1025it [00:02, 376.68it/s, env_step=1323008, len=22, n/ep=3, n/st=64, player_1/loss=84.134, player_2/loss=350.675, rew=544.67]


Epoch #1292: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1293: 1025it [00:02, 374.35it/s, env_step=1324032, len=23, n/ep=3, n/st=64, player_1/loss=127.433, player_2/loss=255.387, rew=582.67]


Epoch #1293: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1294: 1025it [00:02, 371.63it/s, env_step=1325056, len=24, n/ep=3, n/st=64, player_1/loss=676.342, player_2/loss=257.105, rew=615.33]


Epoch #1294: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1295: 1025it [00:02, 375.99it/s, env_step=1326080, len=12, n/ep=5, n/st=64, player_1/loss=734.001, player_2/loss=508.946, rew=182.80]


Epoch #1295: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1296: 1025it [00:02, 377.38it/s, env_step=1327104, len=23, n/ep=2, n/st=64, player_1/loss=550.530, player_2/loss=386.997, rew=604.00]


Epoch #1296: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1297: 1025it [00:02, 375.58it/s, env_step=1328128, len=32, n/ep=2, n/st=64, player_1/loss=491.380, player_2/loss=85.041, rew=1169.00]


Epoch #1297: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1298: 1025it [00:02, 377.52it/s, env_step=1329152, len=30, n/ep=2, n/st=64, player_1/loss=395.215, player_2/loss=200.382, rew=1015.00]


Epoch #1298: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1299: 1025it [00:02, 375.86it/s, env_step=1330176, len=21, n/ep=3, n/st=64, player_1/loss=203.525, player_2/loss=234.945, rew=490.00]


Epoch #1299: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1300: 1025it [00:02, 375.86it/s, env_step=1331200, len=26, n/ep=2, n/st=64, player_1/loss=344.290, player_2/loss=423.968, rew=700.00]


Epoch #1300: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1301: 1025it [00:02, 376.55it/s, env_step=1332224, len=21, n/ep=3, n/st=64, player_1/loss=470.545, player_2/loss=466.550, rew=564.00]


Epoch #1301: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1302: 1025it [00:02, 377.10it/s, env_step=1333248, len=21, n/ep=3, n/st=64, player_1/loss=237.889, player_2/loss=601.060, rew=490.67]


Epoch #1302: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1303: 1025it [00:02, 379.19it/s, env_step=1334272, len=27, n/ep=2, n/st=64, player_1/loss=192.698, player_2/loss=373.350, rew=758.00]


Epoch #1303: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1304: 1025it [00:02, 373.80it/s, env_step=1335296, len=24, n/ep=3, n/st=64, player_1/loss=225.450, player_2/loss=488.033, rew=618.67]


Epoch #1304: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1305: 1025it [00:02, 373.94it/s, env_step=1336320, len=39, n/ep=1, n/st=64, player_1/loss=179.442, player_2/loss=581.447, rew=1558.00]


Epoch #1305: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1306: 1025it [00:02, 375.72it/s, env_step=1337344, len=21, n/ep=3, n/st=64, player_1/loss=313.482, player_2/loss=535.554, rew=490.00]


Epoch #1306: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1307: 1025it [00:02, 373.39it/s, env_step=1338368, len=14, n/ep=5, n/st=64, player_1/loss=480.495, player_2/loss=337.541, rew=295.60]


Epoch #1307: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1308: 1025it [00:02, 376.41it/s, env_step=1339392, len=33, n/ep=2, n/st=64, player_1/loss=329.367, player_2/loss=124.948, rew=1124.00]


Epoch #1308: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1309: 1025it [00:02, 372.85it/s, env_step=1340416, len=20, n/ep=4, n/st=64, player_1/loss=507.977, player_2/loss=218.572, rew=505.50]


Epoch #1309: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1310: 1025it [00:02, 370.29it/s, env_step=1341440, len=24, n/ep=3, n/st=64, player_1/loss=1141.117, player_2/loss=210.401, rew=604.00]


Epoch #1310: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1311: 1025it [00:02, 376.41it/s, env_step=1342464, len=34, n/ep=2, n/st=64, player_1/loss=868.000, player_2/loss=111.762, rew=1235.00]


Epoch #1311: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1312: 1025it [00:02, 372.85it/s, env_step=1343488, len=13, n/ep=4, n/st=64, player_1/loss=769.185, player_2/loss=535.190, rew=204.50]


Epoch #1312: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1313: 1025it [00:02, 375.17it/s, env_step=1344512, len=28, n/ep=3, n/st=64, player_1/loss=768.546, player_2/loss=584.254, rew=892.67]


Epoch #1313: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1314: 1025it [00:02, 375.44it/s, env_step=1345536, len=23, n/ep=3, n/st=64, player_1/loss=448.018, player_2/loss=595.084, rew=716.00]


Epoch #1314: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1315: 1025it [00:02, 377.93it/s, env_step=1346560, len=23, n/ep=3, n/st=64, player_1/loss=429.577, player_2/loss=722.085, rew=658.67]


Epoch #1315: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1316: 1025it [00:02, 378.07it/s, env_step=1347584, len=23, n/ep=3, n/st=64, player_1/loss=392.378, player_2/loss=251.626, rew=623.33]


Epoch #1316: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1317: 1025it [00:02, 372.85it/s, env_step=1348608, len=21, n/ep=3, n/st=64, player_1/loss=166.549, player_2/loss=173.123, rew=478.67]


Epoch #1317: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1318: 1025it [00:02, 375.17it/s, env_step=1349632, len=21, n/ep=3, n/st=64, player_1/loss=144.105, player_2/loss=142.914, rew=476.00]


Epoch #1318: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1319: 1025it [00:02, 379.05it/s, env_step=1350656, len=28, n/ep=2, n/st=64, player_1/loss=277.121, player_2/loss=632.874, rew=839.00]


Epoch #1319: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1320: 1025it [00:02, 371.09it/s, env_step=1351680, len=21, n/ep=3, n/st=64, player_1/loss=364.544, player_2/loss=585.829, rew=462.67]


Epoch #1320: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1321: 1025it [00:02, 375.58it/s, env_step=1352704, len=27, n/ep=2, n/st=64, player_1/loss=332.305, player_2/loss=129.274, rew=754.00]


Epoch #1321: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1322: 1025it [00:02, 373.94it/s, env_step=1353728, len=23, n/ep=3, n/st=64, player_1/loss=249.528, player_2/loss=159.156, rew=672.67]


Epoch #1322: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1323: 1025it [00:02, 374.76it/s, env_step=1354752, len=28, n/ep=2, n/st=64, player_1/loss=290.575, player_2/loss=229.089, rew=839.00]


Epoch #1323: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1324: 1025it [00:02, 376.13it/s, env_step=1355776, len=30, n/ep=2, n/st=64, player_1/loss=676.819, player_2/loss=371.720, rew=965.00]


Epoch #1324: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1325: 1025it [00:02, 377.24it/s, env_step=1356800, len=24, n/ep=2, n/st=64, player_1/loss=390.114, player_2/loss=198.889, rew=598.00]


Epoch #1325: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1326: 1025it [00:02, 376.13it/s, env_step=1357824, len=22, n/ep=3, n/st=64, player_1/loss=377.609, player_2/loss=290.786, rew=506.00]


Epoch #1326: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1327: 1025it [00:02, 371.63it/s, env_step=1358848, len=24, n/ep=2, n/st=64, player_1/loss=552.099, player_2/loss=343.023, rew=623.00]


Epoch #1327: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1328: 1025it [00:02, 376.82it/s, env_step=1359872, len=35, n/ep=2, n/st=64, player_1/loss=404.996, player_2/loss=402.025, rew=1267.00]


Epoch #1328: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1329: 1025it [00:02, 376.82it/s, env_step=1360896, len=28, n/ep=2, n/st=64, player_1/loss=43.590, player_2/loss=223.640, rew=911.00]


Epoch #1329: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1330: 1025it [00:02, 376.68it/s, env_step=1361920, len=20, n/ep=3, n/st=64, player_1/loss=193.216, player_2/loss=78.848, rew=428.67]


Epoch #1330: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1331: 1025it [00:02, 375.30it/s, env_step=1362944, len=30, n/ep=2, n/st=64, player_1/loss=198.470, player_2/loss=235.725, rew=961.00]


Epoch #1331: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1332: 1025it [00:02, 374.48it/s, env_step=1363968, len=25, n/ep=3, n/st=64, player_1/loss=231.929, player_2/loss=235.327, rew=704.00]


Epoch #1332: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1333: 1025it [00:02, 372.31it/s, env_step=1364992, len=17, n/ep=4, n/st=64, player_1/loss=723.940, player_2/loss=144.381, rew=330.00]


Epoch #1333: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1334: 1025it [00:02, 374.21it/s, env_step=1366016, len=23, n/ep=3, n/st=64, player_1/loss=802.062, player_2/loss=154.801, rew=674.00]


Epoch #1334: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1335: 1025it [00:02, 376.27it/s, env_step=1367040, len=21, n/ep=4, n/st=64, player_1/loss=484.720, player_2/loss=192.964, rew=514.00]


Epoch #1335: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1336: 1025it [00:02, 375.99it/s, env_step=1368064, len=14, n/ep=5, n/st=64, player_1/loss=248.774, player_2/loss=191.046, rew=214.80]


Epoch #1336: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1337: 1025it [00:02, 377.52it/s, env_step=1369088, len=15, n/ep=4, n/st=64, player_1/loss=144.504, player_2/loss=160.521, rew=272.50]


Epoch #1337: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1338: 1025it [00:02, 376.41it/s, env_step=1370112, len=19, n/ep=4, n/st=64, player_1/loss=66.386, player_2/loss=143.660, rew=414.00]


Epoch #1338: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1339: 1025it [00:02, 376.13it/s, env_step=1371136, len=15, n/ep=3, n/st=64, player_1/loss=572.779, player_2/loss=67.199, rew=252.67]


Epoch #1339: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1340: 1025it [00:02, 374.89it/s, env_step=1372160, len=23, n/ep=3, n/st=64, player_1/loss=553.383, player_2/loss=228.529, rew=572.00]


Epoch #1340: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1341: 1025it [00:02, 374.75it/s, env_step=1373184, len=20, n/ep=3, n/st=64, player_1/loss=55.247, player_2/loss=215.528, rew=433.33]


Epoch #1341: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1342: 1025it [00:02, 375.72it/s, env_step=1374208, len=21, n/ep=3, n/st=64, player_1/loss=91.389, player_2/loss=48.909, rew=490.00]


Epoch #1342: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1343: 1025it [00:02, 374.07it/s, env_step=1375232, len=18, n/ep=3, n/st=64, player_1/loss=251.481, player_2/loss=133.740, rew=366.00]


Epoch #1343: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1344: 1025it [00:02, 374.76it/s, env_step=1376256, len=21, n/ep=3, n/st=64, player_1/loss=329.543, player_2/loss=445.588, rew=462.00]


Epoch #1344: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1345: 1025it [00:02, 372.85it/s, env_step=1377280, len=32, n/ep=2, n/st=64, player_1/loss=405.286, player_2/loss=436.623, rew=1093.00]


Epoch #1345: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1346: 1025it [00:02, 375.03it/s, env_step=1378304, len=25, n/ep=3, n/st=64, player_1/loss=350.319, player_2/loss=111.529, rew=736.00]


Epoch #1346: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1347: 1025it [00:02, 374.21it/s, env_step=1379328, len=25, n/ep=3, n/st=64, player_1/loss=187.811, player_2/loss=53.894, rew=710.00]


Epoch #1347: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1348: 1025it [00:02, 375.86it/s, env_step=1380352, len=29, n/ep=2, n/st=64, player_1/loss=225.701, player_2/loss=47.928, rew=893.00]


Epoch #1348: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1349: 1025it [00:02, 374.21it/s, env_step=1381376, len=31, n/ep=1, n/st=64, player_1/loss=307.499, player_2/loss=438.755, rew=990.00]


Epoch #1349: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1350: 1025it [00:02, 377.10it/s, env_step=1382400, len=29, n/ep=2, n/st=64, player_1/loss=331.616, player_2/loss=800.451, rew=872.00]


Epoch #1350: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1351: 1025it [00:02, 375.58it/s, env_step=1383424, len=20, n/ep=4, n/st=64, player_1/loss=352.249, player_2/loss=673.069, rew=419.00]


Epoch #1351: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1352: 1025it [00:02, 372.17it/s, env_step=1384448, len=21, n/ep=3, n/st=64, player_1/loss=348.457, player_2/loss=484.907, rew=464.67]


Epoch #1352: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1353: 1025it [00:02, 372.17it/s, env_step=1385472, len=27, n/ep=3, n/st=64, player_1/loss=190.970, player_2/loss=373.774, rew=840.00]


Epoch #1353: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1354: 1025it [00:02, 376.13it/s, env_step=1386496, len=30, n/ep=2, n/st=64, player_1/loss=262.669, player_2/loss=380.424, rew=959.00]


Epoch #1354: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1355: 1025it [00:02, 373.53it/s, env_step=1387520, len=31, n/ep=2, n/st=64, player_1/loss=463.137, player_2/loss=226.763, rew=1026.00]


Epoch #1355: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1356: 1025it [00:02, 373.80it/s, env_step=1388544, len=27, n/ep=3, n/st=64, player_1/loss=726.094, player_2/loss=167.835, rew=756.00]


Epoch #1356: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1357: 1025it [00:02, 376.41it/s, env_step=1389568, len=28, n/ep=2, n/st=64, player_1/loss=725.965, player_2/loss=300.409, rew=814.00]


Epoch #1357: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1358: 1025it [00:02, 375.30it/s, env_step=1390592, len=37, n/ep=2, n/st=64, player_1/loss=456.295, player_2/loss=457.893, rew=1404.00]


Epoch #1358: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1359: 1025it [00:02, 373.39it/s, env_step=1391616, len=29, n/ep=2, n/st=64, player_1/loss=339.223, player_2/loss=357.576, rew=918.00]


Epoch #1359: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1360: 1025it [00:02, 377.10it/s, env_step=1392640, len=29, n/ep=3, n/st=64, player_1/loss=646.269, player_2/loss=113.588, rew=868.67]


Epoch #1360: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1361: 1025it [00:02, 375.99it/s, env_step=1393664, len=22, n/ep=3, n/st=64, player_1/loss=650.578, player_2/loss=93.824, rew=533.33]


Epoch #1361: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1362: 1025it [00:02, 375.44it/s, env_step=1394688, len=23, n/ep=3, n/st=64, player_1/loss=281.071, player_2/loss=71.012, rew=558.00]


Epoch #1362: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1363: 1025it [00:02, 375.86it/s, env_step=1395712, len=21, n/ep=3, n/st=64, player_1/loss=521.131, player_2/loss=59.382, rew=490.00]


Epoch #1363: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1364: 1025it [00:02, 375.17it/s, env_step=1396736, len=27, n/ep=3, n/st=64, player_1/loss=451.431, player_2/loss=333.473, rew=794.67]


Epoch #1364: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1365: 1025it [00:02, 374.21it/s, env_step=1397760, len=18, n/ep=4, n/st=64, player_1/loss=207.219, player_2/loss=631.249, rew=383.50]


Epoch #1365: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1366: 1025it [00:02, 372.58it/s, env_step=1398784, len=26, n/ep=2, n/st=64, player_1/loss=205.147, player_2/loss=613.783, rew=716.00]


Epoch #1366: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1367: 1025it [00:02, 374.76it/s, env_step=1399808, len=17, n/ep=4, n/st=64, player_1/loss=129.981, player_2/loss=429.953, rew=321.00]


Epoch #1367: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1368: 1025it [00:02, 376.55it/s, env_step=1400832, len=16, n/ep=4, n/st=64, player_1/loss=163.262, player_2/loss=584.324, rew=281.50]


Epoch #1368: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1369: 1025it [00:02, 375.03it/s, env_step=1401856, len=21, n/ep=4, n/st=64, player_1/loss=374.867, player_2/loss=654.718, rew=532.50]


Epoch #1369: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1370: 1025it [00:02, 375.72it/s, env_step=1402880, len=16, n/ep=3, n/st=64, player_1/loss=373.526, player_2/loss=746.798, rew=292.67]


Epoch #1370: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1371: 1025it [00:02, 374.76it/s, env_step=1403904, len=14, n/ep=4, n/st=64, player_1/loss=251.275, player_2/loss=757.757, rew=208.50]


Epoch #1371: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1372: 1025it [00:02, 371.50it/s, env_step=1404928, len=19, n/ep=3, n/st=64, player_1/loss=168.463, player_2/loss=235.311, rew=380.67]


Epoch #1372: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1373: 1025it [00:02, 376.27it/s, env_step=1405952, len=22, n/ep=2, n/st=64, player_1/loss=255.865, player_2/loss=158.215, rew=529.00]


Epoch #1373: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1374: 1025it [00:02, 373.39it/s, env_step=1406976, len=24, n/ep=3, n/st=64, player_1/loss=363.991, player_2/loss=327.371, rew=628.00]


Epoch #1374: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1375: 1025it [00:02, 373.66it/s, env_step=1408000, len=19, n/ep=3, n/st=64, player_1/loss=396.828, player_2/loss=463.245, rew=435.33]


Epoch #1375: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1376: 1025it [00:02, 375.30it/s, env_step=1409024, len=30, n/ep=2, n/st=64, player_1/loss=350.587, player_2/loss=565.564, rew=929.00]


Epoch #1376: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1377: 1025it [00:02, 376.96it/s, env_step=1410048, len=37, n/ep=1, n/st=64, player_1/loss=337.505, player_2/loss=639.086, rew=1404.00]


Epoch #1377: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1378: 1025it [00:02, 374.89it/s, env_step=1411072, len=22, n/ep=4, n/st=64, player_1/loss=234.727, player_2/loss=411.976, rew=548.00]


Epoch #1378: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1379: 1025it [00:02, 375.58it/s, env_step=1412096, len=28, n/ep=2, n/st=64, player_1/loss=189.612, player_2/loss=201.170, rew=841.00]


Epoch #1379: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1380: 1025it [00:02, 375.03it/s, env_step=1413120, len=23, n/ep=3, n/st=64, player_1/loss=274.414, player_2/loss=30.517, rew=558.00]


Epoch #1380: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1381: 1025it [00:02, 373.53it/s, env_step=1414144, len=26, n/ep=3, n/st=64, player_1/loss=281.648, player_2/loss=676.571, rew=704.67]


Epoch #1381: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1382: 1025it [00:02, 374.35it/s, env_step=1415168, len=21, n/ep=3, n/st=64, player_1/loss=346.735, player_2/loss=665.138, rew=492.67]


Epoch #1382: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1383: 1025it [00:02, 376.13it/s, env_step=1416192, len=25, n/ep=3, n/st=64, player_1/loss=713.948, player_2/loss=151.199, rew=648.00]


Epoch #1383: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1384: 1025it [00:02, 376.27it/s, env_step=1417216, len=28, n/ep=2, n/st=64, player_1/loss=607.921, player_2/loss=80.738, rew=859.00]


Epoch #1384: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1385: 1025it [00:02, 369.76it/s, env_step=1418240, len=30, n/ep=2, n/st=64, player_1/loss=792.451, player_2/loss=164.375, rew=1015.00]


Epoch #1385: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1386: 1025it [00:02, 375.17it/s, env_step=1419264, len=20, n/ep=3, n/st=64, player_1/loss=887.581, player_2/loss=241.789, rew=418.67]


Epoch #1386: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1387: 1025it [00:02, 376.96it/s, env_step=1420288, len=20, n/ep=3, n/st=64, player_1/loss=314.460, player_2/loss=384.112, rew=420.67]


Epoch #1387: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1388: 1025it [00:02, 376.41it/s, env_step=1421312, len=25, n/ep=3, n/st=64, player_1/loss=244.464, player_2/loss=491.995, rew=652.67]


Epoch #1388: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1389: 1025it [00:02, 377.24it/s, env_step=1422336, len=21, n/ep=3, n/st=64, player_1/loss=119.060, player_2/loss=251.057, rew=462.67]


Epoch #1389: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1390: 1025it [00:02, 375.58it/s, env_step=1423360, len=20, n/ep=3, n/st=64, player_1/loss=38.428, player_2/loss=54.010, rew=459.33]


Epoch #1390: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1391: 1025it [00:02, 376.96it/s, env_step=1424384, len=22, n/ep=3, n/st=64, player_1/loss=48.472, player_2/loss=563.890, rew=520.00]


Epoch #1391: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1392: 1025it [00:02, 376.96it/s, env_step=1425408, len=31, n/ep=2, n/st=64, player_1/loss=93.658, player_2/loss=755.500, rew=1064.00]


Epoch #1392: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1393: 1025it [00:02, 375.58it/s, env_step=1426432, len=31, n/ep=3, n/st=64, player_1/loss=110.389, player_2/loss=725.231, rew=1112.67]


Epoch #1393: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1394: 1025it [00:02, 372.71it/s, env_step=1427456, len=27, n/ep=3, n/st=64, player_1/loss=370.591, player_2/loss=736.825, rew=868.67]


Epoch #1394: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1395: 1025it [00:02, 374.35it/s, env_step=1428480, len=21, n/ep=3, n/st=64, player_1/loss=444.114, player_2/loss=679.534, rew=498.67]


Epoch #1395: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1396: 1025it [00:02, 356.39it/s, env_step=1429504, len=23, n/ep=3, n/st=64, player_1/loss=442.300, player_2/loss=696.290, rew=550.67]


Epoch #1396: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1397: 1025it [00:02, 374.21it/s, env_step=1430528, len=31, n/ep=2, n/st=64, player_2/loss=555.797, rew=990.00]


Epoch #1397: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1398: 1025it [00:02, 376.41it/s, env_step=1431552, len=31, n/ep=2, n/st=64, player_1/loss=773.568, player_2/loss=161.883, rew=990.00]


Epoch #1398: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1399: 1025it [00:02, 377.65it/s, env_step=1432576, len=41, n/ep=1, n/st=64, player_1/loss=801.017, player_2/loss=446.633, rew=1720.00]


Epoch #1399: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1400: 1025it [00:02, 373.94it/s, env_step=1433600, len=26, n/ep=3, n/st=64, player_1/loss=505.441, player_2/loss=642.033, rew=702.67]


Epoch #1400: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1401: 1025it [00:02, 377.24it/s, env_step=1434624, len=39, n/ep=1, n/st=64, player_1/loss=398.861, player_2/loss=374.454, rew=1558.00]


Epoch #1401: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1402: 1025it [00:02, 374.75it/s, env_step=1435648, len=39, n/ep=2, n/st=64, player_1/loss=496.804, player_2/loss=500.096, rew=1619.00]


Epoch #1402: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1403: 1025it [00:02, 374.35it/s, env_step=1436672, len=20, n/ep=3, n/st=64, player_1/loss=415.867, player_2/loss=721.091, rew=448.67]


Epoch #1403: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1404: 1025it [00:02, 375.30it/s, env_step=1437696, len=17, n/ep=4, n/st=64, player_1/loss=482.498, player_2/loss=770.606, rew=358.50]


Epoch #1404: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1405: 1025it [00:02, 374.21it/s, env_step=1438720, len=27, n/ep=2, n/st=64, player_1/loss=462.151, player_2/loss=681.092, rew=784.00]


Epoch #1405: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1406: 1025it [00:02, 372.85it/s, env_step=1439744, len=24, n/ep=3, n/st=64, player_1/loss=1059.963, player_2/loss=784.044, rew=629.33]


Epoch #1406: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1407: 1025it [00:02, 374.48it/s, env_step=1440768, len=27, n/ep=3, n/st=64, player_1/loss=1073.320, player_2/loss=498.235, rew=892.67]


Epoch #1407: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1408: 1025it [00:02, 373.80it/s, env_step=1441792, len=38, n/ep=2, n/st=64, player_1/loss=386.655, player_2/loss=332.284, rew=1481.00]


Epoch #1408: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1409: 1025it [00:02, 374.48it/s, env_step=1442816, len=16, n/ep=4, n/st=64, player_1/loss=396.508, player_2/loss=226.997, rew=313.50]


Epoch #1409: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1410: 1025it [00:02, 376.13it/s, env_step=1443840, len=28, n/ep=3, n/st=64, player_1/loss=627.196, player_2/loss=715.563, rew=818.67]


Epoch #1410: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1411: 1025it [00:02, 374.89it/s, env_step=1444864, len=34, n/ep=2, n/st=64, player_1/loss=389.805, player_2/loss=676.056, rew=1189.00]


Epoch #1411: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1412: 1025it [00:02, 375.03it/s, env_step=1445888, len=27, n/ep=3, n/st=64, player_1/loss=167.373, player_2/loss=370.026, rew=808.67]


Epoch #1412: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1413: 1025it [00:02, 371.50it/s, env_step=1446912, len=24, n/ep=2, n/st=64, player_1/loss=536.522, player_2/loss=273.805, rew=679.00]


Epoch #1413: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1414: 1025it [00:02, 376.13it/s, env_step=1447936, len=35, n/ep=2, n/st=64, player_1/loss=727.925, player_2/loss=96.299, rew=1296.00]


Epoch #1414: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1415: 1025it [00:02, 375.58it/s, env_step=1448960, len=30, n/ep=2, n/st=64, player_1/loss=447.253, player_2/loss=81.252, rew=929.00]


Epoch #1415: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1416: 1025it [00:02, 375.85it/s, env_step=1449984, len=28, n/ep=2, n/st=64, player_1/loss=272.647, player_2/loss=83.148, rew=859.00]


Epoch #1416: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1417: 1025it [00:02, 378.07it/s, env_step=1451008, len=26, n/ep=2, n/st=64, player_1/loss=203.452, player_2/loss=241.319, rew=709.00]


Epoch #1417: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1418: 1025it [00:02, 371.50it/s, env_step=1452032, len=26, n/ep=2, n/st=64, player_1/loss=553.952, player_2/loss=454.536, rew=709.00]


Epoch #1418: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1419: 1025it [00:02, 376.55it/s, env_step=1453056, len=38, n/ep=1, n/st=64, player_1/loss=593.727, player_2/loss=692.270, rew=1480.00]


Epoch #1419: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1420: 1025it [00:02, 378.21it/s, env_step=1454080, len=35, n/ep=2, n/st=64, player_1/loss=232.241, player_2/loss=438.814, rew=1300.00]


Epoch #1420: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1421: 1025it [00:02, 375.72it/s, env_step=1455104, len=28, n/ep=3, n/st=64, player_1/loss=348.028, player_2/loss=310.542, rew=862.00]


Epoch #1421: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1422: 1025it [00:02, 376.68it/s, env_step=1456128, len=31, n/ep=2, n/st=64, player_1/loss=41.483, player_2/loss=435.059, rew=991.00]


Epoch #1422: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1423: 1025it [00:02, 376.82it/s, env_step=1457152, len=15, n/ep=5, n/st=64, player_1/loss=68.526, player_2/loss=539.784, rew=329.20]


Epoch #1423: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1424: 1025it [00:02, 374.34it/s, env_step=1458176, len=16, n/ep=4, n/st=64, player_1/loss=113.272, player_2/loss=770.598, rew=298.00]


Epoch #1424: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1425: 1025it [00:02, 374.35it/s, env_step=1459200, len=26, n/ep=2, n/st=64, player_1/loss=157.821, player_2/loss=954.601, rew=769.00]


Epoch #1425: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1426: 1025it [00:02, 374.76it/s, env_step=1460224, len=8, n/ep=7, n/st=64, player_1/loss=271.215, player_2/loss=800.266, rew=74.57]


Epoch #1426: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1427: 1025it [00:02, 374.48it/s, env_step=1461248, len=16, n/ep=5, n/st=64, player_1/loss=271.752, player_2/loss=639.289, rew=385.20]


Epoch #1427: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1428: 1025it [00:02, 375.72it/s, env_step=1462272, len=12, n/ep=6, n/st=64, player_1/loss=255.153, player_2/loss=666.830, rew=173.33]


Epoch #1428: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1429: 1025it [00:02, 373.12it/s, env_step=1463296, len=15, n/ep=4, n/st=64, player_1/loss=199.691, player_2/loss=849.256, rew=277.50]


Epoch #1429: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1430: 1025it [00:02, 372.58it/s, env_step=1464320, len=33, n/ep=2, n/st=64, player_1/loss=178.921, player_2/loss=650.709, rew=1184.00]


Epoch #1430: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1431: 1025it [00:02, 375.44it/s, env_step=1465344, len=21, n/ep=2, n/st=64, player_1/loss=189.651, player_2/loss=517.211, rew=512.00]


Epoch #1431: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1432: 1025it [00:02, 374.21it/s, env_step=1466368, len=28, n/ep=2, n/st=64, player_1/loss=328.410, player_2/loss=285.434, rew=841.00]


Epoch #1432: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1433: 1025it [00:02, 373.94it/s, env_step=1467392, len=16, n/ep=4, n/st=64, player_1/loss=417.637, player_2/loss=313.812, rew=275.00]


Epoch #1433: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1434: 1025it [00:02, 374.35it/s, env_step=1468416, len=31, n/ep=2, n/st=64, player_1/loss=395.027, player_2/loss=444.292, rew=1071.00]


Epoch #1434: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1435: 1025it [00:02, 375.30it/s, env_step=1469440, len=28, n/ep=3, n/st=64, player_1/loss=422.417, player_2/loss=357.990, rew=850.67]


Epoch #1435: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1436: 1025it [00:02, 371.77it/s, env_step=1470464, len=34, n/ep=2, n/st=64, player_1/loss=261.938, player_2/loss=533.112, rew=1189.00]


Epoch #1436: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1437: 1025it [00:02, 377.24it/s, env_step=1471488, len=23, n/ep=3, n/st=64, player_1/loss=276.278, player_2/loss=421.055, rew=558.00]


Epoch #1437: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1438: 1025it [00:02, 373.53it/s, env_step=1472512, len=27, n/ep=2, n/st=64, player_1/loss=463.739, player_2/loss=370.990, rew=782.00]


Epoch #1438: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1439: 1025it [00:02, 374.21it/s, env_step=1473536, len=29, n/ep=3, n/st=64, player_1/loss=623.117, player_2/loss=716.101, rew=870.67]


Epoch #1439: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1440: 1025it [00:02, 375.30it/s, env_step=1474560, len=23, n/ep=3, n/st=64, player_1/loss=584.157, rew=586.00]


Epoch #1440: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1441: 1025it [00:02, 377.10it/s, env_step=1475584, len=14, n/ep=4, n/st=64, player_1/loss=233.860, player_2/loss=344.307, rew=249.00]


Epoch #1441: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1442: 1025it [00:02, 374.48it/s, env_step=1476608, len=25, n/ep=3, n/st=64, player_1/loss=297.650, player_2/loss=466.050, rew=682.67]


Epoch #1442: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1443: 1025it [00:02, 375.17it/s, env_step=1477632, len=23, n/ep=3, n/st=64, player_1/loss=524.200, player_2/loss=608.104, rew=596.00]


Epoch #1443: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1444: 1025it [00:02, 375.03it/s, env_step=1478656, len=33, n/ep=2, n/st=64, player_1/loss=842.389, player_2/loss=674.108, rew=1156.00]


Epoch #1444: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1445: 1025it [00:02, 372.04it/s, env_step=1479680, len=32, n/ep=2, n/st=64, player_1/loss=664.820, player_2/loss=173.924, rew=1079.00]


Epoch #1445: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1446: 1025it [00:02, 375.17it/s, env_step=1480704, len=28, n/ep=3, n/st=64, player_1/loss=457.771, player_2/loss=160.565, rew=850.67]


Epoch #1446: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1447: 1025it [00:02, 373.39it/s, env_step=1481728, len=28, n/ep=2, n/st=64, player_1/loss=415.914, player_2/loss=251.605, rew=949.00]


Epoch #1447: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1448: 1025it [00:02, 374.35it/s, env_step=1482752, len=18, n/ep=3, n/st=64, player_1/loss=152.359, player_2/loss=300.472, rew=390.00]


Epoch #1448: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1449: 1025it [00:02, 373.80it/s, env_step=1483776, len=23, n/ep=3, n/st=64, player_1/loss=219.888, player_2/loss=279.029, rew=662.67]


Epoch #1449: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1450: 1025it [00:02, 376.13it/s, env_step=1484800, len=15, n/ep=4, n/st=64, player_1/loss=229.186, player_2/loss=290.496, rew=256.00]


Epoch #1450: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1451: 1025it [00:02, 374.76it/s, env_step=1485824, len=31, n/ep=2, n/st=64, player_1/loss=151.229, player_2/loss=254.904, rew=1132.00]


Epoch #1451: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1452: 1025it [00:02, 374.35it/s, env_step=1486848, len=27, n/ep=3, n/st=64, player_1/loss=294.335, player_2/loss=491.545, rew=789.33]


Epoch #1452: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1453: 1025it [00:02, 376.27it/s, env_step=1487872, len=18, n/ep=3, n/st=64, player_1/loss=296.030, player_2/loss=357.282, rew=369.33]


Epoch #1453: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1454: 1025it [00:02, 374.89it/s, env_step=1488896, len=29, n/ep=2, n/st=64, player_1/loss=421.000, player_2/loss=231.438, rew=872.00]


Epoch #1454: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1455: 1025it [00:02, 374.76it/s, env_step=1489920, len=21, n/ep=4, n/st=64, player_1/loss=764.320, player_2/loss=267.023, rew=564.00]


Epoch #1455: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1456: 1025it [00:02, 374.48it/s, env_step=1490944, len=22, n/ep=3, n/st=64, player_1/loss=481.871, player_2/loss=474.485, rew=506.67]


Epoch #1456: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1457: 1025it [00:02, 372.98it/s, env_step=1491968, len=15, n/ep=4, n/st=64, player_1/loss=386.435, player_2/loss=515.549, rew=248.00]


Epoch #1457: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1458: 1025it [00:02, 372.04it/s, env_step=1492992, len=22, n/ep=3, n/st=64, player_1/loss=468.844, player_2/loss=420.498, rew=506.67]


Epoch #1458: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1459: 1025it [00:02, 378.21it/s, env_step=1494016, len=20, n/ep=3, n/st=64, player_1/loss=266.017, player_2/loss=92.857, rew=448.67]


Epoch #1459: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1460: 1025it [00:02, 373.25it/s, env_step=1495040, len=21, n/ep=3, n/st=64, player_1/loss=48.889, player_2/loss=360.652, rew=492.67]


Epoch #1460: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1461: 1025it [00:02, 375.03it/s, env_step=1496064, len=27, n/ep=3, n/st=64, player_1/loss=96.414, player_2/loss=570.878, rew=856.00]


Epoch #1461: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1462: 1025it [00:02, 377.38it/s, env_step=1497088, len=35, n/ep=2, n/st=64, player_1/loss=210.888, player_2/loss=367.624, rew=1267.00]


Epoch #1462: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1463: 1025it [00:02, 373.53it/s, env_step=1498112, len=29, n/ep=2, n/st=64, player_1/loss=392.771, player_2/loss=253.117, rew=898.00]


Epoch #1463: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1464: 1025it [00:02, 374.62it/s, env_step=1499136, len=39, n/ep=1, n/st=64, player_1/loss=274.835, player_2/loss=540.605, rew=1558.00]


Epoch #1464: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1465: 1025it [00:02, 375.58it/s, env_step=1500160, len=18, n/ep=3, n/st=64, player_1/loss=91.106, player_2/loss=591.223, rew=348.67]


Epoch #1465: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1466: 1025it [00:02, 374.89it/s, env_step=1501184, len=8, n/ep=8, n/st=64, player_1/loss=639.266, player_2/loss=422.631, rew=77.00]


Epoch #1466: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1467: 1025it [00:02, 374.07it/s, env_step=1502208, len=17, n/ep=4, n/st=64, player_1/loss=632.880, player_2/loss=180.017, rew=343.50]


Epoch #1467: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1468: 1025it [00:02, 374.89it/s, env_step=1503232, len=22, n/ep=3, n/st=64, player_1/loss=339.384, player_2/loss=158.775, rew=508.67]


Epoch #1468: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1469: 1025it [00:02, 374.07it/s, env_step=1504256, len=8, n/ep=8, n/st=64, player_1/loss=158.490, player_2/loss=148.188, rew=78.00]


Epoch #1469: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1470: 1025it [00:02, 375.86it/s, env_step=1505280, len=32, n/ep=2, n/st=64, player_1/loss=301.217, player_2/loss=183.245, rew=1054.00]


Epoch #1470: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1471: 1025it [00:02, 376.82it/s, env_step=1506304, len=38, n/ep=2, n/st=64, player_1/loss=339.081, player_2/loss=301.209, rew=1519.00]


Epoch #1471: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1472: 1025it [00:02, 372.44it/s, env_step=1507328, len=11, n/ep=6, n/st=64, player_1/loss=244.525, player_2/loss=474.565, rew=146.33]


Epoch #1472: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1473: 1025it [00:02, 375.30it/s, env_step=1508352, len=24, n/ep=3, n/st=64, player_1/loss=320.420, player_2/loss=609.527, rew=632.67]


Epoch #1473: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1474: 1025it [00:02, 376.68it/s, env_step=1509376, len=15, n/ep=3, n/st=64, player_1/loss=163.412, player_2/loss=368.349, rew=273.33]


Epoch #1474: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1475: 1025it [00:02, 371.23it/s, env_step=1510400, len=31, n/ep=3, n/st=64, player_1/loss=330.497, player_2/loss=320.108, rew=996.00]


Epoch #1475: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1476: 1025it [00:02, 375.30it/s, env_step=1511424, len=23, n/ep=3, n/st=64, player_1/loss=288.162, player_2/loss=245.267, rew=632.67]


Epoch #1476: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1477: 1025it [00:02, 375.17it/s, env_step=1512448, len=18, n/ep=4, n/st=64, player_1/loss=500.704, player_2/loss=592.663, rew=389.00]


Epoch #1477: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1478: 1025it [00:02, 372.58it/s, env_step=1513472, len=17, n/ep=5, n/st=64, player_1/loss=441.803, player_2/loss=539.532, rew=380.00]


Epoch #1478: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1479: 1025it [00:02, 372.04it/s, env_step=1514496, len=31, n/ep=2, n/st=64, player_1/loss=465.213, player_2/loss=250.324, rew=1026.00]


Epoch #1479: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1480: 1025it [00:02, 375.44it/s, env_step=1515520, len=15, n/ep=3, n/st=64, player_1/loss=459.432, player_2/loss=408.632, rew=251.33]


Epoch #1480: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1481: 1025it [00:02, 371.63it/s, env_step=1516544, len=16, n/ep=5, n/st=64, player_1/loss=263.969, player_2/loss=444.298, rew=274.00]


Epoch #1481: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1482: 1025it [00:02, 374.89it/s, env_step=1517568, len=25, n/ep=2, n/st=64, player_1/loss=410.174, player_2/loss=253.929, rew=684.00]


Epoch #1482: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1483: 1025it [00:02, 371.23it/s, env_step=1518592, len=22, n/ep=3, n/st=64, player_1/loss=376.759, player_2/loss=135.773, rew=504.67]


Epoch #1483: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1484: 1025it [00:02, 375.85it/s, env_step=1519616, len=22, n/ep=3, n/st=64, player_1/loss=182.771, player_2/loss=210.921, rew=519.33]


Epoch #1484: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1485: 1025it [00:02, 377.10it/s, env_step=1520640, len=21, n/ep=3, n/st=64, player_1/loss=183.617, player_2/loss=250.610, rew=490.67]


Epoch #1485: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1486: 1025it [00:02, 371.90it/s, env_step=1521664, len=21, n/ep=3, n/st=64, player_1/loss=280.101, player_2/loss=340.071, rew=492.67]


Epoch #1486: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1487: 1025it [00:02, 373.12it/s, env_step=1522688, len=24, n/ep=3, n/st=64, player_1/loss=274.788, player_2/loss=389.550, rew=684.67]


Epoch #1487: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1488: 1025it [00:02, 372.71it/s, env_step=1523712, len=32, n/ep=2, n/st=64, player_1/loss=139.832, player_2/loss=187.537, rew=1054.00]


Epoch #1488: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1489: 1025it [00:02, 374.76it/s, env_step=1524736, len=26, n/ep=3, n/st=64, player_1/loss=99.576, player_2/loss=103.544, rew=915.33]


Epoch #1489: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1490: 1025it [00:02, 373.39it/s, env_step=1525760, len=27, n/ep=3, n/st=64, player_1/loss=292.790, player_2/loss=68.569, rew=872.67]


Epoch #1490: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1491: 1025it [00:02, 373.39it/s, env_step=1526784, len=13, n/ep=5, n/st=64, player_1/loss=859.697, player_2/loss=210.359, rew=210.00]


Epoch #1491: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1492: 1025it [00:02, 372.30it/s, env_step=1527808, len=24, n/ep=2, n/st=64, player_1/loss=753.356, player_2/loss=204.586, rew=607.00]


Epoch #1492: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1493: 1025it [00:02, 374.62it/s, env_step=1528832, len=11, n/ep=6, n/st=64, player_1/loss=389.705, player_2/loss=256.458, rew=136.67]


Epoch #1493: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1494: 1025it [00:02, 376.13it/s, env_step=1529856, len=28, n/ep=2, n/st=64, player_1/loss=603.413, player_2/loss=391.600, rew=851.00]


Epoch #1494: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1495: 1025it [00:02, 372.58it/s, env_step=1530880, len=15, n/ep=4, n/st=64, player_1/loss=732.350, player_2/loss=197.138, rew=240.00]


Epoch #1495: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1496: 1025it [00:02, 374.76it/s, env_step=1531904, len=18, n/ep=3, n/st=64, player_1/loss=869.716, player_2/loss=193.751, rew=388.67]


Epoch #1496: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1497: 1025it [00:02, 377.10it/s, env_step=1532928, len=27, n/ep=2, n/st=64, player_1/loss=637.092, player_2/loss=188.011, rew=755.00]


Epoch #1497: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1498: 1025it [00:02, 374.34it/s, env_step=1533952, len=29, n/ep=2, n/st=64, player_1/loss=328.262, player_2/loss=316.128, rew=904.00]


Epoch #1498: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1499: 1025it [00:02, 376.82it/s, env_step=1534976, len=15, n/ep=5, n/st=64, player_1/loss=416.091, player_2/loss=697.421, rew=248.40]


Epoch #1499: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1500: 1025it [00:02, 375.03it/s, env_step=1536000, len=15, n/ep=5, n/st=64, player_1/loss=342.425, rew=259.20]


Epoch #1500: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1501: 1025it [00:02, 369.22it/s, env_step=1537024, len=29, n/ep=2, n/st=64, player_1/loss=322.011, player_2/loss=558.533, rew=868.00]


Epoch #1501: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1502: 1025it [00:02, 375.72it/s, env_step=1538048, len=15, n/ep=5, n/st=64, player_1/loss=398.031, player_2/loss=393.588, rew=270.80]


Epoch #1502: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1503: 1025it [00:02, 367.24it/s, env_step=1539072, len=19, n/ep=4, n/st=64, player_1/loss=271.836, player_2/loss=293.008, rew=381.50]


Epoch #1503: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1504: 1025it [00:02, 372.58it/s, env_step=1540096, len=15, n/ep=4, n/st=64, player_1/loss=338.589, player_2/loss=242.026, rew=280.50]


Epoch #1504: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1505: 1025it [00:02, 372.17it/s, env_step=1541120, len=21, n/ep=3, n/st=64, player_1/loss=251.622, player_2/loss=187.988, rew=460.67]


Epoch #1505: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1506: 1025it [00:02, 372.17it/s, env_step=1542144, len=22, n/ep=2, n/st=64, player_1/loss=149.154, player_2/loss=336.063, rew=513.00]


Epoch #1506: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1507: 1025it [00:02, 375.44it/s, env_step=1543168, len=20, n/ep=3, n/st=64, player_1/loss=229.017, player_2/loss=194.478, rew=460.00]


Epoch #1507: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1508: 1025it [00:02, 373.66it/s, env_step=1544192, len=8, n/ep=8, n/st=64, player_1/loss=336.788, player_2/loss=281.857, rew=71.75]


Epoch #1508: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1509: 1025it [00:02, 370.16it/s, env_step=1545216, len=31, n/ep=2, n/st=64, player_1/loss=408.332, player_2/loss=523.031, rew=1052.00]


Epoch #1509: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1510: 1025it [00:02, 373.39it/s, env_step=1546240, len=20, n/ep=4, n/st=64, player_1/loss=495.848, player_2/loss=258.336, rew=530.50]


Epoch #1510: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1511: 1025it [00:02, 358.01it/s, env_step=1547264, len=15, n/ep=3, n/st=64, player_1/loss=429.546, player_2/loss=231.112, rew=251.33]


Epoch #1511: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1512: 1025it [00:02, 366.32it/s, env_step=1548288, len=22, n/ep=3, n/st=64, player_1/loss=178.160, player_2/loss=383.032, rew=578.67]


Epoch #1512: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1513: 1025it [00:02, 374.07it/s, env_step=1549312, len=15, n/ep=4, n/st=64, player_1/loss=183.742, player_2/loss=348.480, rew=243.50]


Epoch #1513: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1514: 1025it [00:02, 374.07it/s, env_step=1550336, len=14, n/ep=4, n/st=64, player_1/loss=180.727, player_2/loss=442.824, rew=233.00]


Epoch #1514: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1515: 1025it [00:02, 368.82it/s, env_step=1551360, len=24, n/ep=2, n/st=64, player_1/loss=265.566, player_2/loss=359.532, rew=599.00]


Epoch #1515: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1516: 1025it [00:02, 372.58it/s, env_step=1552384, len=21, n/ep=3, n/st=64, player_1/loss=323.015, player_2/loss=203.167, rew=528.00]


Epoch #1516: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1517: 1025it [00:02, 375.99it/s, env_step=1553408, len=28, n/ep=2, n/st=64, player_1/loss=236.132, player_2/loss=69.535, rew=841.00]


Epoch #1517: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1518: 1025it [00:02, 369.89it/s, env_step=1554432, len=18, n/ep=3, n/st=64, player_1/loss=233.567, player_2/loss=56.852, rew=401.33]


Epoch #1518: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1519: 1025it [00:02, 373.53it/s, env_step=1555456, len=22, n/ep=2, n/st=64, player_1/loss=261.369, player_2/loss=193.502, rew=505.00]


Epoch #1519: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1520: 1025it [00:02, 372.44it/s, env_step=1556480, len=15, n/ep=4, n/st=64, player_1/loss=135.603, player_2/loss=283.375, rew=287.50]


Epoch #1520: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1521: 1025it [00:02, 375.17it/s, env_step=1557504, len=16, n/ep=3, n/st=64, player_1/loss=74.877, player_2/loss=440.554, rew=303.33]


Epoch #1521: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1522: 1025it [00:02, 372.44it/s, env_step=1558528, len=14, n/ep=4, n/st=64, player_1/loss=259.459, player_2/loss=379.024, rew=233.50]


Epoch #1522: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1523: 1025it [00:02, 371.36it/s, env_step=1559552, len=20, n/ep=3, n/st=64, player_1/loss=558.987, player_2/loss=156.765, rew=475.33]


Epoch #1523: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1524: 1025it [00:02, 371.23it/s, env_step=1560576, len=10, n/ep=7, n/st=64, player_1/loss=599.438, player_2/loss=254.011, rew=116.57]


Epoch #1524: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1525: 1025it [00:02, 372.58it/s, env_step=1561600, len=18, n/ep=4, n/st=64, player_1/loss=310.206, player_2/loss=341.024, rew=351.50]


Epoch #1525: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1526: 1025it [00:02, 370.96it/s, env_step=1562624, len=7, n/ep=9, n/st=64, player_1/loss=211.465, player_2/loss=300.410, rew=61.33]


Epoch #1526: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1527: 1025it [00:02, 373.12it/s, env_step=1563648, len=14, n/ep=6, n/st=64, player_1/loss=98.766, player_2/loss=228.032, rew=351.33]


Epoch #1527: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1528: 1025it [00:02, 372.04it/s, env_step=1564672, len=8, n/ep=8, n/st=64, player_1/loss=130.673, player_2/loss=135.789, rew=85.75]


Epoch #1528: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1529: 1025it [00:02, 372.44it/s, env_step=1565696, len=25, n/ep=3, n/st=64, player_1/loss=169.490, player_2/loss=77.165, rew=720.67]


Epoch #1529: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1530: 1025it [00:02, 372.44it/s, env_step=1566720, len=14, n/ep=4, n/st=64, player_1/loss=131.255, player_2/loss=189.100, rew=246.00]


Epoch #1530: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1531: 1025it [00:02, 373.12it/s, env_step=1567744, len=30, n/ep=2, n/st=64, player_1/loss=113.519, player_2/loss=465.306, rew=1015.00]


Epoch #1531: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1532: 1025it [00:02, 375.30it/s, env_step=1568768, len=15, n/ep=4, n/st=64, player_1/loss=186.890, player_2/loss=481.777, rew=264.00]


Epoch #1532: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1533: 1025it [00:02, 374.48it/s, env_step=1569792, len=13, n/ep=4, n/st=64, player_1/loss=254.133, player_2/loss=344.292, rew=201.50]


Epoch #1533: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1534: 1025it [00:02, 369.36it/s, env_step=1570816, len=18, n/ep=3, n/st=64, player_1/loss=247.735, player_2/loss=242.048, rew=366.67]


Epoch #1534: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1535: 1025it [00:02, 372.98it/s, env_step=1571840, len=18, n/ep=3, n/st=64, player_1/loss=229.926, player_2/loss=208.338, rew=360.67]


Epoch #1535: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1536: 1025it [00:02, 375.03it/s, env_step=1572864, len=22, n/ep=3, n/st=64, player_1/loss=176.799, player_2/loss=126.535, rew=504.67]


Epoch #1536: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1537: 1025it [00:02, 373.80it/s, env_step=1573888, len=15, n/ep=5, n/st=64, player_1/loss=145.755, player_2/loss=99.565, rew=239.60]


Epoch #1537: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1538: 1025it [00:02, 374.89it/s, env_step=1574912, len=21, n/ep=3, n/st=64, player_1/loss=120.578, player_2/loss=175.341, rew=504.00]


Epoch #1538: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1539: 1025it [00:02, 373.94it/s, env_step=1575936, len=26, n/ep=3, n/st=64, player_1/loss=116.224, player_2/loss=254.587, rew=721.33]


Epoch #1539: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1540: 1025it [00:02, 371.36it/s, env_step=1576960, len=15, n/ep=5, n/st=64, player_1/loss=165.734, player_2/loss=129.242, rew=244.80]


Epoch #1540: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1541: 1025it [00:02, 372.04it/s, env_step=1577984, len=13, n/ep=5, n/st=64, player_1/loss=155.461, player_2/loss=48.509, rew=201.20]


Epoch #1541: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1542: 1025it [00:02, 369.22it/s, env_step=1579008, len=24, n/ep=3, n/st=64, player_1/loss=189.416, player_2/loss=145.600, rew=698.00]


Epoch #1542: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1543: 1025it [00:02, 374.48it/s, env_step=1580032, len=14, n/ep=4, n/st=64, player_1/loss=342.577, player_2/loss=245.770, rew=217.00]


Epoch #1543: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1544: 1025it [00:02, 375.99it/s, env_step=1581056, len=18, n/ep=3, n/st=64, player_1/loss=406.627, player_2/loss=394.908, rew=344.67]


Epoch #1544: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1545: 1025it [00:02, 374.48it/s, env_step=1582080, len=27, n/ep=2, n/st=64, player_1/loss=226.377, player_2/loss=349.311, rew=754.00]


Epoch #1545: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1546: 1025it [00:02, 372.04it/s, env_step=1583104, len=21, n/ep=3, n/st=64, player_1/loss=169.749, player_2/loss=157.966, rew=492.00]


Epoch #1546: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1547: 1025it [00:02, 373.66it/s, env_step=1584128, len=42, n/ep=1, n/st=64, player_1/loss=243.140, player_2/loss=213.044, rew=1804.00]


Epoch #1547: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1548: 1025it [00:02, 374.35it/s, env_step=1585152, len=21, n/ep=3, n/st=64, player_1/loss=298.550, player_2/loss=204.818, rew=460.00]


Epoch #1548: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1549: 1025it [00:02, 372.71it/s, env_step=1586176, len=35, n/ep=2, n/st=64, player_1/loss=278.931, player_2/loss=139.815, rew=1300.00]


Epoch #1549: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1550: 1025it [00:02, 376.82it/s, env_step=1587200, len=21, n/ep=3, n/st=64, player_1/loss=252.506, player_2/loss=319.707, rew=558.67]


Epoch #1550: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1551: 1025it [00:02, 374.76it/s, env_step=1588224, len=14, n/ep=4, n/st=64, player_1/loss=340.154, player_2/loss=443.016, rew=232.50]


Epoch #1551: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1552: 1025it [00:02, 371.09it/s, env_step=1589248, len=21, n/ep=3, n/st=64, player_1/loss=273.957, player_2/loss=228.926, rew=490.00]


Epoch #1552: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1553: 1025it [00:02, 377.79it/s, env_step=1590272, len=22, n/ep=3, n/st=64, player_1/loss=336.849, player_2/loss=164.439, rew=522.67]


Epoch #1553: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1554: 1025it [00:02, 376.13it/s, env_step=1591296, len=21, n/ep=3, n/st=64, player_1/loss=391.561, player_2/loss=198.547, rew=490.67]


Epoch #1554: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1555: 1025it [00:02, 372.85it/s, env_step=1592320, len=18, n/ep=4, n/st=64, player_1/loss=226.708, player_2/loss=308.672, rew=371.50]


Epoch #1555: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1556: 1025it [00:02, 377.38it/s, env_step=1593344, len=29, n/ep=2, n/st=64, player_1/loss=286.901, player_2/loss=238.040, rew=868.00]


Epoch #1556: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1557: 1025it [00:02, 374.21it/s, env_step=1594368, len=26, n/ep=3, n/st=64, player_1/loss=382.298, player_2/loss=291.045, rew=738.00]


Epoch #1557: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1558: 1025it [00:02, 374.48it/s, env_step=1595392, len=21, n/ep=3, n/st=64, player_1/loss=204.261, player_2/loss=178.948, rew=462.00]


Epoch #1558: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1559: 1025it [00:02, 375.30it/s, env_step=1596416, len=19, n/ep=3, n/st=64, player_1/loss=69.400, player_2/loss=165.096, rew=404.67]


Epoch #1559: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1560: 1025it [00:02, 370.69it/s, env_step=1597440, len=31, n/ep=2, n/st=64, player_1/loss=571.681, rew=1026.00]


Epoch #1560: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1561: 1025it [00:02, 374.07it/s, env_step=1598464, len=21, n/ep=3, n/st=64, player_1/loss=755.861, player_2/loss=216.008, rew=462.67]


Epoch #1561: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1562: 1025it [00:02, 374.62it/s, env_step=1599488, len=22, n/ep=3, n/st=64, player_1/loss=460.732, player_2/loss=267.405, rew=508.67]


Epoch #1562: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1563: 1025it [00:02, 372.98it/s, env_step=1600512, len=20, n/ep=3, n/st=64, player_1/loss=392.501, player_2/loss=217.139, rew=438.00]


Epoch #1563: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1564: 1025it [00:02, 377.38it/s, env_step=1601536, len=21, n/ep=2, n/st=64, player_1/loss=391.822, player_2/loss=111.761, rew=464.00]


Epoch #1564: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1565: 1025it [00:02, 374.48it/s, env_step=1602560, len=19, n/ep=4, n/st=64, player_1/loss=488.154, player_2/loss=284.897, rew=402.50]


Epoch #1565: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1566: 1025it [00:02, 371.50it/s, env_step=1603584, len=26, n/ep=2, n/st=64, player_1/loss=477.230, player_2/loss=233.854, rew=747.00]


Epoch #1566: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1567: 1025it [00:02, 373.66it/s, env_step=1604608, len=13, n/ep=3, n/st=64, player_1/loss=353.996, player_2/loss=99.808, rew=199.33]


Epoch #1567: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1568: 1025it [00:02, 373.94it/s, env_step=1605632, len=15, n/ep=4, n/st=64, player_1/loss=270.694, player_2/loss=238.193, rew=241.50]


Epoch #1568: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1569: 1025it [00:02, 373.66it/s, env_step=1606656, len=33, n/ep=2, n/st=64, player_1/loss=326.250, player_2/loss=213.421, rew=1124.00]


Epoch #1569: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1570: 1025it [00:02, 373.26it/s, env_step=1607680, len=19, n/ep=3, n/st=64, player_1/loss=337.815, player_2/loss=138.647, rew=462.00]


Epoch #1570: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1571: 1025it [00:02, 371.77it/s, env_step=1608704, len=29, n/ep=2, n/st=64, player_1/loss=312.040, player_2/loss=200.018, rew=949.00]


Epoch #1571: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1572: 1025it [00:02, 375.44it/s, env_step=1609728, len=39, n/ep=1, n/st=64, player_1/loss=491.797, player_2/loss=232.528, rew=1558.00]


Epoch #1572: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1573: 1025it [00:02, 376.13it/s, env_step=1610752, len=21, n/ep=3, n/st=64, player_1/loss=515.612, player_2/loss=133.799, rew=512.00]


Epoch #1573: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1574: 1025it [00:02, 374.07it/s, env_step=1611776, len=27, n/ep=3, n/st=64, player_1/loss=200.251, player_2/loss=146.175, rew=772.67]


Epoch #1574: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1575: 1025it [00:02, 377.24it/s, env_step=1612800, len=40, n/ep=1, n/st=64, player_1/loss=288.175, player_2/loss=214.488, rew=1638.00]


Epoch #1575: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1576: 1025it [00:02, 376.68it/s, env_step=1613824, len=23, n/ep=3, n/st=64, player_1/loss=224.461, player_2/loss=167.937, rew=636.00]


Epoch #1576: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1577: 1025it [00:02, 369.36it/s, env_step=1614848, len=21, n/ep=2, n/st=64, player_1/loss=58.025, player_2/loss=198.889, rew=484.00]


Epoch #1577: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1578: 1025it [00:02, 374.07it/s, env_step=1615872, len=20, n/ep=3, n/st=64, player_1/loss=160.011, player_2/loss=123.421, rew=452.00]


Epoch #1578: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1579: 1025it [00:02, 371.50it/s, env_step=1616896, len=21, n/ep=2, n/st=64, player_1/loss=282.163, player_2/loss=105.979, rew=469.00]


Epoch #1579: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1580: 1025it [00:02, 374.62it/s, env_step=1617920, len=22, n/ep=4, n/st=64, player_1/loss=169.902, player_2/loss=138.629, rew=589.50]


Epoch #1580: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1581: 1025it [00:02, 376.68it/s, env_step=1618944, len=25, n/ep=3, n/st=64, player_1/loss=397.284, player_2/loss=144.844, rew=682.67]


Epoch #1581: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1582: 1025it [00:02, 377.65it/s, env_step=1619968, len=24, n/ep=3, n/st=64, player_1/loss=508.014, player_2/loss=151.270, rew=630.67]


Epoch #1582: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1583: 1025it [00:02, 374.75it/s, env_step=1620992, len=30, n/ep=2, n/st=64, player_1/loss=392.307, player_2/loss=137.370, rew=979.00]


Epoch #1583: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1584: 1025it [00:02, 373.39it/s, env_step=1622016, len=14, n/ep=4, n/st=64, player_1/loss=717.061, player_2/loss=54.738, rew=255.50]


Epoch #1584: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1585: 1025it [00:02, 370.83it/s, env_step=1623040, len=22, n/ep=2, n/st=64, player_1/loss=626.689, player_2/loss=127.814, rew=504.00]


Epoch #1585: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1586: 1025it [00:02, 373.25it/s, env_step=1624064, len=22, n/ep=3, n/st=64, player_1/loss=389.019, player_2/loss=499.240, rew=504.67]


Epoch #1586: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1587: 1025it [00:02, 371.36it/s, env_step=1625088, len=22, n/ep=3, n/st=64, player_1/loss=503.374, player_2/loss=497.919, rew=508.67]


Epoch #1587: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1588: 1025it [00:02, 374.76it/s, env_step=1626112, len=22, n/ep=2, n/st=64, player_1/loss=646.001, player_2/loss=495.967, rew=539.00]


Epoch #1588: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1589: 1025it [00:02, 375.03it/s, env_step=1627136, len=35, n/ep=1, n/st=64, player_1/loss=367.257, player_2/loss=318.979, rew=1258.00]


Epoch #1589: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1590: 1025it [00:02, 372.58it/s, env_step=1628160, len=29, n/ep=2, n/st=64, player_1/loss=354.792, player_2/loss=479.705, rew=872.00]


Epoch #1590: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1591: 1025it [00:02, 373.12it/s, env_step=1629184, len=24, n/ep=2, n/st=64, player_1/loss=657.932, player_2/loss=242.087, rew=698.00]


Epoch #1591: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1592: 1025it [00:02, 373.94it/s, env_step=1630208, len=38, n/ep=2, n/st=64, player_1/loss=569.454, player_2/loss=157.266, rew=1519.00]


Epoch #1592: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1593: 1025it [00:02, 373.53it/s, env_step=1631232, len=38, n/ep=1, n/st=64, player_1/loss=250.318, player_2/loss=386.272, rew=1480.00]


Epoch #1593: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1594: 1025it [00:02, 375.17it/s, env_step=1632256, len=18, n/ep=3, n/st=64, player_1/loss=413.042, player_2/loss=438.402, rew=364.00]


Epoch #1594: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1595: 1025it [00:02, 373.53it/s, env_step=1633280, len=17, n/ep=4, n/st=64, player_1/loss=555.799, player_2/loss=482.599, rew=360.00]


Epoch #1595: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1596: 1025it [00:02, 372.17it/s, env_step=1634304, len=21, n/ep=3, n/st=64, player_1/loss=458.155, player_2/loss=145.418, rew=462.00]


Epoch #1596: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1597: 1025it [00:02, 373.12it/s, env_step=1635328, len=32, n/ep=2, n/st=64, player_1/loss=474.635, player_2/loss=97.914, rew=1103.00]


Epoch #1597: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1598: 1025it [00:02, 372.85it/s, env_step=1636352, len=25, n/ep=2, n/st=64, player_1/loss=480.130, player_2/loss=162.469, rew=652.00]


Epoch #1598: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1599: 1025it [00:02, 373.12it/s, env_step=1637376, len=33, n/ep=2, n/st=64, player_1/loss=687.907, player_2/loss=266.551, rew=1124.00]


Epoch #1599: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1600: 1025it [00:02, 377.10it/s, env_step=1638400, len=31, n/ep=3, n/st=64, player_1/loss=402.909, player_2/loss=348.298, rew=1000.67]


Epoch #1600: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1601: 1025it [00:02, 374.48it/s, env_step=1639424, len=28, n/ep=3, n/st=64, player_1/loss=362.191, player_2/loss=353.106, rew=844.00]


Epoch #1601: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1602: 1025it [00:02, 373.39it/s, env_step=1640448, len=26, n/ep=2, n/st=64, player_1/loss=476.678, player_2/loss=190.786, rew=725.00]


Epoch #1602: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1603: 1025it [00:02, 372.31it/s, env_step=1641472, len=22, n/ep=2, n/st=64, player_1/loss=444.331, player_2/loss=193.039, rew=529.00]


Epoch #1603: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1604: 1025it [00:02, 375.17it/s, env_step=1642496, len=29, n/ep=3, n/st=64, player_1/loss=458.316, player_2/loss=307.624, rew=924.00]


Epoch #1604: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1605: 1025it [00:02, 376.41it/s, env_step=1643520, len=25, n/ep=3, n/st=64, player_1/loss=283.500, player_2/loss=484.700, rew=721.33]


Epoch #1605: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1606: 1025it [00:02, 374.48it/s, env_step=1644544, len=33, n/ep=2, n/st=64, player_1/loss=576.682, player_2/loss=386.598, rew=1124.00]


Epoch #1606: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1607: 1025it [00:02, 373.80it/s, env_step=1645568, len=32, n/ep=2, n/st=64, player_1/loss=548.281, player_2/loss=228.220, rew=1054.00]


Epoch #1607: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1608: 1025it [00:02, 376.55it/s, env_step=1646592, len=35, n/ep=2, n/st=64, player_1/loss=607.247, player_2/loss=82.477, rew=1300.00]


Epoch #1608: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1609: 1025it [00:02, 375.17it/s, env_step=1647616, len=37, n/ep=2, n/st=64, player_1/loss=811.235, player_2/loss=65.294, rew=1448.00]


Epoch #1609: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1610: 1025it [00:02, 363.85it/s, env_step=1648640, len=32, n/ep=1, n/st=64, player_1/loss=666.426, player_2/loss=615.717, rew=1054.00]


Epoch #1610: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1611: 1025it [00:02, 370.96it/s, env_step=1649664, len=21, n/ep=3, n/st=64, player_1/loss=561.872, player_2/loss=757.691, rew=475.33]


Epoch #1611: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1612: 1025it [00:02, 374.48it/s, env_step=1650688, len=37, n/ep=1, n/st=64, player_1/loss=342.578, player_2/loss=599.657, rew=1404.00]


Epoch #1612: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1613: 1025it [00:02, 372.31it/s, env_step=1651712, len=31, n/ep=2, n/st=64, player_1/loss=197.095, player_2/loss=386.464, rew=1054.00]


Epoch #1613: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1614: 1025it [00:02, 374.89it/s, env_step=1652736, len=30, n/ep=2, n/st=64, player_1/loss=399.437, player_2/loss=263.974, rew=944.00]


Epoch #1614: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1615: 1025it [00:02, 373.12it/s, env_step=1653760, len=28, n/ep=2, n/st=64, player_1/loss=410.702, player_2/loss=312.774, rew=910.00]


Epoch #1615: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1616: 1025it [00:02, 372.03it/s, env_step=1654784, len=30, n/ep=2, n/st=64, player_1/loss=248.392, rew=959.00]


Epoch #1616: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1617: 1025it [00:02, 374.21it/s, env_step=1655808, len=36, n/ep=2, n/st=64, player_1/loss=398.021, player_2/loss=598.806, rew=1334.00]


Epoch #1617: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1618: 1025it [00:02, 373.39it/s, env_step=1656832, len=34, n/ep=2, n/st=64, player_1/loss=439.152, player_2/loss=475.051, rew=1197.00]


Epoch #1618: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1619: 1025it [00:02, 374.76it/s, env_step=1657856, len=27, n/ep=2, n/st=64, player_1/loss=375.471, player_2/loss=432.971, rew=794.00]


Epoch #1619: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1620: 1025it [00:02, 373.80it/s, env_step=1658880, len=15, n/ep=4, n/st=64, player_1/loss=498.966, player_2/loss=256.010, rew=267.50]


Epoch #1620: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1621: 1025it [00:02, 376.27it/s, env_step=1659904, len=29, n/ep=2, n/st=64, player_1/loss=587.995, player_2/loss=322.712, rew=1069.00]


Epoch #1621: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1622: 1025it [00:02, 377.79it/s, env_step=1660928, len=29, n/ep=2, n/st=64, player_1/loss=527.323, player_2/loss=371.252, rew=900.00]


Epoch #1622: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1623: 1025it [00:02, 375.72it/s, env_step=1661952, len=18, n/ep=4, n/st=64, player_1/loss=562.135, player_2/loss=185.594, rew=362.00]


Epoch #1623: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1624: 1025it [00:02, 369.22it/s, env_step=1662976, len=27, n/ep=3, n/st=64, player_1/loss=448.550, rew=846.00]


Epoch #1624: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1625: 1025it [00:02, 371.77it/s, env_step=1664000, len=25, n/ep=3, n/st=64, player_1/loss=570.141, player_2/loss=630.702, rew=706.67]


Epoch #1625: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1626: 1025it [00:02, 374.62it/s, env_step=1665024, len=25, n/ep=2, n/st=64, player_1/loss=458.439, player_2/loss=468.357, rew=694.00]


Epoch #1626: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1627: 1025it [00:02, 372.04it/s, env_step=1666048, len=23, n/ep=3, n/st=64, player_1/loss=492.543, player_2/loss=411.257, rew=554.67]


Epoch #1627: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1628: 1025it [00:02, 373.53it/s, env_step=1667072, len=26, n/ep=3, n/st=64, player_1/loss=453.269, player_2/loss=195.303, rew=724.00]


Epoch #1628: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1629: 1025it [00:02, 375.30it/s, env_step=1668096, len=21, n/ep=3, n/st=64, player_1/loss=484.066, player_2/loss=216.724, rew=490.00]


Epoch #1629: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1630: 1025it [00:02, 375.58it/s, env_step=1669120, len=22, n/ep=3, n/st=64, player_1/loss=518.949, player_2/loss=111.369, rew=520.00]


Epoch #1630: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1631: 1025it [00:02, 373.66it/s, env_step=1670144, len=22, n/ep=3, n/st=64, player_1/loss=170.420, player_2/loss=332.129, rew=520.00]


Epoch #1631: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1632: 1025it [00:02, 370.83it/s, env_step=1671168, len=25, n/ep=2, n/st=64, player_1/loss=86.080, player_2/loss=303.891, rew=730.00]


Epoch #1632: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1633: 1025it [00:02, 374.75it/s, env_step=1672192, len=32, n/ep=2, n/st=64, player_1/loss=114.625, player_2/loss=452.183, rew=1087.00]


Epoch #1633: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1634: 1025it [00:02, 375.58it/s, env_step=1673216, len=24, n/ep=3, n/st=64, player_1/loss=76.682, player_2/loss=605.616, rew=598.67]


Epoch #1634: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1635: 1025it [00:02, 374.89it/s, env_step=1674240, len=28, n/ep=2, n/st=64, player_1/loss=177.370, player_2/loss=432.014, rew=835.00]


Epoch #1635: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1636: 1025it [00:02, 372.98it/s, env_step=1675264, len=30, n/ep=2, n/st=64, player_1/loss=231.135, player_2/loss=428.021, rew=1031.00]


Epoch #1636: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1637: 1025it [00:02, 373.26it/s, env_step=1676288, len=24, n/ep=3, n/st=64, player_1/loss=214.124, player_2/loss=422.607, rew=602.67]


Epoch #1637: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1638: 1025it [00:02, 370.16it/s, env_step=1677312, len=22, n/ep=3, n/st=64, player_1/loss=375.277, player_2/loss=228.353, rew=556.67]


Epoch #1638: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1639: 1025it [00:02, 372.44it/s, env_step=1678336, len=30, n/ep=2, n/st=64, player_1/loss=353.024, player_2/loss=134.916, rew=937.00]


Epoch #1639: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1640: 1025it [00:02, 376.41it/s, env_step=1679360, len=27, n/ep=3, n/st=64, player_1/loss=257.058, player_2/loss=148.348, rew=776.00]


Epoch #1640: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1641: 1025it [00:02, 372.44it/s, env_step=1680384, len=29, n/ep=2, n/st=64, player_1/loss=410.791, player_2/loss=207.113, rew=900.00]


Epoch #1641: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1642: 1025it [00:02, 372.04it/s, env_step=1681408, len=36, n/ep=2, n/st=64, player_1/loss=501.416, player_2/loss=204.981, rew=1367.00]


Epoch #1642: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1643: 1025it [00:02, 376.41it/s, env_step=1682432, len=32, n/ep=2, n/st=64, player_1/loss=211.235, player_2/loss=351.376, rew=1063.00]


Epoch #1643: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1644: 1025it [00:02, 374.07it/s, env_step=1683456, len=23, n/ep=2, n/st=64, player_1/loss=420.087, player_2/loss=300.693, rew=630.00]


Epoch #1644: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1645: 1025it [00:02, 372.04it/s, env_step=1684480, len=17, n/ep=4, n/st=64, player_1/loss=441.205, player_2/loss=203.388, rew=307.50]


Epoch #1645: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1646: 1025it [00:02, 374.07it/s, env_step=1685504, len=31, n/ep=1, n/st=64, player_1/loss=405.660, player_2/loss=367.361, rew=990.00]


Epoch #1646: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1647: 1025it [00:02, 372.71it/s, env_step=1686528, len=24, n/ep=2, n/st=64, player_1/loss=502.197, player_2/loss=484.367, rew=623.00]


Epoch #1647: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1648: 1025it [00:02, 377.10it/s, env_step=1687552, len=32, n/ep=2, n/st=64, player_1/loss=525.836, player_2/loss=817.517, rew=1129.00]


Epoch #1648: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1649: 1025it [00:02, 370.42it/s, env_step=1688576, len=23, n/ep=2, n/st=64, player_1/loss=308.464, player_2/loss=518.593, rew=550.00]


Epoch #1649: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1650: 1025it [00:02, 370.96it/s, env_step=1689600, len=33, n/ep=2, n/st=64, player_1/loss=431.685, player_2/loss=221.139, rew=1145.00]


Epoch #1650: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1651: 1025it [00:02, 373.66it/s, env_step=1690624, len=25, n/ep=3, n/st=64, player_1/loss=697.613, player_2/loss=312.535, rew=710.00]


Epoch #1651: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1652: 1025it [00:02, 373.26it/s, env_step=1691648, len=26, n/ep=3, n/st=64, player_1/loss=574.640, player_2/loss=284.075, rew=718.67]


Epoch #1652: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1653: 1025it [00:02, 373.26it/s, env_step=1692672, len=27, n/ep=2, n/st=64, player_1/loss=503.788, player_2/loss=472.640, rew=788.00]


Epoch #1653: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1654: 1025it [00:02, 373.53it/s, env_step=1693696, len=33, n/ep=2, n/st=64, player_1/loss=642.527, player_2/loss=681.349, rew=1160.00]


Epoch #1654: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1655: 1025it [00:02, 373.26it/s, env_step=1694720, len=28, n/ep=2, n/st=64, player_1/loss=476.987, player_2/loss=500.107, rew=814.00]


Epoch #1655: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1656: 1025it [00:02, 377.79it/s, env_step=1695744, len=21, n/ep=3, n/st=64, player_1/loss=169.654, player_2/loss=307.656, rew=493.33]


Epoch #1656: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1657: 1025it [00:02, 370.83it/s, env_step=1696768, len=29, n/ep=2, n/st=64, player_1/loss=535.721, player_2/loss=332.825, rew=898.00]


Epoch #1657: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1658: 1025it [00:02, 375.03it/s, env_step=1697792, len=24, n/ep=3, n/st=64, player_1/loss=584.349, player_2/loss=167.927, rew=622.67]


Epoch #1658: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1659: 1025it [00:02, 375.30it/s, env_step=1698816, len=25, n/ep=2, n/st=64, player_1/loss=356.380, player_2/loss=145.678, rew=674.00]


Epoch #1659: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1660: 1025it [00:02, 372.04it/s, env_step=1699840, len=21, n/ep=3, n/st=64, player_1/loss=412.896, player_2/loss=183.424, rew=472.67]


Epoch #1660: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1661: 1025it [00:02, 372.98it/s, env_step=1700864, len=39, n/ep=1, n/st=64, player_1/loss=422.271, player_2/loss=244.626, rew=1558.00]


Epoch #1661: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1662: 1025it [00:02, 373.26it/s, env_step=1701888, len=24, n/ep=2, n/st=64, player_1/loss=404.830, player_2/loss=221.056, rew=679.00]


Epoch #1662: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1663: 1025it [00:02, 371.90it/s, env_step=1702912, len=28, n/ep=2, n/st=64, player_1/loss=228.043, player_2/loss=201.317, rew=845.00]


Epoch #1663: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1664: 1025it [00:02, 375.86it/s, env_step=1703936, len=29, n/ep=2, n/st=64, player_1/loss=143.282, player_2/loss=233.981, rew=970.00]


Epoch #1664: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1665: 1025it [00:02, 372.17it/s, env_step=1704960, len=29, n/ep=3, n/st=64, player_1/loss=420.841, player_2/loss=180.018, rew=956.00]


Epoch #1665: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1666: 1025it [00:02, 375.30it/s, env_step=1705984, len=31, n/ep=2, n/st=64, player_1/loss=677.950, player_2/loss=236.033, rew=1064.00]


Epoch #1666: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1667: 1025it [00:02, 371.77it/s, env_step=1707008, len=30, n/ep=3, n/st=64, player_1/loss=632.588, player_2/loss=299.284, rew=949.33]


Epoch #1667: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1668: 1025it [00:02, 372.04it/s, env_step=1708032, len=26, n/ep=3, n/st=64, player_1/loss=529.199, player_2/loss=118.549, rew=738.00]


Epoch #1668: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1669: 1025it [00:02, 375.72it/s, env_step=1709056, len=24, n/ep=3, n/st=64, player_1/loss=421.893, player_2/loss=55.264, rew=647.33]


Epoch #1669: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1670: 1025it [00:02, 372.17it/s, env_step=1710080, len=22, n/ep=3, n/st=64, player_1/loss=296.843, player_2/loss=286.978, rew=576.67]


Epoch #1670: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1671: 1025it [00:02, 374.62it/s, env_step=1711104, len=27, n/ep=3, n/st=64, player_1/loss=268.438, player_2/loss=327.926, rew=880.67]


Epoch #1671: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1672: 1025it [00:02, 370.69it/s, env_step=1712128, len=31, n/ep=2, n/st=64, player_1/loss=339.388, player_2/loss=299.916, rew=1024.00]


Epoch #1672: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1673: 1025it [00:02, 374.89it/s, env_step=1713152, len=30, n/ep=2, n/st=64, player_1/loss=472.706, player_2/loss=148.497, rew=944.00]


Epoch #1673: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1674: 1025it [00:02, 371.23it/s, env_step=1714176, len=22, n/ep=3, n/st=64, player_1/loss=293.819, player_2/loss=133.460, rew=536.00]


Epoch #1674: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1675: 1025it [00:02, 375.99it/s, env_step=1715200, len=27, n/ep=3, n/st=64, player_1/loss=258.176, player_2/loss=67.271, rew=807.33]


Epoch #1675: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1676: 1025it [00:02, 371.77it/s, env_step=1716224, len=19, n/ep=3, n/st=64, player_1/loss=162.945, player_2/loss=227.322, rew=424.67]


Epoch #1676: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1677: 1025it [00:02, 374.07it/s, env_step=1717248, len=30, n/ep=2, n/st=64, player_1/loss=54.932, player_2/loss=868.535, rew=961.00]


Epoch #1677: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1678: 1025it [00:02, 371.63it/s, env_step=1718272, len=29, n/ep=2, n/st=64, player_1/loss=393.346, player_2/loss=772.709, rew=900.00]


Epoch #1678: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1679: 1025it [00:02, 372.03it/s, env_step=1719296, len=25, n/ep=3, n/st=64, player_1/loss=494.086, player_2/loss=490.393, rew=696.67]


Epoch #1679: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1680: 1025it [00:02, 373.94it/s, env_step=1720320, len=25, n/ep=3, n/st=64, player_1/loss=150.123, player_2/loss=635.484, rew=648.67]


Epoch #1680: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1681: 1025it [00:02, 375.30it/s, env_step=1721344, len=22, n/ep=3, n/st=64, player_1/loss=180.528, player_2/loss=642.919, rew=582.67]


Epoch #1681: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1682: 1025it [00:02, 372.85it/s, env_step=1722368, len=26, n/ep=3, n/st=64, player_1/loss=462.710, player_2/loss=256.626, rew=723.33]


Epoch #1682: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1683: 1025it [00:02, 373.12it/s, env_step=1723392, len=25, n/ep=2, n/st=64, player_1/loss=459.361, player_2/loss=117.056, rew=674.00]


Epoch #1683: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1684: 1025it [00:02, 373.80it/s, env_step=1724416, len=25, n/ep=3, n/st=64, player_1/loss=236.671, player_2/loss=176.987, rew=656.67]


Epoch #1684: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1685: 1025it [00:02, 375.99it/s, env_step=1725440, len=30, n/ep=2, n/st=64, player_1/loss=484.772, player_2/loss=341.732, rew=992.00]


Epoch #1685: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1686: 1025it [00:02, 372.44it/s, env_step=1726464, len=26, n/ep=2, n/st=64, player_1/loss=729.568, player_2/loss=431.074, rew=727.00]


Epoch #1686: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1687: 1025it [00:02, 372.85it/s, env_step=1727488, len=25, n/ep=2, n/st=64, player_1/loss=790.919, player_2/loss=354.278, rew=674.00]


Epoch #1687: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1688: 1025it [00:02, 374.89it/s, env_step=1728512, len=27, n/ep=3, n/st=64, player_1/loss=230.884, player_2/loss=231.979, rew=944.00]


Epoch #1688: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1689: 1025it [00:02, 375.30it/s, env_step=1729536, len=13, n/ep=7, n/st=64, player_1/loss=175.382, player_2/loss=317.119, rew=280.29]


Epoch #1689: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1690: 1025it [00:02, 372.44it/s, env_step=1730560, len=34, n/ep=1, n/st=64, player_1/loss=190.152, player_2/loss=213.485, rew=1188.00]


Epoch #1690: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1691: 1025it [00:02, 375.99it/s, env_step=1731584, len=33, n/ep=2, n/st=64, player_1/loss=267.684, player_2/loss=633.660, rew=1154.00]


Epoch #1691: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1692: 1025it [00:02, 373.93it/s, env_step=1732608, len=24, n/ep=3, n/st=64, player_1/loss=151.106, player_2/loss=541.274, rew=696.67]


Epoch #1692: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1693: 1025it [00:02, 370.16it/s, env_step=1733632, len=31, n/ep=2, n/st=64, player_1/loss=352.874, rew=1054.00]


Epoch #1693: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1694: 1025it [00:02, 375.85it/s, env_step=1734656, len=17, n/ep=4, n/st=64, player_1/loss=915.245, player_2/loss=580.638, rew=326.00]


Epoch #1694: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1695: 1025it [00:02, 371.77it/s, env_step=1735680, len=14, n/ep=4, n/st=64, player_1/loss=876.458, player_2/loss=894.049, rew=225.00]


Epoch #1695: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1696: 1025it [00:02, 371.77it/s, env_step=1736704, len=17, n/ep=3, n/st=64, player_1/loss=649.185, rew=332.00]


Epoch #1696: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1697: 1025it [00:02, 372.44it/s, env_step=1737728, len=15, n/ep=4, n/st=64, player_1/loss=523.576, rew=251.00]


Epoch #1697: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1698: 1025it [00:02, 357.88it/s, env_step=1738752, len=14, n/ep=5, n/st=64, player_1/loss=141.789, player_2/loss=404.882, rew=227.20]


Epoch #1698: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1699: 1025it [00:02, 365.02it/s, env_step=1739776, len=15, n/ep=5, n/st=64, player_1/loss=304.984, player_2/loss=250.014, rew=250.80]


Epoch #1699: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1700: 1025it [00:02, 363.72it/s, env_step=1740800, len=28, n/ep=3, n/st=64, player_1/loss=312.141, player_2/loss=174.694, rew=867.33]


Epoch #1700: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1701: 1025it [00:02, 370.56it/s, env_step=1741824, len=22, n/ep=3, n/st=64, player_1/loss=382.521, player_2/loss=418.380, rew=582.67]


Epoch #1701: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1702: 1025it [00:02, 377.24it/s, env_step=1742848, len=28, n/ep=2, n/st=64, player_1/loss=322.852, player_2/loss=419.004, rew=846.00]


Epoch #1702: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1703: 1025it [00:02, 371.23it/s, env_step=1743872, len=19, n/ep=3, n/st=64, player_1/loss=242.447, rew=415.33]


Epoch #1703: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1704: 1025it [00:02, 373.26it/s, env_step=1744896, len=20, n/ep=3, n/st=64, player_1/loss=170.780, player_2/loss=430.852, rew=503.33]


Epoch #1704: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1705: 1025it [00:02, 373.66it/s, env_step=1745920, len=22, n/ep=2, n/st=64, player_1/loss=127.357, player_2/loss=834.328, rew=539.00]


Epoch #1705: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1706: 1025it [00:02, 371.23it/s, env_step=1746944, len=15, n/ep=4, n/st=64, player_1/loss=107.958, player_2/loss=758.282, rew=272.50]


Epoch #1706: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1707: 1025it [00:02, 376.41it/s, env_step=1747968, len=30, n/ep=2, n/st=64, player_1/loss=515.716, player_2/loss=1008.697, rew=1031.00]


Epoch #1707: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1708: 1025it [00:02, 373.53it/s, env_step=1748992, len=21, n/ep=3, n/st=64, player_1/loss=558.697, player_2/loss=937.032, rew=460.67]


Epoch #1708: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1709: 1025it [00:02, 370.96it/s, env_step=1750016, len=22, n/ep=3, n/st=64, player_2/loss=572.249, rew=540.67]


Epoch #1709: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1710: 1025it [00:02, 372.17it/s, env_step=1751040, len=17, n/ep=4, n/st=64, player_1/loss=184.799, player_2/loss=583.926, rew=358.50]


Epoch #1710: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1711: 1025it [00:02, 370.42it/s, env_step=1752064, len=29, n/ep=3, n/st=64, player_1/loss=159.516, player_2/loss=536.803, rew=888.00]


Epoch #1711: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1712: 1025it [00:02, 375.30it/s, env_step=1753088, len=23, n/ep=2, n/st=64, player_1/loss=223.496, player_2/loss=293.402, rew=551.00]


Epoch #1712: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1713: 1025it [00:02, 373.94it/s, env_step=1754112, len=18, n/ep=4, n/st=64, player_1/loss=118.873, player_2/loss=466.159, rew=407.00]


Epoch #1713: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1714: 1025it [00:02, 368.82it/s, env_step=1755136, len=24, n/ep=2, n/st=64, player_1/loss=97.663, player_2/loss=272.750, rew=599.00]


Epoch #1714: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1715: 1025it [00:02, 375.44it/s, env_step=1756160, len=19, n/ep=3, n/st=64, player_2/loss=263.106, rew=405.33]


Epoch #1715: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1716: 1025it [00:02, 377.24it/s, env_step=1757184, len=25, n/ep=3, n/st=64, player_1/loss=440.452, player_2/loss=185.281, rew=656.67]


Epoch #1716: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1717: 1025it [00:02, 362.69it/s, env_step=1758208, len=23, n/ep=3, n/st=64, player_1/loss=306.470, player_2/loss=170.204, rew=566.67]


Epoch #1717: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1718: 1025it [00:02, 373.26it/s, env_step=1759232, len=21, n/ep=2, n/st=64, player_1/loss=263.909, player_2/loss=305.537, rew=488.00]


Epoch #1718: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1719: 1025it [00:02, 371.09it/s, env_step=1760256, len=31, n/ep=2, n/st=64, player_1/loss=263.386, player_2/loss=352.625, rew=990.00]


Epoch #1719: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1720: 1025it [00:02, 371.50it/s, env_step=1761280, len=29, n/ep=2, n/st=64, player_1/loss=237.784, player_2/loss=395.634, rew=868.00]


Epoch #1720: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1721: 1025it [00:02, 373.93it/s, env_step=1762304, len=29, n/ep=2, n/st=64, player_1/loss=317.807, player_2/loss=450.338, rew=872.00]


Epoch #1721: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1722: 1025it [00:02, 368.82it/s, env_step=1763328, len=32, n/ep=2, n/st=64, player_1/loss=409.825, player_2/loss=362.113, rew=1117.00]


Epoch #1722: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1723: 1025it [00:02, 375.03it/s, env_step=1764352, len=33, n/ep=2, n/st=64, player_1/loss=286.701, player_2/loss=561.044, rew=1129.00]


Epoch #1723: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1724: 1025it [00:02, 375.03it/s, env_step=1765376, len=21, n/ep=4, n/st=64, player_1/loss=106.148, player_2/loss=508.479, rew=607.00]


Epoch #1724: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1725: 1025it [00:02, 370.56it/s, env_step=1766400, len=21, n/ep=3, n/st=64, player_1/loss=146.809, player_2/loss=292.641, rew=475.33]


Epoch #1725: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1726: 1025it [00:02, 373.39it/s, env_step=1767424, len=26, n/ep=3, n/st=64, player_1/loss=211.595, player_2/loss=254.081, rew=745.33]


Epoch #1726: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1727: 1025it [00:02, 373.39it/s, env_step=1768448, len=20, n/ep=3, n/st=64, player_1/loss=277.584, player_2/loss=154.657, rew=432.67]


Epoch #1727: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1728: 1025it [00:02, 367.90it/s, env_step=1769472, len=27, n/ep=2, n/st=64, player_1/loss=260.315, player_2/loss=157.522, rew=782.00]


Epoch #1728: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1729: 1025it [00:02, 375.58it/s, env_step=1770496, len=38, n/ep=2, n/st=64, player_1/loss=231.810, player_2/loss=621.423, rew=1480.00]


Epoch #1729: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1730: 1025it [00:02, 371.09it/s, env_step=1771520, len=15, n/ep=4, n/st=64, player_1/loss=179.943, player_2/loss=662.552, rew=309.00]


Epoch #1730: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1731: 1025it [00:02, 371.36it/s, env_step=1772544, len=30, n/ep=3, n/st=64, player_1/loss=168.878, player_2/loss=472.933, rew=1032.00]


Epoch #1731: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1732: 1025it [00:02, 372.71it/s, env_step=1773568, len=20, n/ep=4, n/st=64, player_1/loss=307.197, player_2/loss=470.154, rew=504.00]


Epoch #1732: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1733: 1025it [00:02, 371.77it/s, env_step=1774592, len=21, n/ep=3, n/st=64, player_1/loss=426.065, player_2/loss=201.429, rew=476.00]


Epoch #1733: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1734: 1025it [00:02, 370.02it/s, env_step=1775616, len=26, n/ep=3, n/st=64, player_1/loss=199.318, player_2/loss=355.481, rew=744.00]


Epoch #1734: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1735: 1025it [00:02, 370.96it/s, env_step=1776640, len=27, n/ep=2, n/st=64, player_1/loss=304.189, player_2/loss=361.012, rew=754.00]


Epoch #1735: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1736: 1025it [00:02, 372.85it/s, env_step=1777664, len=16, n/ep=4, n/st=64, player_1/loss=376.656, player_2/loss=281.305, rew=290.00]


Epoch #1736: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1737: 1025it [00:02, 372.71it/s, env_step=1778688, len=16, n/ep=3, n/st=64, player_1/loss=285.071, player_2/loss=276.707, rew=289.33]


Epoch #1737: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1738: 1025it [00:02, 369.89it/s, env_step=1779712, len=31, n/ep=2, n/st=64, player_1/loss=241.082, player_2/loss=116.421, rew=1078.00]


Epoch #1738: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1739: 1025it [00:02, 370.83it/s, env_step=1780736, len=8, n/ep=8, n/st=64, player_1/loss=311.896, player_2/loss=230.220, rew=85.25]


Epoch #1739: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1740: 1025it [00:02, 368.43it/s, env_step=1781760, len=14, n/ep=4, n/st=64, player_1/loss=278.925, player_2/loss=757.903, rew=237.00]


Epoch #1740: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1741: 1025it [00:02, 370.83it/s, env_step=1782784, len=9, n/ep=7, n/st=64, player_1/loss=152.752, player_2/loss=602.967, rew=90.00]


Epoch #1741: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1742: 1025it [00:02, 372.71it/s, env_step=1783808, len=9, n/ep=6, n/st=64, player_1/loss=103.260, player_2/loss=591.494, rew=101.00]


Epoch #1742: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1743: 1025it [00:02, 370.56it/s, env_step=1784832, len=15, n/ep=4, n/st=64, player_1/loss=28.900, player_2/loss=497.794, rew=257.00]


Epoch #1743: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1744: 1025it [00:02, 371.77it/s, env_step=1785856, len=27, n/ep=2, n/st=64, player_1/loss=86.178, player_2/loss=471.758, rew=782.00]


Epoch #1744: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1745: 1025it [00:02, 372.71it/s, env_step=1786880, len=24, n/ep=2, n/st=64, player_1/loss=247.973, player_2/loss=326.187, rew=635.00]


Epoch #1745: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1746: 1025it [00:02, 371.09it/s, env_step=1787904, len=14, n/ep=4, n/st=64, player_1/loss=255.100, player_2/loss=480.146, rew=210.00]


Epoch #1746: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1747: 1025it [00:02, 372.31it/s, env_step=1788928, len=15, n/ep=5, n/st=64, player_1/loss=98.152, player_2/loss=564.165, rew=257.20]


Epoch #1747: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1748: 1025it [00:02, 375.72it/s, env_step=1789952, len=16, n/ep=4, n/st=64, player_1/loss=62.215, player_2/loss=639.345, rew=276.00]


Epoch #1748: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1749: 1025it [00:02, 372.44it/s, env_step=1790976, len=13, n/ep=5, n/st=64, player_1/loss=117.125, player_2/loss=590.287, rew=207.20]


Epoch #1749: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1750: 1025it [00:02, 372.58it/s, env_step=1792000, len=8, n/ep=8, n/st=64, player_1/loss=156.644, player_2/loss=397.831, rew=81.75]


Epoch #1750: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1751: 1025it [00:02, 372.85it/s, env_step=1793024, len=14, n/ep=4, n/st=64, player_1/loss=204.221, player_2/loss=443.918, rew=219.00]


Epoch #1751: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1752: 1025it [00:02, 368.29it/s, env_step=1794048, len=17, n/ep=4, n/st=64, player_1/loss=235.854, player_2/loss=322.964, rew=335.50]


Epoch #1752: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1753: 1025it [00:02, 374.21it/s, env_step=1795072, len=27, n/ep=2, n/st=64, player_1/loss=383.576, player_2/loss=374.843, rew=758.00]


Epoch #1753: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1754: 1025it [00:02, 371.50it/s, env_step=1796096, len=18, n/ep=3, n/st=64, player_1/loss=402.683, player_2/loss=528.690, rew=344.67]


Epoch #1754: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1755: 1025it [00:02, 373.94it/s, env_step=1797120, len=20, n/ep=3, n/st=64, player_1/loss=151.929, player_2/loss=373.404, rew=432.67]


Epoch #1755: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1756: 1025it [00:02, 370.69it/s, env_step=1798144, len=24, n/ep=3, n/st=64, player_1/loss=132.823, player_2/loss=205.476, rew=653.33]


Epoch #1756: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1757: 1025it [00:02, 368.82it/s, env_step=1799168, len=15, n/ep=5, n/st=64, player_1/loss=170.000, player_2/loss=295.634, rew=274.00]


Epoch #1757: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1758: 1025it [00:02, 372.44it/s, env_step=1800192, len=20, n/ep=3, n/st=64, player_1/loss=129.653, player_2/loss=472.180, rew=446.00]


Epoch #1758: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1759: 1025it [00:02, 371.09it/s, env_step=1801216, len=29, n/ep=2, n/st=64, player_1/loss=71.870, player_2/loss=633.499, rew=910.00]


Epoch #1759: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1760: 1025it [00:02, 370.82it/s, env_step=1802240, len=26, n/ep=2, n/st=64, player_1/loss=343.970, player_2/loss=377.556, rew=747.00]


Epoch #1760: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1761: 1025it [00:02, 374.89it/s, env_step=1803264, len=14, n/ep=5, n/st=64, player_1/loss=311.229, player_2/loss=180.558, rew=291.20]


Epoch #1761: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1762: 1025it [00:02, 371.63it/s, env_step=1804288, len=15, n/ep=4, n/st=64, player_1/loss=217.909, player_2/loss=158.177, rew=258.50]


Epoch #1762: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1763: 1025it [00:02, 372.98it/s, env_step=1805312, len=23, n/ep=2, n/st=64, player_1/loss=170.562, player_2/loss=219.396, rew=580.00]


Epoch #1763: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1764: 1025it [00:02, 371.77it/s, env_step=1806336, len=22, n/ep=3, n/st=64, player_1/loss=185.437, player_2/loss=194.463, rew=542.00]


Epoch #1764: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1765: 1025it [00:02, 371.77it/s, env_step=1807360, len=21, n/ep=2, n/st=64, player_1/loss=176.118, player_2/loss=299.053, rew=460.00]


Epoch #1765: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1766: 1025it [00:02, 375.17it/s, env_step=1808384, len=16, n/ep=4, n/st=64, player_1/loss=283.347, player_2/loss=310.182, rew=318.50]


Epoch #1766: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1767: 1025it [00:02, 370.42it/s, env_step=1809408, len=13, n/ep=5, n/st=64, player_1/loss=329.747, player_2/loss=268.200, rew=216.00]


Epoch #1767: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1768: 1025it [00:02, 371.36it/s, env_step=1810432, len=17, n/ep=3, n/st=64, player_1/loss=207.899, player_2/loss=366.262, rew=328.67]


Epoch #1768: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1769: 1025it [00:02, 371.23it/s, env_step=1811456, len=15, n/ep=4, n/st=64, player_1/loss=147.300, player_2/loss=201.668, rew=254.00]


Epoch #1769: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1770: 1025it [00:02, 375.17it/s, env_step=1812480, len=17, n/ep=4, n/st=64, player_1/loss=100.026, player_2/loss=86.598, rew=325.00]


Epoch #1770: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1771: 1025it [00:02, 374.76it/s, env_step=1813504, len=15, n/ep=4, n/st=64, player_1/loss=107.804, player_2/loss=102.487, rew=254.50]


Epoch #1771: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1772: 1025it [00:02, 374.07it/s, env_step=1814528, len=21, n/ep=3, n/st=64, player_1/loss=83.988, player_2/loss=77.764, rew=480.67]


Epoch #1772: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1773: 1025it [00:02, 374.35it/s, env_step=1815552, len=17, n/ep=3, n/st=64, player_1/loss=77.718, player_2/loss=172.325, rew=306.00]


Epoch #1773: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1774: 1025it [00:02, 374.35it/s, env_step=1816576, len=22, n/ep=3, n/st=64, player_1/loss=347.302, player_2/loss=156.870, rew=520.00]


Epoch #1774: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1775: 1025it [00:02, 371.23it/s, env_step=1817600, len=20, n/ep=2, n/st=64, player_1/loss=437.495, player_2/loss=114.568, rew=441.00]


Epoch #1775: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1776: 1025it [00:02, 373.26it/s, env_step=1818624, len=16, n/ep=4, n/st=64, player_1/loss=223.001, player_2/loss=156.269, rew=300.50]


Epoch #1776: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1777: 1025it [00:02, 369.49it/s, env_step=1819648, len=16, n/ep=4, n/st=64, player_1/loss=139.826, player_2/loss=84.387, rew=317.50]


Epoch #1777: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1778: 1025it [00:02, 373.66it/s, env_step=1820672, len=33, n/ep=2, n/st=64, player_1/loss=126.688, player_2/loss=202.732, rew=1121.00]


Epoch #1778: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1779: 1025it [00:02, 370.56it/s, env_step=1821696, len=16, n/ep=4, n/st=64, player_1/loss=132.495, player_2/loss=250.472, rew=323.00]


Epoch #1779: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1780: 1025it [00:02, 371.36it/s, env_step=1822720, len=13, n/ep=4, n/st=64, player_1/loss=181.502, player_2/loss=251.112, rew=192.00]


Epoch #1780: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1781: 1025it [00:02, 372.04it/s, env_step=1823744, len=10, n/ep=4, n/st=64, player_1/loss=171.310, player_2/loss=176.254, rew=121.50]


Epoch #1781: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1782: 1025it [00:02, 369.62it/s, env_step=1824768, len=17, n/ep=3, n/st=64, player_1/loss=82.770, player_2/loss=270.482, rew=329.33]


Epoch #1782: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1783: 1025it [00:02, 370.96it/s, env_step=1825792, len=21, n/ep=3, n/st=64, player_1/loss=255.242, player_2/loss=344.173, rew=498.67]


Epoch #1783: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1784: 1025it [00:02, 371.77it/s, env_step=1826816, len=21, n/ep=3, n/st=64, player_1/loss=290.670, player_2/loss=232.739, rew=460.67]


Epoch #1784: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1785: 1025it [00:02, 369.36it/s, env_step=1827840, len=18, n/ep=3, n/st=64, player_1/loss=196.694, player_2/loss=136.860, rew=367.33]


Epoch #1785: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1786: 1025it [00:02, 372.17it/s, env_step=1828864, len=9, n/ep=7, n/st=64, player_1/loss=177.598, player_2/loss=106.504, rew=110.57]


Epoch #1786: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1787: 1025it [00:02, 369.49it/s, env_step=1829888, len=15, n/ep=5, n/st=64, player_1/loss=188.502, player_2/loss=333.818, rew=278.40]


Epoch #1787: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1788: 1025it [00:02, 374.76it/s, env_step=1830912, len=20, n/ep=4, n/st=64, player_1/loss=187.166, player_2/loss=458.788, rew=452.50]


Epoch #1788: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1789: 1025it [00:02, 370.96it/s, env_step=1831936, len=15, n/ep=4, n/st=64, player_1/loss=112.385, player_2/loss=126.052, rew=254.00]


Epoch #1789: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1790: 1025it [00:02, 371.50it/s, env_step=1832960, len=19, n/ep=3, n/st=64, player_1/loss=132.917, player_2/loss=29.678, rew=380.67]


Epoch #1790: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1791: 1025it [00:02, 375.17it/s, env_step=1833984, len=17, n/ep=4, n/st=64, player_1/loss=139.880, player_2/loss=183.334, rew=326.00]


Epoch #1791: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1792: 1025it [00:02, 371.50it/s, env_step=1835008, len=16, n/ep=4, n/st=64, player_1/loss=110.507, player_2/loss=269.516, rew=272.00]


Epoch #1792: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1793: 1025it [00:02, 374.21it/s, env_step=1836032, len=15, n/ep=4, n/st=64, player_1/loss=140.711, player_2/loss=135.353, rew=249.00]


Epoch #1793: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1794: 1025it [00:02, 372.98it/s, env_step=1837056, len=21, n/ep=3, n/st=64, player_1/loss=160.143, player_2/loss=200.611, rew=550.67]


Epoch #1794: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1795: 1025it [00:02, 370.29it/s, env_step=1838080, len=18, n/ep=3, n/st=64, player_1/loss=124.101, player_2/loss=356.636, rew=365.33]


Epoch #1795: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1796: 1025it [00:02, 374.48it/s, env_step=1839104, len=22, n/ep=3, n/st=64, player_1/loss=82.737, player_2/loss=406.769, rew=569.33]


Epoch #1796: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1797: 1025it [00:02, 370.15it/s, env_step=1840128, len=21, n/ep=3, n/st=64, player_1/loss=56.077, player_2/loss=386.821, rew=462.00]


Epoch #1797: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1798: 1025it [00:02, 373.12it/s, env_step=1841152, len=20, n/ep=4, n/st=64, player_1/loss=111.859, player_2/loss=378.679, rew=527.00]


Epoch #1798: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1799: 1025it [00:02, 370.29it/s, env_step=1842176, len=19, n/ep=3, n/st=64, player_1/loss=169.739, player_2/loss=358.786, rew=386.67]


Epoch #1799: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1800: 1025it [00:02, 367.11it/s, env_step=1843200, len=18, n/ep=4, n/st=64, player_1/loss=167.061, player_2/loss=241.473, rew=341.00]


Epoch #1800: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1801: 1025it [00:02, 372.17it/s, env_step=1844224, len=28, n/ep=3, n/st=64, player_1/loss=159.311, player_2/loss=155.311, rew=816.00]


Epoch #1801: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1802: 1025it [00:02, 372.17it/s, env_step=1845248, len=29, n/ep=2, n/st=64, player_1/loss=163.915, player_2/loss=301.756, rew=872.00]


Epoch #1802: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1803: 1025it [00:02, 372.85it/s, env_step=1846272, len=10, n/ep=6, n/st=64, player_1/loss=458.467, player_2/loss=372.318, rew=124.00]


Epoch #1803: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1804: 1025it [00:02, 372.44it/s, env_step=1847296, len=8, n/ep=7, n/st=64, player_1/loss=308.242, player_2/loss=348.855, rew=79.71]


Epoch #1804: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1805: 1025it [00:02, 367.24it/s, env_step=1848320, len=15, n/ep=4, n/st=64, player_1/loss=99.414, player_2/loss=199.768, rew=254.00]


Epoch #1805: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1806: 1025it [00:02, 373.26it/s, env_step=1849344, len=14, n/ep=5, n/st=64, player_1/loss=141.766, rew=216.40]


Epoch #1806: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1807: 1025it [00:02, 372.17it/s, env_step=1850368, len=22, n/ep=3, n/st=64, player_1/loss=218.362, player_2/loss=268.802, rew=586.00]


Epoch #1807: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1808: 1025it [00:02, 371.23it/s, env_step=1851392, len=23, n/ep=3, n/st=64, player_1/loss=170.570, player_2/loss=407.421, rew=614.00]


Epoch #1808: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1809: 1025it [00:02, 372.85it/s, env_step=1852416, len=15, n/ep=4, n/st=64, player_1/loss=74.870, player_2/loss=363.927, rew=254.50]


Epoch #1809: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1810: 1025it [00:02, 373.80it/s, env_step=1853440, len=12, n/ep=5, n/st=64, player_1/loss=92.182, player_2/loss=262.468, rew=181.20]


Epoch #1810: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1811: 1025it [00:02, 374.21it/s, env_step=1854464, len=18, n/ep=3, n/st=64, player_1/loss=119.386, player_2/loss=86.026, rew=367.33]


Epoch #1811: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1812: 1025it [00:02, 369.76it/s, env_step=1855488, len=15, n/ep=4, n/st=64, player_1/loss=122.262, player_2/loss=192.274, rew=254.00]


Epoch #1812: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1813: 1025it [00:02, 371.09it/s, env_step=1856512, len=16, n/ep=4, n/st=64, player_1/loss=181.502, player_2/loss=316.707, rew=291.50]


Epoch #1813: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1814: 1025it [00:02, 370.16it/s, env_step=1857536, len=17, n/ep=3, n/st=64, player_1/loss=196.102, player_2/loss=256.878, rew=317.33]


Epoch #1814: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1815: 1025it [00:02, 372.71it/s, env_step=1858560, len=15, n/ep=5, n/st=64, player_1/loss=233.098, player_2/loss=179.072, rew=252.40]


Epoch #1815: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1816: 1025it [00:02, 374.07it/s, env_step=1859584, len=21, n/ep=3, n/st=64, player_1/loss=183.095, player_2/loss=206.554, rew=532.67]


Epoch #1816: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1817: 1025it [00:02, 370.16it/s, env_step=1860608, len=20, n/ep=3, n/st=64, player_1/loss=107.890, player_2/loss=304.415, rew=426.67]


Epoch #1817: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1818: 1025it [00:02, 375.30it/s, env_step=1861632, len=22, n/ep=3, n/st=64, player_1/loss=180.437, player_2/loss=151.831, rew=506.00]


Epoch #1818: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1819: 1025it [00:02, 369.36it/s, env_step=1862656, len=20, n/ep=4, n/st=64, player_1/loss=202.672, player_2/loss=64.383, rew=445.50]


Epoch #1819: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1820: 1025it [00:02, 374.76it/s, env_step=1863680, len=21, n/ep=3, n/st=64, player_1/loss=139.467, player_2/loss=87.327, rew=462.67]


Epoch #1820: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1821: 1025it [00:02, 373.39it/s, env_step=1864704, len=17, n/ep=4, n/st=64, player_1/loss=166.620, player_2/loss=193.797, rew=306.00]


Epoch #1821: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1822: 1025it [00:02, 371.09it/s, env_step=1865728, len=18, n/ep=4, n/st=64, player_1/loss=198.336, player_2/loss=230.993, rew=377.00]


Epoch #1822: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1823: 1025it [00:02, 371.23it/s, env_step=1866752, len=17, n/ep=4, n/st=64, player_1/loss=120.185, player_2/loss=194.380, rew=326.50]


Epoch #1823: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1824: 1025it [00:02, 362.69it/s, env_step=1867776, len=29, n/ep=2, n/st=64, player_1/loss=128.762, player_2/loss=175.260, rew=988.00]


Epoch #1824: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1825: 1025it [00:02, 360.91it/s, env_step=1868800, len=22, n/ep=3, n/st=64, player_1/loss=151.439, player_2/loss=148.470, rew=532.67]


Epoch #1825: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1826: 1025it [00:02, 372.04it/s, env_step=1869824, len=23, n/ep=2, n/st=64, player_1/loss=146.655, player_2/loss=133.783, rew=576.00]


Epoch #1826: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1827: 1025it [00:02, 370.16it/s, env_step=1870848, len=26, n/ep=3, n/st=64, player_1/loss=353.244, player_2/loss=216.914, rew=760.67]


Epoch #1827: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1828: 1025it [00:02, 372.44it/s, env_step=1871872, len=21, n/ep=3, n/st=64, player_1/loss=301.124, player_2/loss=234.456, rew=475.33]


Epoch #1828: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1829: 1025it [00:02, 371.50it/s, env_step=1872896, len=26, n/ep=2, n/st=64, player_1/loss=160.603, player_2/loss=196.823, rew=837.00]


Epoch #1829: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1830: 1025it [00:02, 374.48it/s, env_step=1873920, len=22, n/ep=3, n/st=64, player_1/loss=131.977, player_2/loss=159.335, rew=548.00]


Epoch #1830: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1831: 1025it [00:02, 374.48it/s, env_step=1874944, len=9, n/ep=6, n/st=64, player_1/loss=174.776, player_2/loss=191.046, rew=105.67]


Epoch #1831: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1832: 1025it [00:02, 370.16it/s, env_step=1875968, len=32, n/ep=2, n/st=64, player_1/loss=335.884, player_2/loss=295.565, rew=1070.00]


Epoch #1832: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1833: 1025it [00:02, 354.42it/s, env_step=1876992, len=28, n/ep=3, n/st=64, player_1/loss=334.533, player_2/loss=393.694, rew=870.67]


Epoch #1833: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1834: 1025it [00:02, 372.98it/s, env_step=1878016, len=21, n/ep=3, n/st=64, player_1/loss=501.045, player_2/loss=262.508, rew=478.67]


Epoch #1834: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1835: 1025it [00:02, 367.77it/s, env_step=1879040, len=17, n/ep=3, n/st=64, player_1/loss=411.969, player_2/loss=245.296, rew=385.33]


Epoch #1835: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1836: 1025it [00:02, 370.83it/s, env_step=1880064, len=9, n/ep=7, n/st=64, player_1/loss=136.945, player_2/loss=321.206, rew=101.43]


Epoch #1836: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1837: 1025it [00:02, 368.43it/s, env_step=1881088, len=12, n/ep=5, n/st=64, player_1/loss=164.369, player_2/loss=202.748, rew=205.20]


Epoch #1837: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1838: 1025it [00:02, 373.39it/s, env_step=1882112, len=28, n/ep=3, n/st=64, player_1/loss=121.027, player_2/loss=326.001, rew=931.33]


Epoch #1838: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1839: 1025it [00:02, 373.39it/s, env_step=1883136, len=29, n/ep=2, n/st=64, player_1/loss=231.966, player_2/loss=319.079, rew=918.00]


Epoch #1839: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1840: 1025it [00:02, 372.17it/s, env_step=1884160, len=15, n/ep=4, n/st=64, player_1/loss=385.510, player_2/loss=79.222, rew=285.50]


Epoch #1840: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1841: 1025it [00:02, 372.71it/s, env_step=1885184, len=17, n/ep=3, n/st=64, player_1/loss=287.737, player_2/loss=107.431, rew=306.00]


Epoch #1841: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1842: 1025it [00:02, 373.53it/s, env_step=1886208, len=22, n/ep=3, n/st=64, player_1/loss=392.095, player_2/loss=109.245, rew=510.00]


Epoch #1842: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1843: 1025it [00:02, 372.31it/s, env_step=1887232, len=28, n/ep=3, n/st=64, player_1/loss=418.305, player_2/loss=220.658, rew=972.67]


Epoch #1843: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1844: 1025it [00:02, 373.12it/s, env_step=1888256, len=9, n/ep=8, n/st=64, player_1/loss=352.848, player_2/loss=311.424, rew=113.25]


Epoch #1844: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1845: 1025it [00:02, 370.02it/s, env_step=1889280, len=40, n/ep=1, n/st=64, player_1/loss=426.126, player_2/loss=276.599, rew=1638.00]


Epoch #1845: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1846: 1025it [00:02, 374.76it/s, env_step=1890304, len=22, n/ep=3, n/st=64, player_1/loss=359.430, player_2/loss=229.648, rew=632.67]


Epoch #1846: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1847: 1025it [00:02, 372.17it/s, env_step=1891328, len=21, n/ep=4, n/st=64, player_1/loss=401.962, player_2/loss=360.549, rew=475.50]


Epoch #1847: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1848: 1025it [00:02, 368.56it/s, env_step=1892352, len=12, n/ep=5, n/st=64, player_1/loss=268.075, player_2/loss=181.609, rew=218.00]


Epoch #1848: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1849: 1025it [00:02, 372.58it/s, env_step=1893376, len=16, n/ep=4, n/st=64, player_1/loss=138.688, player_2/loss=125.466, rew=276.50]


Epoch #1849: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1850: 1025it [00:02, 369.22it/s, env_step=1894400, len=21, n/ep=2, n/st=64, player_1/loss=366.199, player_2/loss=346.178, rew=460.00]


Epoch #1850: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1851: 1025it [00:02, 372.85it/s, env_step=1895424, len=16, n/ep=5, n/st=64, player_1/loss=414.117, rew=291.20]


Epoch #1851: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1852: 1025it [00:02, 369.76it/s, env_step=1896448, len=13, n/ep=5, n/st=64, player_1/loss=164.835, player_2/loss=217.049, rew=197.60]


Epoch #1852: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1853: 1025it [00:02, 374.62it/s, env_step=1897472, len=18, n/ep=3, n/st=64, player_1/loss=354.912, player_2/loss=201.209, rew=614.67]


Epoch #1853: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1854: 1025it [00:02, 375.58it/s, env_step=1898496, len=23, n/ep=3, n/st=64, player_1/loss=322.877, player_2/loss=335.462, rew=588.67]


Epoch #1854: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1855: 1025it [00:02, 371.36it/s, env_step=1899520, len=20, n/ep=3, n/st=64, player_1/loss=177.545, player_2/loss=357.715, rew=447.33]


Epoch #1855: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1856: 1025it [00:02, 372.04it/s, env_step=1900544, len=28, n/ep=3, n/st=64, player_1/loss=222.449, player_2/loss=435.184, rew=892.67]


Epoch #1856: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1857: 1025it [00:02, 371.90it/s, env_step=1901568, len=19, n/ep=3, n/st=64, player_1/loss=208.932, player_2/loss=399.723, rew=408.67]


Epoch #1857: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1858: 1025it [00:02, 377.52it/s, env_step=1902592, len=21, n/ep=3, n/st=64, player_1/loss=408.192, player_2/loss=140.070, rew=460.00]


Epoch #1858: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1859: 1025it [00:02, 372.98it/s, env_step=1903616, len=27, n/ep=3, n/st=64, player_1/loss=511.434, player_2/loss=604.599, rew=826.67]


Epoch #1859: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1860: 1025it [00:02, 371.23it/s, env_step=1904640, len=23, n/ep=2, n/st=64, player_1/loss=273.374, player_2/loss=578.365, rew=551.00]


Epoch #1860: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1861: 1025it [00:02, 372.44it/s, env_step=1905664, len=23, n/ep=3, n/st=64, player_1/loss=320.757, player_2/loss=375.763, rew=552.67]


Epoch #1861: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1862: 1025it [00:02, 371.36it/s, env_step=1906688, len=32, n/ep=2, n/st=64, player_1/loss=842.714, player_2/loss=312.403, rew=1055.00]


Epoch #1862: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1863: 1025it [00:02, 376.55it/s, env_step=1907712, len=40, n/ep=2, n/st=64, player_1/loss=1181.886, player_2/loss=258.211, rew=1696.00]


Epoch #1863: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1864: 1025it [00:02, 375.58it/s, env_step=1908736, len=23, n/ep=3, n/st=64, player_1/loss=958.752, player_2/loss=288.568, rew=590.67]


Epoch #1864: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1865: 1025it [00:02, 370.69it/s, env_step=1909760, len=24, n/ep=2, n/st=64, player_1/loss=317.469, player_2/loss=550.996, rew=629.00]


Epoch #1865: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1866: 1025it [00:02, 372.04it/s, env_step=1910784, len=20, n/ep=3, n/st=64, player_1/loss=243.358, player_2/loss=635.025, rew=501.33]


Epoch #1866: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1867: 1025it [00:02, 367.37it/s, env_step=1911808, len=14, n/ep=4, n/st=64, player_1/loss=216.053, player_2/loss=438.565, rew=231.50]


Epoch #1867: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1868: 1025it [00:02, 371.50it/s, env_step=1912832, len=15, n/ep=4, n/st=64, player_1/loss=363.474, player_2/loss=436.453, rew=240.00]


Epoch #1868: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1869: 1025it [00:02, 373.53it/s, env_step=1913856, len=22, n/ep=3, n/st=64, player_1/loss=751.066, player_2/loss=368.577, rew=632.00]


Epoch #1869: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1870: 1025it [00:02, 371.90it/s, env_step=1914880, len=28, n/ep=2, n/st=64, player_1/loss=721.944, player_2/loss=653.591, rew=839.00]


Epoch #1870: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1871: 1025it [00:02, 372.98it/s, env_step=1915904, len=28, n/ep=2, n/st=64, player_1/loss=586.758, player_2/loss=588.673, rew=869.00]


Epoch #1871: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1872: 1025it [00:02, 370.02it/s, env_step=1916928, len=19, n/ep=4, n/st=64, player_1/loss=645.633, player_2/loss=253.623, rew=456.00]


Epoch #1872: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1873: 1025it [00:02, 373.39it/s, env_step=1917952, len=32, n/ep=2, n/st=64, player_1/loss=576.174, player_2/loss=336.095, rew=1055.00]


Epoch #1873: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1874: 1025it [00:02, 371.23it/s, env_step=1918976, len=38, n/ep=1, n/st=64, player_1/loss=383.547, player_2/loss=256.197, rew=1480.00]


Epoch #1874: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1875: 1025it [00:02, 366.98it/s, env_step=1920000, len=15, n/ep=4, n/st=64, player_1/loss=192.963, player_2/loss=222.315, rew=242.50]


Epoch #1875: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1876: 1025it [00:02, 372.71it/s, env_step=1921024, len=19, n/ep=4, n/st=64, player_1/loss=212.641, player_2/loss=304.508, rew=504.50]


Epoch #1876: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1877: 1025it [00:02, 368.03it/s, env_step=1922048, len=25, n/ep=3, n/st=64, player_1/loss=110.871, player_2/loss=150.736, rew=857.33]


Epoch #1877: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1878: 1025it [00:02, 373.39it/s, env_step=1923072, len=32, n/ep=1, n/st=64, player_1/loss=204.034, player_2/loss=170.244, rew=1054.00]


Epoch #1878: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1879: 1025it [00:02, 373.26it/s, env_step=1924096, len=33, n/ep=2, n/st=64, player_1/loss=400.281, player_2/loss=440.580, rew=1166.00]


Epoch #1879: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1880: 1025it [00:02, 371.09it/s, env_step=1925120, len=27, n/ep=3, n/st=64, player_1/loss=379.645, player_2/loss=535.075, rew=806.00]


Epoch #1880: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1881: 1025it [00:02, 372.71it/s, env_step=1926144, len=18, n/ep=3, n/st=64, player_1/loss=720.183, player_2/loss=363.319, rew=342.67]


Epoch #1881: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1882: 1025it [00:02, 372.85it/s, env_step=1927168, len=29, n/ep=2, n/st=64, player_2/loss=461.573, rew=893.00]


Epoch #1882: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1883: 1025it [00:02, 374.48it/s, env_step=1928192, len=18, n/ep=4, n/st=64, player_1/loss=364.314, player_2/loss=394.316, rew=360.50]


Epoch #1883: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1884: 1025it [00:02, 371.23it/s, env_step=1929216, len=38, n/ep=1, n/st=64, player_1/loss=301.737, player_2/loss=471.270, rew=1480.00]


Epoch #1884: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1885: 1025it [00:02, 372.58it/s, env_step=1930240, len=30, n/ep=3, n/st=64, player_1/loss=114.003, player_2/loss=679.433, rew=1140.67]


Epoch #1885: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1886: 1025it [00:02, 371.90it/s, env_step=1931264, len=10, n/ep=7, n/st=64, player_1/loss=196.744, player_2/loss=511.623, rew=141.71]


Epoch #1886: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1887: 1025it [00:02, 371.63it/s, env_step=1932288, len=27, n/ep=3, n/st=64, player_1/loss=569.492, player_2/loss=308.592, rew=784.00]


Epoch #1887: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1888: 1025it [00:02, 370.96it/s, env_step=1933312, len=29, n/ep=2, n/st=64, player_1/loss=496.303, player_2/loss=408.746, rew=898.00]


Epoch #1888: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1889: 1025it [00:02, 370.42it/s, env_step=1934336, len=23, n/ep=3, n/st=64, player_1/loss=92.209, player_2/loss=257.093, rew=552.00]


Epoch #1889: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1890: 1025it [00:02, 374.76it/s, env_step=1935360, len=23, n/ep=3, n/st=64, player_1/loss=322.923, player_2/loss=212.065, rew=571.33]


Epoch #1890: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1891: 1025it [00:02, 372.17it/s, env_step=1936384, len=24, n/ep=3, n/st=64, player_1/loss=368.472, player_2/loss=133.966, rew=666.00]


Epoch #1891: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1892: 1025it [00:02, 372.85it/s, env_step=1937408, len=20, n/ep=3, n/st=64, player_1/loss=443.491, player_2/loss=403.035, rew=447.33]


Epoch #1892: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1893: 1025it [00:02, 372.31it/s, env_step=1938432, len=22, n/ep=3, n/st=64, player_1/loss=664.590, rew=522.67]


Epoch #1893: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1894: 1025it [00:02, 368.29it/s, env_step=1939456, len=24, n/ep=2, n/st=64, player_1/loss=636.326, player_2/loss=148.205, rew=625.00]


Epoch #1894: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1895: 1025it [00:02, 372.17it/s, env_step=1940480, len=34, n/ep=2, n/st=64, player_1/loss=593.942, player_2/loss=566.334, rew=1243.00]


Epoch #1895: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1896: 1025it [00:02, 372.04it/s, env_step=1941504, len=22, n/ep=3, n/st=64, player_1/loss=589.263, player_2/loss=654.442, rew=602.00]


Epoch #1896: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1897: 1025it [00:02, 370.69it/s, env_step=1942528, len=32, n/ep=2, n/st=64, player_1/loss=533.688, player_2/loss=189.046, rew=1089.00]


Epoch #1897: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1898: 1025it [00:02, 374.07it/s, env_step=1943552, len=32, n/ep=2, n/st=64, player_1/loss=717.174, player_2/loss=396.254, rew=1107.00]


Epoch #1898: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1899: 1025it [00:02, 369.35it/s, env_step=1944576, len=32, n/ep=2, n/st=64, player_1/loss=702.695, player_2/loss=417.539, rew=1087.00]


Epoch #1899: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1900: 1025it [00:02, 370.96it/s, env_step=1945600, len=22, n/ep=3, n/st=64, player_1/loss=828.359, player_2/loss=302.908, rew=540.67]


Epoch #1900: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1901: 1025it [00:02, 372.31it/s, env_step=1946624, len=27, n/ep=2, n/st=64, player_1/loss=956.308, player_2/loss=328.781, rew=758.00]


Epoch #1901: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1902: 1025it [00:02, 368.69it/s, env_step=1947648, len=27, n/ep=3, n/st=64, player_1/loss=640.330, player_2/loss=383.945, rew=848.67]


Epoch #1902: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1903: 1025it [00:02, 374.35it/s, env_step=1948672, len=24, n/ep=3, n/st=64, player_1/loss=533.998, player_2/loss=281.626, rew=624.00]


Epoch #1903: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1904: 1025it [00:02, 372.58it/s, env_step=1949696, len=19, n/ep=4, n/st=64, player_1/loss=432.294, player_2/loss=260.330, rew=495.00]


Epoch #1904: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1905: 1025it [00:02, 372.31it/s, env_step=1950720, len=28, n/ep=2, n/st=64, player_1/loss=384.454, player_2/loss=214.425, rew=859.00]


Epoch #1905: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1906: 1025it [00:02, 372.71it/s, env_step=1951744, len=30, n/ep=2, n/st=64, player_1/loss=455.213, player_2/loss=260.878, rew=929.00]


Epoch #1906: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1907: 1025it [00:02, 369.49it/s, env_step=1952768, len=23, n/ep=3, n/st=64, player_1/loss=463.417, player_2/loss=334.097, rew=552.67]


Epoch #1907: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1908: 1025it [00:02, 373.39it/s, env_step=1953792, len=30, n/ep=2, n/st=64, player_1/loss=386.867, player_2/loss=328.062, rew=1015.00]


Epoch #1908: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1909: 1025it [00:02, 368.29it/s, env_step=1954816, len=25, n/ep=2, n/st=64, player_1/loss=473.230, player_2/loss=319.404, rew=652.00]


Epoch #1909: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1910: 1025it [00:02, 370.16it/s, env_step=1955840, len=36, n/ep=2, n/st=64, player_1/loss=668.176, player_2/loss=343.914, rew=1373.00]


Epoch #1910: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1911: 1025it [00:02, 371.09it/s, env_step=1956864, len=33, n/ep=2, n/st=64, player_1/loss=499.241, player_2/loss=235.089, rew=1241.00]


Epoch #1911: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1912: 1025it [00:02, 373.80it/s, env_step=1957888, len=27, n/ep=3, n/st=64, player_1/loss=453.287, player_2/loss=426.121, rew=754.67]


Epoch #1912: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1913: 1025it [00:02, 373.94it/s, env_step=1958912, len=12, n/ep=5, n/st=64, player_1/loss=259.319, player_2/loss=770.445, rew=172.40]


Epoch #1913: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1914: 1025it [00:02, 369.36it/s, env_step=1959936, len=19, n/ep=2, n/st=64, player_1/loss=201.308, player_2/loss=555.203, rew=508.00]


Epoch #1914: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1915: 1025it [00:02, 373.66it/s, env_step=1960960, len=20, n/ep=3, n/st=64, player_1/loss=224.076, player_2/loss=232.243, rew=446.67]


Epoch #1915: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1916: 1025it [00:02, 370.56it/s, env_step=1961984, len=19, n/ep=4, n/st=64, player_1/loss=382.865, player_2/loss=324.008, rew=426.00]


Epoch #1916: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1917: 1025it [00:02, 371.09it/s, env_step=1963008, len=34, n/ep=2, n/st=64, player_1/loss=369.835, player_2/loss=173.973, rew=1235.00]


Epoch #1917: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1918: 1025it [00:02, 368.16it/s, env_step=1964032, len=17, n/ep=3, n/st=64, player_1/loss=552.155, player_2/loss=416.144, rew=344.67]


Epoch #1918: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1919: 1025it [00:02, 371.63it/s, env_step=1965056, len=29, n/ep=2, n/st=64, player_1/loss=330.600, player_2/loss=403.104, rew=970.00]


Epoch #1919: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1920: 1025it [00:02, 371.90it/s, env_step=1966080, len=28, n/ep=2, n/st=64, player_1/loss=324.783, player_2/loss=369.166, rew=839.00]


Epoch #1920: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1921: 1025it [00:02, 370.16it/s, env_step=1967104, len=21, n/ep=3, n/st=64, player_1/loss=344.220, player_2/loss=359.523, rew=462.67]


Epoch #1921: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1922: 1025it [00:02, 372.31it/s, env_step=1968128, len=23, n/ep=3, n/st=64, player_1/loss=304.119, player_2/loss=150.335, rew=552.67]


Epoch #1922: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1923: 1025it [00:02, 367.37it/s, env_step=1969152, len=28, n/ep=3, n/st=64, player_1/loss=534.792, player_2/loss=114.971, rew=850.00]


Epoch #1923: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1924: 1025it [00:02, 370.16it/s, env_step=1970176, len=29, n/ep=2, n/st=64, player_1/loss=363.941, player_2/loss=170.041, rew=872.00]


Epoch #1924: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1925: 1025it [00:02, 371.63it/s, env_step=1971200, len=26, n/ep=2, n/st=64, player_1/loss=322.246, player_2/loss=229.323, rew=701.00]


Epoch #1925: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1926: 1025it [00:02, 373.94it/s, env_step=1972224, len=25, n/ep=3, n/st=64, player_1/loss=378.127, player_2/loss=385.430, rew=808.67]


Epoch #1926: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1927: 1025it [00:02, 371.90it/s, env_step=1973248, len=21, n/ep=3, n/st=64, player_1/loss=321.670, player_2/loss=311.660, rew=606.00]


Epoch #1927: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1928: 1025it [00:02, 369.36it/s, env_step=1974272, len=22, n/ep=3, n/st=64, player_1/loss=250.511, player_2/loss=515.157, rew=528.67]


Epoch #1928: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1929: 1025it [00:02, 372.31it/s, env_step=1975296, len=27, n/ep=2, n/st=64, player_1/loss=296.861, player_2/loss=504.591, rew=758.00]


Epoch #1929: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1930: 1025it [00:02, 371.77it/s, env_step=1976320, len=26, n/ep=3, n/st=64, player_1/loss=543.768, player_2/loss=655.707, rew=737.33]


Epoch #1930: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1931: 1025it [00:02, 367.77it/s, env_step=1977344, len=27, n/ep=2, n/st=64, player_1/loss=534.351, player_2/loss=398.608, rew=784.00]


Epoch #1931: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1932: 1025it [00:02, 369.89it/s, env_step=1978368, len=25, n/ep=3, n/st=64, player_1/loss=159.577, player_2/loss=97.619, rew=746.00]


Epoch #1932: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1933: 1025it [00:02, 371.90it/s, env_step=1979392, len=27, n/ep=3, n/st=64, player_1/loss=173.094, player_2/loss=344.478, rew=866.67]


Epoch #1933: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1934: 1025it [00:02, 372.98it/s, env_step=1980416, len=26, n/ep=3, n/st=64, player_1/loss=319.693, player_2/loss=561.090, rew=732.67]


Epoch #1934: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1935: 1025it [00:02, 370.69it/s, env_step=1981440, len=22, n/ep=2, n/st=64, player_1/loss=301.380, player_2/loss=262.825, rew=583.00]


Epoch #1935: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1936: 1025it [00:02, 372.03it/s, env_step=1982464, len=24, n/ep=3, n/st=64, player_1/loss=174.840, player_2/loss=55.097, rew=624.00]


Epoch #1936: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1937: 1025it [00:02, 368.96it/s, env_step=1983488, len=32, n/ep=2, n/st=64, player_1/loss=148.567, player_2/loss=156.434, rew=1055.00]


Epoch #1937: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1938: 1025it [00:02, 372.98it/s, env_step=1984512, len=35, n/ep=1, n/st=64, player_2/loss=241.342, rew=1258.00]


Epoch #1938: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1939: 1025it [00:02, 373.53it/s, env_step=1985536, len=34, n/ep=2, n/st=64, player_1/loss=285.208, player_2/loss=138.886, rew=1265.00]


Epoch #1939: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1940: 1025it [00:02, 373.26it/s, env_step=1986560, len=23, n/ep=3, n/st=64, player_1/loss=337.182, player_2/loss=181.959, rew=552.67]


Epoch #1940: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1941: 1025it [00:02, 372.17it/s, env_step=1987584, len=26, n/ep=2, n/st=64, player_1/loss=371.966, player_2/loss=180.880, rew=709.00]


Epoch #1941: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1942: 1025it [00:02, 370.83it/s, env_step=1988608, len=26, n/ep=2, n/st=64, player_1/loss=230.683, player_2/loss=57.934, rew=727.00]


Epoch #1942: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1943: 1025it [00:02, 373.26it/s, env_step=1989632, len=20, n/ep=3, n/st=64, player_1/loss=68.445, player_2/loss=48.138, rew=436.67]


Epoch #1943: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1944: 1025it [00:02, 369.09it/s, env_step=1990656, len=14, n/ep=4, n/st=64, player_1/loss=30.835, player_2/loss=290.842, rew=245.50]


Epoch #1944: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1945: 1025it [00:02, 372.71it/s, env_step=1991680, len=16, n/ep=4, n/st=64, player_1/loss=438.925, player_2/loss=473.078, rew=304.50]


Epoch #1945: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1946: 1025it [00:02, 369.22it/s, env_step=1992704, len=18, n/ep=3, n/st=64, player_1/loss=430.704, player_2/loss=305.657, rew=376.00]


Epoch #1946: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1947: 1025it [00:02, 368.96it/s, env_step=1993728, len=29, n/ep=2, n/st=64, player_1/loss=141.855, player_2/loss=186.039, rew=970.00]


Epoch #1947: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1948: 1025it [00:02, 373.39it/s, env_step=1994752, len=24, n/ep=2, n/st=64, player_1/loss=244.894, player_2/loss=273.635, rew=599.00]


Epoch #1948: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1949: 1025it [00:02, 370.15it/s, env_step=1995776, len=26, n/ep=3, n/st=64, player_1/loss=370.263, player_2/loss=432.188, rew=704.67]


Epoch #1949: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1950: 1025it [00:02, 372.31it/s, env_step=1996800, len=38, n/ep=2, n/st=64, player_1/loss=531.621, player_2/loss=748.270, rew=1481.00]


Epoch #1950: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1951: 1025it [00:02, 372.98it/s, env_step=1997824, len=32, n/ep=2, n/st=64, player_1/loss=286.130, player_2/loss=493.129, rew=1055.00]


Epoch #1951: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1952: 1025it [00:02, 369.09it/s, env_step=1998848, len=22, n/ep=3, n/st=64, player_1/loss=223.913, player_2/loss=436.949, rew=540.67]


Epoch #1952: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1953: 1025it [00:02, 370.42it/s, env_step=1999872, len=33, n/ep=2, n/st=64, player_1/loss=180.224, player_2/loss=273.331, rew=1166.00]


Epoch #1953: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1954: 1025it [00:02, 371.63it/s, env_step=2000896, len=42, n/ep=1, n/st=64, player_1/loss=335.406, player_2/loss=518.856, rew=1834.00]


Epoch #1954: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1955: 1025it [00:02, 373.12it/s, env_step=2001920, len=37, n/ep=1, n/st=64, player_1/loss=422.097, player_2/loss=482.734, rew=1404.00]


Epoch #1955: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1956: 1025it [00:02, 370.96it/s, env_step=2002944, len=31, n/ep=2, n/st=64, player_1/loss=401.628, player_2/loss=216.261, rew=1028.00]


Epoch #1956: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1957: 1025it [00:02, 374.89it/s, env_step=2003968, len=32, n/ep=2, n/st=64, player_1/loss=439.356, player_2/loss=504.308, rew=1070.00]


Epoch #1957: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1958: 1025it [00:02, 372.31it/s, env_step=2004992, len=7, n/ep=8, n/st=64, player_1/loss=262.085, player_2/loss=575.224, rew=67.00]


Epoch #1958: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1959: 1025it [00:02, 373.66it/s, env_step=2006016, len=34, n/ep=2, n/st=64, player_1/loss=729.879, player_2/loss=170.124, rew=1204.00]


Epoch #1959: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1960: 1025it [00:02, 370.42it/s, env_step=2007040, len=17, n/ep=3, n/st=64, player_1/loss=733.236, player_2/loss=691.614, rew=397.33]


Epoch #1960: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1961: 1025it [00:02, 373.39it/s, env_step=2008064, len=33, n/ep=2, n/st=64, player_1/loss=659.348, player_2/loss=695.289, rew=1166.00]


Epoch #1961: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1962: 1025it [00:02, 374.76it/s, env_step=2009088, len=32, n/ep=2, n/st=64, player_1/loss=389.642, player_2/loss=334.088, rew=1129.00]


Epoch #1962: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1963: 1025it [00:02, 371.36it/s, env_step=2010112, len=22, n/ep=3, n/st=64, player_1/loss=157.577, player_2/loss=268.324, rew=510.00]


Epoch #1963: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1964: 1025it [00:02, 374.07it/s, env_step=2011136, len=34, n/ep=1, n/st=64, player_1/loss=715.801, player_2/loss=298.434, rew=1188.00]


Epoch #1964: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1965: 1025it [00:02, 372.98it/s, env_step=2012160, len=20, n/ep=3, n/st=64, player_1/loss=767.496, player_2/loss=643.341, rew=526.67]


Epoch #1965: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1966: 1025it [00:02, 370.02it/s, env_step=2013184, len=26, n/ep=3, n/st=64, player_1/loss=198.436, player_2/loss=720.785, rew=752.67]


Epoch #1966: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1967: 1025it [00:02, 374.35it/s, env_step=2014208, len=28, n/ep=2, n/st=64, player_1/loss=281.910, player_2/loss=803.126, rew=814.00]


Epoch #1967: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1968: 1025it [00:02, 370.16it/s, env_step=2015232, len=24, n/ep=3, n/st=64, player_1/loss=534.104, player_2/loss=1081.340, rew=664.67]


Epoch #1968: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1969: 1025it [00:02, 372.44it/s, env_step=2016256, len=25, n/ep=3, n/st=64, player_1/loss=426.378, player_2/loss=1183.873, rew=698.67]


Epoch #1969: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1970: 1025it [00:02, 369.22it/s, env_step=2017280, len=31, n/ep=2, n/st=64, player_1/loss=285.714, player_2/loss=714.526, rew=991.00]


Epoch #1970: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1971: 1025it [00:02, 373.39it/s, env_step=2018304, len=29, n/ep=2, n/st=64, player_1/loss=244.858, player_2/loss=444.682, rew=904.00]


Epoch #1971: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1972: 1025it [00:02, 371.77it/s, env_step=2019328, len=35, n/ep=2, n/st=64, player_1/loss=545.697, player_2/loss=147.346, rew=1296.00]


Epoch #1972: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1973: 1025it [00:02, 370.96it/s, env_step=2020352, len=38, n/ep=1, n/st=64, player_1/loss=316.612, player_2/loss=135.643, rew=1480.00]


Epoch #1973: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1974: 1025it [00:02, 371.77it/s, env_step=2021376, len=31, n/ep=2, n/st=64, player_1/loss=409.368, player_2/loss=302.919, rew=991.00]


Epoch #1974: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1975: 1025it [00:02, 368.16it/s, env_step=2022400, len=20, n/ep=3, n/st=64, player_1/loss=308.662, player_2/loss=387.653, rew=455.33]


Epoch #1975: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1976: 1025it [00:02, 372.71it/s, env_step=2023424, len=18, n/ep=3, n/st=64, player_1/loss=215.037, player_2/loss=501.947, rew=406.00]


Epoch #1976: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1977: 1025it [00:02, 372.71it/s, env_step=2024448, len=21, n/ep=3, n/st=64, player_1/loss=418.823, player_2/loss=490.658, rew=622.67]


Epoch #1977: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1978: 1025it [00:02, 368.56it/s, env_step=2025472, len=20, n/ep=4, n/st=64, player_1/loss=506.512, player_2/loss=260.941, rew=644.50]


Epoch #1978: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1979: 1025it [00:02, 371.63it/s, env_step=2026496, len=37, n/ep=2, n/st=64, player_1/loss=282.652, player_2/loss=492.160, rew=1442.00]


Epoch #1979: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1980: 1025it [00:02, 373.12it/s, env_step=2027520, len=19, n/ep=3, n/st=64, player_1/loss=125.576, player_2/loss=527.645, rew=493.33]


Epoch #1980: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1981: 1025it [00:02, 372.04it/s, env_step=2028544, len=29, n/ep=2, n/st=64, player_1/loss=317.572, player_2/loss=674.137, rew=970.00]


Epoch #1981: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1982: 1025it [00:02, 369.09it/s, env_step=2029568, len=39, n/ep=2, n/st=64, player_1/loss=371.274, player_2/loss=762.953, rew=1600.00]


Epoch #1982: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1983: 1025it [00:02, 370.56it/s, env_step=2030592, len=34, n/ep=1, n/st=64, player_1/loss=470.047, player_2/loss=625.200, rew=1188.00]


Epoch #1983: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1984: 1025it [00:02, 372.58it/s, env_step=2031616, len=18, n/ep=4, n/st=64, player_1/loss=398.593, player_2/loss=350.188, rew=360.00]


Epoch #1984: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1985: 1025it [00:02, 369.36it/s, env_step=2032640, len=24, n/ep=3, n/st=64, player_1/loss=355.984, player_2/loss=368.060, rew=606.00]


Epoch #1985: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1986: 1025it [00:02, 374.48it/s, env_step=2033664, len=20, n/ep=3, n/st=64, player_1/loss=266.092, player_2/loss=325.136, rew=489.33]


Epoch #1986: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1987: 1025it [00:02, 368.56it/s, env_step=2034688, len=23, n/ep=3, n/st=64, player_1/loss=302.053, player_2/loss=429.240, rew=678.00]


Epoch #1987: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1988: 1025it [00:02, 368.82it/s, env_step=2035712, len=15, n/ep=4, n/st=64, player_1/loss=451.002, player_2/loss=523.007, rew=259.50]


Epoch #1988: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1989: 1025it [00:02, 371.36it/s, env_step=2036736, len=8, n/ep=8, n/st=64, player_1/loss=279.084, player_2/loss=339.258, rew=76.25]


Epoch #1989: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1990: 1025it [00:02, 369.09it/s, env_step=2037760, len=14, n/ep=5, n/st=64, player_1/loss=277.361, player_2/loss=722.838, rew=234.80]


Epoch #1990: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1991: 1025it [00:02, 370.02it/s, env_step=2038784, len=16, n/ep=4, n/st=64, player_1/loss=637.475, player_2/loss=536.195, rew=313.00]


Epoch #1991: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1992: 1025it [00:02, 371.36it/s, env_step=2039808, len=23, n/ep=2, n/st=64, player_1/loss=425.165, player_2/loss=747.631, rew=604.00]


Epoch #1992: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1993: 1025it [00:02, 373.12it/s, env_step=2040832, len=23, n/ep=3, n/st=64, player_1/loss=223.049, player_2/loss=490.425, rew=567.33]


Epoch #1993: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1994: 1025it [00:02, 370.02it/s, env_step=2041856, len=37, n/ep=2, n/st=64, player_1/loss=574.637, player_2/loss=266.258, rew=1448.00]


Epoch #1994: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1995: 1025it [00:02, 368.82it/s, env_step=2042880, len=36, n/ep=2, n/st=64, player_1/loss=567.159, player_2/loss=146.141, rew=1367.00]


Epoch #1995: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1996: 1025it [00:02, 365.80it/s, env_step=2043904, len=13, n/ep=5, n/st=64, player_1/loss=690.756, player_2/loss=245.824, rew=246.40]


Epoch #1996: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1997: 1025it [00:02, 370.83it/s, env_step=2044928, len=22, n/ep=3, n/st=64, player_1/loss=647.682, player_2/loss=551.310, rew=624.67]


Epoch #1997: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1998: 1025it [00:02, 373.26it/s, env_step=2045952, len=17, n/ep=5, n/st=64, player_1/loss=361.737, player_2/loss=495.698, rew=468.00]


Epoch #1998: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #1999: 1025it [00:02, 369.22it/s, env_step=2046976, len=8, n/ep=8, n/st=64, player_1/loss=350.247, player_2/loss=186.292, rew=75.25]


Epoch #1999: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2000: 1025it [00:02, 371.36it/s, env_step=2048000, len=12, n/ep=5, n/st=64, player_1/loss=307.381, player_2/loss=462.587, rew=187.60]


Epoch #2000: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2001: 1025it [00:02, 368.56it/s, env_step=2049024, len=28, n/ep=3, n/st=64, player_1/loss=382.297, player_2/loss=387.471, rew=972.67]


Epoch #2001: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2002: 1025it [00:02, 372.85it/s, env_step=2050048, len=29, n/ep=2, n/st=64, player_1/loss=467.914, player_2/loss=108.416, rew=884.00]


Epoch #2002: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2003: 1025it [00:02, 368.82it/s, env_step=2051072, len=33, n/ep=2, n/st=64, player_1/loss=306.073, player_2/loss=175.416, rew=1121.00]


Epoch #2003: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2004: 1025it [00:02, 372.58it/s, env_step=2052096, len=29, n/ep=2, n/st=64, player_1/loss=98.137, player_2/loss=780.476, rew=898.00]


Epoch #2004: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2005: 1025it [00:02, 372.04it/s, env_step=2053120, len=14, n/ep=5, n/st=64, player_1/loss=245.427, player_2/loss=767.579, rew=228.00]


Epoch #2005: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2006: 1025it [00:02, 370.83it/s, env_step=2054144, len=26, n/ep=2, n/st=64, player_1/loss=325.787, player_2/loss=319.205, rew=727.00]


Epoch #2006: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2007: 1025it [00:02, 371.76it/s, env_step=2055168, len=23, n/ep=3, n/st=64, player_1/loss=314.541, player_2/loss=313.609, rew=662.00]


Epoch #2007: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2008: 1025it [00:02, 369.76it/s, env_step=2056192, len=28, n/ep=2, n/st=64, player_1/loss=265.681, player_2/loss=279.289, rew=841.00]


Epoch #2008: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2009: 1025it [00:02, 373.53it/s, env_step=2057216, len=21, n/ep=3, n/st=64, player_1/loss=130.318, rew=548.67]


Epoch #2009: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2010: 1025it [00:02, 371.90it/s, env_step=2058240, len=35, n/ep=2, n/st=64, player_1/loss=72.324, player_2/loss=399.322, rew=1274.00]


Epoch #2010: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2011: 1025it [00:02, 372.31it/s, env_step=2059264, len=39, n/ep=1, n/st=64, player_2/loss=165.424, rew=1558.00]


Epoch #2011: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2012: 1025it [00:02, 374.76it/s, env_step=2060288, len=34, n/ep=2, n/st=64, player_1/loss=92.842, player_2/loss=189.343, rew=1213.00]


Epoch #2012: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2013: 1025it [00:02, 371.50it/s, env_step=2061312, len=28, n/ep=2, n/st=64, player_1/loss=149.390, player_2/loss=90.426, rew=826.00]


Epoch #2013: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2014: 1025it [00:02, 368.56it/s, env_step=2062336, len=20, n/ep=3, n/st=64, player_1/loss=89.132, player_2/loss=215.433, rew=468.67]


Epoch #2014: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2015: 1025it [00:02, 369.35it/s, env_step=2063360, len=15, n/ep=2, n/st=64, player_1/loss=112.137, player_2/loss=429.304, rew=239.00]


Epoch #2015: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2016: 1025it [00:02, 370.69it/s, env_step=2064384, len=23, n/ep=3, n/st=64, player_1/loss=336.211, player_2/loss=510.786, rew=628.00]


Epoch #2016: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2017: 1025it [00:02, 370.83it/s, env_step=2065408, len=30, n/ep=2, n/st=64, player_1/loss=282.901, player_2/loss=433.683, rew=1015.00]


Epoch #2017: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2018: 1025it [00:02, 372.44it/s, env_step=2066432, len=9, n/ep=7, n/st=64, player_1/loss=158.167, player_2/loss=369.137, rew=99.43]


Epoch #2018: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2019: 1025it [00:02, 372.17it/s, env_step=2067456, len=17, n/ep=3, n/st=64, player_1/loss=200.982, player_2/loss=387.207, rew=352.67]


Epoch #2019: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2020: 1025it [00:02, 370.42it/s, env_step=2068480, len=26, n/ep=2, n/st=64, player_1/loss=105.123, player_2/loss=296.543, rew=727.00]


Epoch #2020: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2021: 1025it [00:02, 369.89it/s, env_step=2069504, len=21, n/ep=3, n/st=64, player_1/loss=439.998, player_2/loss=393.147, rew=464.67]


Epoch #2021: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2022: 1025it [00:02, 370.02it/s, env_step=2070528, len=38, n/ep=1, n/st=64, player_1/loss=640.792, player_2/loss=571.300, rew=1480.00]


Epoch #2022: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2023: 1025it [00:02, 370.96it/s, env_step=2071552, len=24, n/ep=3, n/st=64, player_1/loss=327.030, player_2/loss=383.369, rew=661.33]


Epoch #2023: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2024: 1025it [00:02, 367.24it/s, env_step=2072576, len=25, n/ep=2, n/st=64, player_1/loss=299.418, player_2/loss=238.896, rew=746.00]


Epoch #2024: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2025: 1025it [00:02, 372.44it/s, env_step=2073600, len=20, n/ep=3, n/st=64, player_1/loss=595.329, player_2/loss=401.477, rew=447.33]


Epoch #2025: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2026: 1025it [00:02, 370.56it/s, env_step=2074624, len=17, n/ep=4, n/st=64, player_1/loss=508.666, player_2/loss=520.927, rew=304.50]


Epoch #2026: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2027: 1025it [00:02, 370.15it/s, env_step=2075648, len=37, n/ep=2, n/st=64, player_1/loss=278.031, player_2/loss=777.836, rew=1404.00]


Epoch #2027: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2028: 1025it [00:02, 374.35it/s, env_step=2076672, len=20, n/ep=3, n/st=64, player_1/loss=295.708, player_2/loss=464.607, rew=447.33]


Epoch #2028: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2029: 1025it [00:02, 370.96it/s, env_step=2077696, len=18, n/ep=4, n/st=64, player_1/loss=244.014, player_2/loss=259.598, rew=350.00]


Epoch #2029: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2030: 1025it [00:02, 373.80it/s, env_step=2078720, len=16, n/ep=3, n/st=64, player_1/loss=137.973, player_2/loss=217.034, rew=329.33]


Epoch #2030: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2031: 1025it [00:02, 369.62it/s, env_step=2079744, len=20, n/ep=3, n/st=64, player_1/loss=242.101, player_2/loss=339.912, rew=422.67]


Epoch #2031: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2032: 1025it [00:02, 370.83it/s, env_step=2080768, len=21, n/ep=4, n/st=64, player_1/loss=280.253, player_2/loss=629.770, rew=462.00]


Epoch #2032: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2033: 1025it [00:02, 373.12it/s, env_step=2081792, len=28, n/ep=3, n/st=64, player_1/loss=292.267, player_2/loss=381.211, rew=814.67]


Epoch #2033: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2034: 1025it [00:02, 371.36it/s, env_step=2082816, len=25, n/ep=2, n/st=64, player_1/loss=247.235, player_2/loss=225.317, rew=652.00]


Epoch #2034: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2035: 1025it [00:02, 373.94it/s, env_step=2083840, len=32, n/ep=2, n/st=64, player_1/loss=206.555, player_2/loss=283.824, rew=1089.00]


Epoch #2035: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2036: 1025it [00:02, 371.77it/s, env_step=2084864, len=32, n/ep=2, n/st=64, player_1/loss=199.711, player_2/loss=339.352, rew=1087.00]


Epoch #2036: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2037: 1025it [00:02, 369.62it/s, env_step=2085888, len=26, n/ep=3, n/st=64, player_1/loss=356.308, player_2/loss=181.438, rew=702.00]


Epoch #2037: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2038: 1025it [00:02, 360.65it/s, env_step=2086912, len=30, n/ep=2, n/st=64, player_1/loss=627.869, player_2/loss=527.416, rew=937.00]


Epoch #2038: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2039: 1025it [00:02, 370.42it/s, env_step=2087936, len=30, n/ep=2, n/st=64, player_1/loss=314.982, player_2/loss=754.656, rew=1001.00]


Epoch #2039: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2040: 1025it [00:02, 376.41it/s, env_step=2088960, len=26, n/ep=3, n/st=64, player_1/loss=230.838, player_2/loss=397.695, rew=700.67]


Epoch #2040: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2041: 1025it [00:02, 372.17it/s, env_step=2089984, len=37, n/ep=2, n/st=64, player_1/loss=546.734, player_2/loss=195.774, rew=1442.00]


Epoch #2041: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2042: 1025it [00:02, 373.12it/s, env_step=2091008, len=27, n/ep=3, n/st=64, player_1/loss=532.259, player_2/loss=362.111, rew=915.33]


Epoch #2042: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2043: 1025it [00:02, 367.37it/s, env_step=2092032, len=33, n/ep=2, n/st=64, player_1/loss=161.198, player_2/loss=835.146, rew=1174.00]


Epoch #2043: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2044: 1025it [00:02, 370.02it/s, env_step=2093056, len=28, n/ep=1, n/st=64, player_1/loss=803.841, player_2/loss=948.604, rew=810.00]


Epoch #2044: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2045: 1025it [00:02, 372.17it/s, env_step=2094080, len=25, n/ep=3, n/st=64, player_1/loss=821.665, player_2/loss=542.010, rew=709.33]


Epoch #2045: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2046: 1025it [00:02, 371.90it/s, env_step=2095104, len=32, n/ep=2, n/st=64, player_1/loss=725.864, player_2/loss=483.143, rew=1090.00]


Epoch #2046: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2047: 1025it [00:02, 371.90it/s, env_step=2096128, len=39, n/ep=1, n/st=64, player_1/loss=506.004, player_2/loss=619.449, rew=1558.00]


Epoch #2047: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2048: 1025it [00:02, 371.36it/s, env_step=2097152, len=37, n/ep=2, n/st=64, player_1/loss=505.369, player_2/loss=594.584, rew=1413.00]


Epoch #2048: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2049: 1025it [00:02, 372.71it/s, env_step=2098176, len=35, n/ep=2, n/st=64, player_1/loss=566.955, player_2/loss=705.176, rew=1296.00]


Epoch #2049: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2050: 1025it [00:02, 371.09it/s, env_step=2099200, len=15, n/ep=4, n/st=64, player_1/loss=340.945, player_2/loss=1106.085, rew=255.00]


Epoch #2050: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2051: 1025it [00:02, 369.22it/s, env_step=2100224, len=42, n/ep=1, n/st=64, player_1/loss=996.663, player_2/loss=1160.876, rew=1834.00]


Epoch #2051: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2052: 1025it [00:02, 368.43it/s, env_step=2101248, len=24, n/ep=2, n/st=64, player_1/loss=1306.717, player_2/loss=594.647, rew=679.00]


Epoch #2052: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2053: 1025it [00:02, 369.36it/s, env_step=2102272, len=23, n/ep=2, n/st=64, player_1/loss=780.808, player_2/loss=195.325, rew=586.00]


Epoch #2053: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2054: 1025it [00:02, 371.09it/s, env_step=2103296, len=18, n/ep=3, n/st=64, player_1/loss=541.649, player_2/loss=155.277, rew=355.33]


Epoch #2054: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2055: 1025it [00:02, 368.29it/s, env_step=2104320, len=35, n/ep=2, n/st=64, player_1/loss=503.385, player_2/loss=379.250, rew=1267.00]


Epoch #2055: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2056: 1025it [00:02, 372.71it/s, env_step=2105344, len=24, n/ep=3, n/st=64, player_1/loss=361.411, player_2/loss=467.424, rew=642.00]


Epoch #2056: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2057: 1025it [00:02, 366.32it/s, env_step=2106368, len=24, n/ep=4, n/st=64, player_1/loss=533.327, player_2/loss=214.873, rew=675.00]


Epoch #2057: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2058: 1025it [00:02, 370.83it/s, env_step=2107392, len=31, n/ep=3, n/st=64, player_1/loss=613.964, player_2/loss=308.985, rew=992.67]


Epoch #2058: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2059: 1025it [00:02, 370.42it/s, env_step=2108416, len=34, n/ep=2, n/st=64, player_1/loss=419.301, player_2/loss=256.922, rew=1204.00]


Epoch #2059: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2060: 1025it [00:02, 374.48it/s, env_step=2109440, len=35, n/ep=2, n/st=64, player_1/loss=445.706, player_2/loss=559.380, rew=1283.00]


Epoch #2060: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2061: 1025it [00:02, 373.26it/s, env_step=2110464, len=22, n/ep=3, n/st=64, player_1/loss=633.956, player_2/loss=505.342, rew=510.00]


Epoch #2061: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2062: 1025it [00:02, 370.16it/s, env_step=2111488, len=23, n/ep=3, n/st=64, player_1/loss=660.748, player_2/loss=280.564, rew=570.00]


Epoch #2062: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2063: 1025it [00:02, 371.09it/s, env_step=2112512, len=35, n/ep=2, n/st=64, player_1/loss=508.905, player_2/loss=341.458, rew=1314.00]


Epoch #2063: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2064: 1025it [00:02, 368.16it/s, env_step=2113536, len=21, n/ep=3, n/st=64, player_1/loss=417.638, player_2/loss=319.957, rew=524.00]


Epoch #2064: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2065: 1025it [00:02, 373.12it/s, env_step=2114560, len=39, n/ep=1, n/st=64, player_1/loss=322.322, player_2/loss=287.559, rew=1558.00]


Epoch #2065: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2066: 1025it [00:02, 368.96it/s, env_step=2115584, len=13, n/ep=6, n/st=64, player_1/loss=413.924, player_2/loss=512.491, rew=308.67]


Epoch #2066: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2067: 1025it [00:02, 373.94it/s, env_step=2116608, len=33, n/ep=2, n/st=64, player_1/loss=421.288, player_2/loss=571.233, rew=1156.00]


Epoch #2067: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2068: 1025it [00:02, 372.04it/s, env_step=2117632, len=20, n/ep=2, n/st=64, player_1/loss=470.380, player_2/loss=149.832, rew=418.00]


Epoch #2068: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2069: 1025it [00:02, 368.69it/s, env_step=2118656, len=27, n/ep=2, n/st=64, player_1/loss=1354.691, player_2/loss=87.579, rew=782.00]


Epoch #2069: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2070: 1025it [00:02, 373.66it/s, env_step=2119680, len=32, n/ep=2, n/st=64, player_1/loss=1452.239, player_2/loss=88.572, rew=1058.00]


Epoch #2070: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2071: 1025it [00:02, 371.09it/s, env_step=2120704, len=34, n/ep=2, n/st=64, player_1/loss=600.414, player_2/loss=213.456, rew=1225.00]


Epoch #2071: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2072: 1025it [00:02, 370.02it/s, env_step=2121728, len=28, n/ep=3, n/st=64, player_1/loss=458.158, rew=824.00]


Epoch #2072: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2073: 1025it [00:02, 371.23it/s, env_step=2122752, len=21, n/ep=3, n/st=64, player_1/loss=793.785, player_2/loss=276.337, rew=474.67]


Epoch #2073: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2074: 1025it [00:02, 370.02it/s, env_step=2123776, len=38, n/ep=1, n/st=64, player_2/loss=384.255, rew=1480.00]


Epoch #2074: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2075: 1025it [00:02, 371.90it/s, env_step=2124800, len=37, n/ep=2, n/st=64, player_1/loss=283.499, player_2/loss=426.902, rew=1405.00]


Epoch #2075: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2076: 1025it [00:02, 369.89it/s, env_step=2125824, len=29, n/ep=2, n/st=64, player_1/loss=206.772, player_2/loss=255.314, rew=970.00]


Epoch #2076: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2077: 1025it [00:02, 372.44it/s, env_step=2126848, len=27, n/ep=2, n/st=64, player_1/loss=377.039, player_2/loss=306.079, rew=754.00]


Epoch #2077: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2078: 1025it [00:02, 368.43it/s, env_step=2127872, len=21, n/ep=3, n/st=64, player_1/loss=680.289, player_2/loss=187.273, rew=532.67]


Epoch #2078: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2079: 1025it [00:02, 372.31it/s, env_step=2128896, len=27, n/ep=2, n/st=64, player_1/loss=597.931, player_2/loss=541.054, rew=758.00]


Epoch #2079: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2080: 1025it [00:02, 370.02it/s, env_step=2129920, len=24, n/ep=2, n/st=64, player_1/loss=639.181, player_2/loss=581.794, rew=805.00]


Epoch #2080: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2081: 1025it [00:02, 371.63it/s, env_step=2130944, len=30, n/ep=2, n/st=64, player_1/loss=736.733, player_2/loss=785.382, rew=932.00]


Epoch #2081: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2082: 1025it [00:02, 369.89it/s, env_step=2131968, len=25, n/ep=3, n/st=64, player_1/loss=408.523, player_2/loss=970.969, rew=676.00]


Epoch #2082: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2083: 1025it [00:02, 368.03it/s, env_step=2132992, len=34, n/ep=2, n/st=64, player_1/loss=830.996, player_2/loss=738.079, rew=1225.00]


Epoch #2083: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2084: 1025it [00:02, 374.62it/s, env_step=2134016, len=24, n/ep=3, n/st=64, player_1/loss=636.273, player_2/loss=67.058, rew=670.00]


Epoch #2084: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2085: 1025it [00:02, 368.82it/s, env_step=2135040, len=23, n/ep=2, n/st=64, player_1/loss=239.471, player_2/loss=502.606, rew=550.00]


Epoch #2085: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2086: 1025it [00:02, 373.12it/s, env_step=2136064, len=27, n/ep=2, n/st=64, player_1/loss=239.381, player_2/loss=568.642, rew=754.00]


Epoch #2086: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2087: 1025it [00:02, 369.76it/s, env_step=2137088, len=30, n/ep=2, n/st=64, player_1/loss=1063.023, player_2/loss=341.887, rew=959.00]


Epoch #2087: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2088: 1025it [00:02, 372.98it/s, env_step=2138112, len=17, n/ep=3, n/st=64, player_1/loss=1078.680, player_2/loss=479.184, rew=331.33]


Epoch #2088: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2089: 1025it [00:02, 370.69it/s, env_step=2139136, len=32, n/ep=2, n/st=64, player_1/loss=236.755, player_2/loss=246.760, rew=1054.00]


Epoch #2089: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2090: 1025it [00:02, 369.76it/s, env_step=2140160, len=39, n/ep=2, n/st=64, player_1/loss=137.569, player_2/loss=276.819, rew=1600.00]


Epoch #2090: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2091: 1025it [00:02, 373.66it/s, env_step=2141184, len=29, n/ep=2, n/st=64, player_1/loss=209.073, player_2/loss=434.955, rew=970.00]


Epoch #2091: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2092: 1025it [00:02, 370.02it/s, env_step=2142208, len=33, n/ep=2, n/st=64, player_1/loss=198.624, player_2/loss=420.162, rew=1169.00]


Epoch #2092: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2093: 1025it [00:02, 370.56it/s, env_step=2143232, len=30, n/ep=2, n/st=64, player_1/loss=285.059, player_2/loss=758.805, rew=937.00]


Epoch #2093: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2094: 1025it [00:02, 372.17it/s, env_step=2144256, len=20, n/ep=4, n/st=64, player_1/loss=523.274, player_2/loss=766.492, rew=523.00]


Epoch #2094: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2095: 1025it [00:02, 373.39it/s, env_step=2145280, len=35, n/ep=2, n/st=64, player_1/loss=409.764, player_2/loss=936.948, rew=1351.00]


Epoch #2095: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2096: 1025it [00:02, 370.96it/s, env_step=2146304, len=32, n/ep=2, n/st=64, player_1/loss=683.011, rew=1099.00] 


Epoch #2096: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2097: 1025it [00:02, 370.42it/s, env_step=2147328, len=15, n/ep=4, n/st=64, player_1/loss=702.415, player_2/loss=433.282, rew=268.00]


Epoch #2097: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2098: 1025it [00:02, 374.48it/s, env_step=2148352, len=20, n/ep=3, n/st=64, player_1/loss=317.217, player_2/loss=572.363, rew=446.00]


Epoch #2098: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2099: 1025it [00:02, 370.29it/s, env_step=2149376, len=14, n/ep=4, n/st=64, player_1/loss=401.375, player_2/loss=556.004, rew=232.50]


Epoch #2099: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2100: 1025it [00:02, 369.62it/s, env_step=2150400, len=35, n/ep=2, n/st=64, player_1/loss=688.970, player_2/loss=350.991, rew=1294.00]


Epoch #2100: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2101: 1025it [00:02, 370.56it/s, env_step=2151424, len=20, n/ep=3, n/st=64, player_1/loss=913.500, player_2/loss=214.979, rew=448.67]


Epoch #2101: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2102: 1025it [00:02, 370.83it/s, env_step=2152448, len=19, n/ep=3, n/st=64, player_1/loss=728.922, player_2/loss=290.439, rew=404.67]


Epoch #2102: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2103: 1025it [00:02, 366.85it/s, env_step=2153472, len=31, n/ep=2, n/st=64, player_1/loss=134.610, player_2/loss=202.535, rew=1022.00]


Epoch #2103: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2104: 1025it [00:02, 372.98it/s, env_step=2154496, len=34, n/ep=2, n/st=64, player_1/loss=119.263, player_2/loss=527.154, rew=1229.00]


Epoch #2104: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2105: 1025it [00:02, 372.44it/s, env_step=2155520, len=34, n/ep=2, n/st=64, player_1/loss=297.316, player_2/loss=384.208, rew=1197.00]


Epoch #2105: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2106: 1025it [00:02, 369.09it/s, env_step=2156544, len=32, n/ep=2, n/st=64, player_1/loss=478.670, player_2/loss=78.652, rew=1129.00]


Epoch #2106: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2107: 1025it [00:02, 372.71it/s, env_step=2157568, len=35, n/ep=2, n/st=64, player_1/loss=312.462, player_2/loss=414.032, rew=1306.00]


Epoch #2107: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2108: 1025it [00:02, 370.42it/s, env_step=2158592, len=34, n/ep=2, n/st=64, player_1/loss=449.548, player_2/loss=766.635, rew=1237.00]


Epoch #2108: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2109: 1025it [00:02, 372.71it/s, env_step=2159616, len=28, n/ep=2, n/st=64, player_1/loss=614.267, player_2/loss=779.200, rew=839.00]


Epoch #2109: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2110: 1025it [00:02, 369.49it/s, env_step=2160640, len=32, n/ep=2, n/st=64, player_1/loss=661.685, player_2/loss=657.314, rew=1103.00]


Epoch #2110: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2111: 1025it [00:02, 372.31it/s, env_step=2161664, len=28, n/ep=3, n/st=64, player_1/loss=377.217, player_2/loss=985.988, rew=849.33]


Epoch #2111: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2112: 1025it [00:02, 373.66it/s, env_step=2162688, len=33, n/ep=2, n/st=64, player_1/loss=373.302, player_2/loss=871.652, rew=1196.00]


Epoch #2112: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2113: 1025it [00:02, 370.96it/s, env_step=2163712, len=35, n/ep=2, n/st=64, player_1/loss=344.090, player_2/loss=730.113, rew=1267.00]


Epoch #2113: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2114: 1025it [00:02, 372.04it/s, env_step=2164736, len=28, n/ep=3, n/st=64, player_1/loss=231.655, player_2/loss=407.134, rew=876.00]


Epoch #2114: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2115: 1025it [00:02, 368.82it/s, env_step=2165760, len=42, n/ep=1, n/st=64, player_1/loss=267.624, player_2/loss=137.598, rew=1834.00]


Epoch #2115: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2116: 1025it [00:02, 372.04it/s, env_step=2166784, len=38, n/ep=1, n/st=64, player_1/loss=239.674, player_2/loss=626.951, rew=1480.00]


Epoch #2116: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2117: 1025it [00:02, 370.83it/s, env_step=2167808, len=23, n/ep=3, n/st=64, player_1/loss=271.768, player_2/loss=665.103, rew=679.33]


Epoch #2117: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2118: 1025it [00:02, 374.89it/s, env_step=2168832, len=21, n/ep=3, n/st=64, player_1/loss=595.687, rew=468.67]  


Epoch #2118: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2119: 1025it [00:02, 371.23it/s, env_step=2169856, len=24, n/ep=3, n/st=64, player_1/loss=650.139, player_2/loss=858.546, rew=617.33]


Epoch #2119: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2120: 1025it [00:02, 367.90it/s, env_step=2170880, len=27, n/ep=3, n/st=64, player_1/loss=806.653, player_2/loss=847.930, rew=782.00]


Epoch #2120: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2121: 1025it [00:02, 372.58it/s, env_step=2171904, len=28, n/ep=3, n/st=64, player_1/loss=855.573, player_2/loss=896.437, rew=846.00]


Epoch #2121: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2122: 1025it [00:02, 368.43it/s, env_step=2172928, len=20, n/ep=2, n/st=64, player_1/loss=427.887, player_2/loss=616.646, rew=454.00]


Epoch #2122: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2123: 1025it [00:02, 371.90it/s, env_step=2173952, len=22, n/ep=2, n/st=64, player_1/loss=614.879, player_2/loss=448.600, rew=508.00]


Epoch #2123: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2124: 1025it [00:02, 371.90it/s, env_step=2174976, len=28, n/ep=2, n/st=64, player_1/loss=702.544, player_2/loss=323.590, rew=845.00]


Epoch #2124: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2125: 1025it [00:02, 372.85it/s, env_step=2176000, len=36, n/ep=2, n/st=64, player_2/loss=604.677, rew=1369.00] 


Epoch #2125: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2126: 1025it [00:02, 370.69it/s, env_step=2177024, len=21, n/ep=3, n/st=64, player_1/loss=617.715, player_2/loss=613.117, rew=490.00]


Epoch #2126: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2127: 1025it [00:02, 372.31it/s, env_step=2178048, len=31, n/ep=2, n/st=64, player_1/loss=480.477, player_2/loss=266.285, rew=994.00]


Epoch #2127: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2128: 1025it [00:02, 370.83it/s, env_step=2179072, len=35, n/ep=2, n/st=64, player_1/loss=569.307, player_2/loss=404.101, rew=1314.00]


Epoch #2128: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2129: 1025it [00:02, 369.76it/s, env_step=2180096, len=36, n/ep=2, n/st=64, player_1/loss=585.196, player_2/loss=505.847, rew=1339.00]


Epoch #2129: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2130: 1025it [00:02, 370.56it/s, env_step=2181120, len=32, n/ep=2, n/st=64, player_2/loss=262.794, rew=1087.00] 


Epoch #2130: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2131: 1025it [00:02, 369.89it/s, env_step=2182144, len=38, n/ep=2, n/st=64, player_1/loss=96.949, player_2/loss=331.014, rew=1521.00]


Epoch #2131: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2132: 1025it [00:02, 371.63it/s, env_step=2183168, len=30, n/ep=2, n/st=64, player_1/loss=422.664, player_2/loss=450.631, rew=1049.00]


Epoch #2132: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2133: 1025it [00:02, 371.63it/s, env_step=2184192, len=26, n/ep=3, n/st=64, player_1/loss=524.132, player_2/loss=183.672, rew=793.33]


Epoch #2133: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2134: 1025it [00:02, 373.39it/s, env_step=2185216, len=22, n/ep=4, n/st=64, player_1/loss=847.871, player_2/loss=198.117, rew=583.50]


Epoch #2134: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2135: 1025it [00:02, 369.62it/s, env_step=2186240, len=29, n/ep=3, n/st=64, player_1/loss=1223.290, player_2/loss=379.951, rew=925.33]


Epoch #2135: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2136: 1025it [00:02, 373.39it/s, env_step=2187264, len=26, n/ep=2, n/st=64, player_1/loss=796.495, player_2/loss=332.942, rew=709.00]


Epoch #2136: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2137: 1025it [00:02, 369.76it/s, env_step=2188288, len=26, n/ep=2, n/st=64, player_1/loss=287.840, player_2/loss=412.971, rew=727.00]


Epoch #2137: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2138: 1025it [00:02, 372.85it/s, env_step=2189312, len=30, n/ep=2, n/st=64, player_1/loss=336.040, player_2/loss=415.973, rew=971.00]


Epoch #2138: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2139: 1025it [00:02, 372.17it/s, env_step=2190336, len=28, n/ep=2, n/st=64, player_1/loss=243.633, player_2/loss=98.964, rew=869.00]


Epoch #2139: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2140: 1025it [00:02, 366.97it/s, env_step=2191360, len=23, n/ep=3, n/st=64, player_1/loss=180.400, player_2/loss=330.100, rew=603.33]


Epoch #2140: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2141: 1025it [00:02, 374.89it/s, env_step=2192384, len=25, n/ep=3, n/st=64, player_1/loss=971.004, player_2/loss=313.153, rew=672.67]


Epoch #2141: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2142: 1025it [00:02, 371.23it/s, env_step=2193408, len=29, n/ep=2, n/st=64, player_1/loss=1170.081, player_2/loss=219.193, rew=884.00]


Epoch #2142: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2143: 1025it [00:02, 370.02it/s, env_step=2194432, len=24, n/ep=3, n/st=64, player_1/loss=368.152, player_2/loss=363.573, rew=631.33]


Epoch #2143: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2144: 1025it [00:02, 367.63it/s, env_step=2195456, len=29, n/ep=2, n/st=64, player_1/loss=223.604, player_2/loss=859.077, rew=904.00]


Epoch #2144: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2145: 1025it [00:02, 369.62it/s, env_step=2196480, len=20, n/ep=4, n/st=64, player_1/loss=233.986, player_2/loss=733.935, rew=418.50]


Epoch #2145: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2146: 1025it [00:02, 374.07it/s, env_step=2197504, len=32, n/ep=2, n/st=64, player_1/loss=296.967, player_2/loss=518.629, rew=1087.00]


Epoch #2146: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2147: 1025it [00:02, 368.69it/s, env_step=2198528, len=24, n/ep=2, n/st=64, player_1/loss=313.483, player_2/loss=597.103, rew=623.00]


Epoch #2147: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2148: 1025it [00:02, 370.69it/s, env_step=2199552, len=23, n/ep=2, n/st=64, player_1/loss=496.221, player_2/loss=619.061, rew=586.00]


Epoch #2148: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2149: 1025it [00:02, 370.69it/s, env_step=2200576, len=26, n/ep=2, n/st=64, player_1/loss=428.582, player_2/loss=342.402, rew=701.00]


Epoch #2149: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2150: 1025it [00:02, 371.09it/s, env_step=2201600, len=23, n/ep=3, n/st=64, player_1/loss=364.843, player_2/loss=226.448, rew=593.33]


Epoch #2150: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2151: 1025it [00:02, 372.71it/s, env_step=2202624, len=29, n/ep=2, n/st=64, player_1/loss=332.053, player_2/loss=130.882, rew=877.00]


Epoch #2151: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2152: 1025it [00:02, 372.31it/s, env_step=2203648, len=27, n/ep=2, n/st=64, player_1/loss=327.081, player_2/loss=495.569, rew=824.00]


Epoch #2152: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2153: 1025it [00:02, 354.66it/s, env_step=2204672, len=27, n/ep=3, n/st=64, player_1/loss=576.467, player_2/loss=778.682, rew=872.00]


Epoch #2153: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2154: 1025it [00:02, 366.45it/s, env_step=2205696, len=27, n/ep=2, n/st=64, player_1/loss=479.190, player_2/loss=1182.168, rew=812.00]


Epoch #2154: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2155: 1025it [00:02, 371.50it/s, env_step=2206720, len=24, n/ep=3, n/st=64, player_1/loss=179.033, rew=648.00]  


Epoch #2155: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2156: 1025it [00:02, 366.85it/s, env_step=2207744, len=33, n/ep=2, n/st=64, player_1/loss=99.160, player_2/loss=109.093, rew=1124.00]


Epoch #2156: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2157: 1025it [00:02, 371.23it/s, env_step=2208768, len=16, n/ep=4, n/st=64, player_1/loss=366.826, player_2/loss=204.452, rew=327.00]


Epoch #2157: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2158: 1025it [00:02, 369.62it/s, env_step=2209792, len=11, n/ep=6, n/st=64, player_1/loss=375.984, player_2/loss=673.216, rew=134.00]


Epoch #2158: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2159: 1025it [00:02, 370.96it/s, env_step=2210816, len=12, n/ep=5, n/st=64, player_1/loss=114.441, player_2/loss=885.539, rew=173.20]


Epoch #2159: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2160: 1025it [00:02, 370.56it/s, env_step=2211840, len=30, n/ep=2, n/st=64, player_1/loss=162.317, player_2/loss=834.174, rew=964.00]


Epoch #2160: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2161: 1025it [00:02, 372.31it/s, env_step=2212864, len=25, n/ep=3, n/st=64, player_1/loss=130.052, player_2/loss=569.900, rew=680.67]


Epoch #2161: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2162: 1025it [00:02, 370.29it/s, env_step=2213888, len=28, n/ep=2, n/st=64, player_1/loss=436.163, player_2/loss=173.181, rew=869.00]


Epoch #2162: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2163: 1025it [00:02, 367.37it/s, env_step=2214912, len=14, n/ep=4, n/st=64, player_1/loss=595.216, player_2/loss=265.013, rew=224.00]


Epoch #2163: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2164: 1025it [00:02, 372.17it/s, env_step=2215936, len=20, n/ep=3, n/st=64, player_1/loss=403.383, player_2/loss=651.299, rew=422.67]


Epoch #2164: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2165: 1025it [00:02, 363.08it/s, env_step=2216960, len=24, n/ep=3, n/st=64, player_1/loss=709.441, player_2/loss=595.444, rew=638.67]


Epoch #2165: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2166: 1025it [00:02, 371.90it/s, env_step=2217984, len=26, n/ep=2, n/st=64, player_1/loss=967.533, player_2/loss=397.851, rew=736.00]


Epoch #2166: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2167: 1025it [00:02, 369.89it/s, env_step=2219008, len=36, n/ep=2, n/st=64, player_1/loss=538.365, player_2/loss=105.842, rew=1369.00]


Epoch #2167: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2168: 1025it [00:02, 372.17it/s, env_step=2220032, len=23, n/ep=3, n/st=64, player_1/loss=251.672, player_2/loss=673.993, rew=595.33]


Epoch #2168: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2169: 1025it [00:02, 371.77it/s, env_step=2221056, len=16, n/ep=3, n/st=64, player_1/loss=461.921, player_2/loss=964.750, rew=330.67]


Epoch #2169: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2170: 1025it [00:02, 369.36it/s, env_step=2222080, len=25, n/ep=3, n/st=64, player_1/loss=402.399, player_2/loss=672.174, rew=666.67]


Epoch #2170: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2171: 1025it [00:02, 370.69it/s, env_step=2223104, len=15, n/ep=4, n/st=64, player_1/loss=445.872, player_2/loss=245.901, rew=248.00]


Epoch #2171: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2172: 1025it [00:02, 369.62it/s, env_step=2224128, len=30, n/ep=3, n/st=64, player_1/loss=770.630, player_2/loss=236.208, rew=969.33]


Epoch #2172: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2173: 1025it [00:02, 370.29it/s, env_step=2225152, len=28, n/ep=3, n/st=64, player_1/loss=950.265, player_2/loss=444.287, rew=865.33]


Epoch #2173: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2174: 1025it [00:02, 370.69it/s, env_step=2226176, len=22, n/ep=3, n/st=64, player_1/loss=691.262, player_2/loss=412.093, rew=512.67]


Epoch #2174: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2175: 1025it [00:02, 368.69it/s, env_step=2227200, len=29, n/ep=2, n/st=64, player_1/loss=473.663, player_2/loss=415.408, rew=1012.00]


Epoch #2175: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2176: 1025it [00:02, 369.76it/s, env_step=2228224, len=29, n/ep=2, n/st=64, player_1/loss=262.422, player_2/loss=168.643, rew=970.00]


Epoch #2176: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2177: 1025it [00:02, 372.98it/s, env_step=2229248, len=32, n/ep=2, n/st=64, player_1/loss=184.028, player_2/loss=455.874, rew=1089.00]


Epoch #2177: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2178: 1025it [00:02, 368.96it/s, env_step=2230272, len=20, n/ep=3, n/st=64, player_1/loss=472.013, player_2/loss=331.398, rew=434.67]


Epoch #2178: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2179: 1025it [00:02, 372.44it/s, env_step=2231296, len=28, n/ep=3, n/st=64, player_1/loss=446.312, player_2/loss=302.582, rew=828.67]


Epoch #2179: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2180: 1025it [00:02, 373.12it/s, env_step=2232320, len=30, n/ep=2, n/st=64, player_1/loss=198.125, player_2/loss=558.125, rew=928.00]


Epoch #2180: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2181: 1025it [00:02, 370.56it/s, env_step=2233344, len=34, n/ep=1, n/st=64, player_1/loss=379.268, player_2/loss=705.861, rew=1188.00]


Epoch #2181: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2182: 1025it [00:02, 370.69it/s, env_step=2234368, len=15, n/ep=4, n/st=64, player_1/loss=544.431, player_2/loss=867.501, rew=258.00]


Epoch #2182: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2183: 1025it [00:02, 367.77it/s, env_step=2235392, len=16, n/ep=4, n/st=64, player_1/loss=374.631, player_2/loss=625.385, rew=280.50]


Epoch #2183: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2184: 1025it [00:02, 371.50it/s, env_step=2236416, len=28, n/ep=3, n/st=64, player_1/loss=303.605, player_2/loss=256.210, rew=1097.33]


Epoch #2184: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2185: 1025it [00:02, 369.22it/s, env_step=2237440, len=15, n/ep=2, n/st=64, player_1/loss=176.718, player_2/loss=323.580, rew=254.00]


Epoch #2185: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2186: 1025it [00:02, 372.04it/s, env_step=2238464, len=17, n/ep=4, n/st=64, player_1/loss=69.630, player_2/loss=280.426, rew=326.00]


Epoch #2186: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2187: 1025it [00:02, 370.16it/s, env_step=2239488, len=21, n/ep=3, n/st=64, player_1/loss=84.948, player_2/loss=879.671, rew=462.00]


Epoch #2187: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2188: 1025it [00:02, 370.16it/s, env_step=2240512, len=22, n/ep=2, n/st=64, player_1/loss=163.201, player_2/loss=842.908, rew=557.00]


Epoch #2188: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2189: 1025it [00:02, 371.77it/s, env_step=2241536, len=20, n/ep=3, n/st=64, player_1/loss=259.043, player_2/loss=232.078, rew=420.00]


Epoch #2189: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2190: 1025it [00:02, 367.37it/s, env_step=2242560, len=19, n/ep=3, n/st=64, player_1/loss=279.421, player_2/loss=502.297, rew=485.33]


Epoch #2190: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2191: 1025it [00:02, 369.22it/s, env_step=2243584, len=16, n/ep=4, n/st=64, player_1/loss=362.558, player_2/loss=623.563, rew=338.50]


Epoch #2191: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2192: 1025it [00:02, 370.29it/s, env_step=2244608, len=34, n/ep=2, n/st=64, player_1/loss=386.702, player_2/loss=551.630, rew=1235.00]


Epoch #2192: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2193: 1025it [00:02, 370.83it/s, env_step=2245632, len=17, n/ep=4, n/st=64, player_2/loss=393.439, rew=306.00]  


Epoch #2193: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2194: 1025it [00:02, 367.24it/s, env_step=2246656, len=23, n/ep=3, n/st=64, player_1/loss=283.035, player_2/loss=251.299, rew=554.67]


Epoch #2194: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2195: 1025it [00:02, 369.75it/s, env_step=2247680, len=24, n/ep=2, n/st=64, player_1/loss=375.886, player_2/loss=58.988, rew=599.00]


Epoch #2195: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2196: 1025it [00:02, 371.23it/s, env_step=2248704, len=34, n/ep=2, n/st=64, player_1/loss=870.404, player_2/loss=181.863, rew=1253.00]


Epoch #2196: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2197: 1025it [00:02, 369.36it/s, env_step=2249728, len=20, n/ep=3, n/st=64, player_1/loss=863.372, player_2/loss=517.151, rew=563.33]


Epoch #2197: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2198: 1025it [00:02, 371.09it/s, env_step=2250752, len=8, n/ep=7, n/st=64, player_1/loss=336.716, player_2/loss=553.924, rew=80.57]


Epoch #2198: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2199: 1025it [00:02, 369.36it/s, env_step=2251776, len=21, n/ep=3, n/st=64, player_1/loss=303.415, player_2/loss=483.799, rew=526.67]


Epoch #2199: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2200: 1025it [00:02, 371.50it/s, env_step=2252800, len=32, n/ep=2, n/st=64, player_1/loss=279.469, player_2/loss=674.798, rew=1055.00]


Epoch #2200: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2201: 1025it [00:02, 366.84it/s, env_step=2253824, len=25, n/ep=3, n/st=64, player_1/loss=307.621, player_2/loss=473.695, rew=841.33]


Epoch #2201: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2202: 1025it [00:02, 372.58it/s, env_step=2254848, len=21, n/ep=3, n/st=64, player_1/loss=100.328, player_2/loss=348.264, rew=558.67]


Epoch #2202: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2203: 1025it [00:02, 373.26it/s, env_step=2255872, len=15, n/ep=3, n/st=64, player_1/loss=154.850, player_2/loss=357.989, rew=238.00]


Epoch #2203: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2204: 1025it [00:02, 365.67it/s, env_step=2256896, len=35, n/ep=2, n/st=64, player_1/loss=72.098, player_2/loss=297.579, rew=1259.00]


Epoch #2204: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2205: 1025it [00:02, 371.36it/s, env_step=2257920, len=24, n/ep=3, n/st=64, player_1/loss=44.169, player_2/loss=373.097, rew=698.67]


Epoch #2205: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2206: 1025it [00:02, 369.36it/s, env_step=2258944, len=20, n/ep=3, n/st=64, player_1/loss=97.062, player_2/loss=779.537, rew=478.67]


Epoch #2206: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2207: 1025it [00:02, 368.82it/s, env_step=2259968, len=15, n/ep=4, n/st=64, player_1/loss=322.042, player_2/loss=511.377, rew=240.00]


Epoch #2207: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2208: 1025it [00:02, 365.67it/s, env_step=2260992, len=15, n/ep=4, n/st=64, player_1/loss=390.549, player_2/loss=333.729, rew=238.00]


Epoch #2208: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2209: 1025it [00:02, 370.02it/s, env_step=2262016, len=17, n/ep=4, n/st=64, player_1/loss=336.523, player_2/loss=266.584, rew=351.50]


Epoch #2209: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2210: 1025it [00:02, 370.02it/s, env_step=2263040, len=18, n/ep=3, n/st=64, player_1/loss=265.928, player_2/loss=331.703, rew=366.67]


Epoch #2210: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2211: 1025it [00:02, 366.06it/s, env_step=2264064, len=23, n/ep=3, n/st=64, player_1/loss=153.345, player_2/loss=329.783, rew=582.67]


Epoch #2211: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2212: 1025it [00:02, 372.04it/s, env_step=2265088, len=12, n/ep=5, n/st=64, player_1/loss=115.288, player_2/loss=141.805, rew=166.40]


Epoch #2212: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2213: 1025it [00:02, 369.36it/s, env_step=2266112, len=20, n/ep=3, n/st=64, player_1/loss=112.261, player_2/loss=127.297, rew=418.67]


Epoch #2213: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2214: 1025it [00:02, 369.49it/s, env_step=2267136, len=37, n/ep=1, n/st=64, player_1/loss=221.194, player_2/loss=75.794, rew=1404.00]


Epoch #2214: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2215: 1025it [00:02, 368.29it/s, env_step=2268160, len=29, n/ep=2, n/st=64, player_1/loss=240.993, player_2/loss=126.473, rew=970.00]


Epoch #2215: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2216: 1025it [00:02, 371.77it/s, env_step=2269184, len=15, n/ep=4, n/st=64, player_1/loss=208.935, player_2/loss=197.794, rew=267.50]


Epoch #2216: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2217: 1025it [00:02, 371.63it/s, env_step=2270208, len=18, n/ep=3, n/st=64, player_1/loss=273.302, player_2/loss=159.487, rew=348.67]


Epoch #2217: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2218: 1025it [00:02, 367.37it/s, env_step=2271232, len=25, n/ep=2, n/st=64, player_1/loss=150.913, player_2/loss=149.668, rew=680.00]


Epoch #2218: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2219: 1025it [00:02, 372.98it/s, env_step=2272256, len=23, n/ep=3, n/st=64, player_1/loss=333.393, player_2/loss=74.495, rew=550.67]


Epoch #2219: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2220: 1025it [00:02, 368.03it/s, env_step=2273280, len=31, n/ep=3, n/st=64, player_1/loss=430.193, player_2/loss=275.552, rew=1142.67]


Epoch #2220: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2221: 1025it [00:02, 367.77it/s, env_step=2274304, len=15, n/ep=4, n/st=64, player_1/loss=538.889, player_2/loss=256.845, rew=274.00]


Epoch #2221: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2222: 1025it [00:02, 371.63it/s, env_step=2275328, len=28, n/ep=2, n/st=64, player_1/loss=542.544, player_2/loss=338.660, rew=819.00]


Epoch #2222: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2223: 1025it [00:02, 368.43it/s, env_step=2276352, len=20, n/ep=2, n/st=64, player_1/loss=469.787, player_2/loss=318.499, rew=451.00]


Epoch #2223: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2224: 1025it [00:02, 371.63it/s, env_step=2277376, len=36, n/ep=2, n/st=64, player_1/loss=340.029, player_2/loss=56.154, rew=1412.00]


Epoch #2224: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2225: 1025it [00:02, 367.37it/s, env_step=2278400, len=39, n/ep=2, n/st=64, player_1/loss=426.670, player_2/loss=193.383, rew=1559.00]


Epoch #2225: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2226: 1025it [00:02, 369.09it/s, env_step=2279424, len=18, n/ep=4, n/st=64, player_1/loss=245.843, player_2/loss=206.988, rew=437.50]


Epoch #2226: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2227: 1025it [00:02, 372.31it/s, env_step=2280448, len=12, n/ep=5, n/st=64, player_1/loss=100.000, player_2/loss=302.437, rew=171.60]


Epoch #2227: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2228: 1025it [00:02, 368.03it/s, env_step=2281472, len=28, n/ep=2, n/st=64, player_1/loss=445.726, player_2/loss=280.874, rew=810.00]


Epoch #2228: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2229: 1025it [00:02, 368.96it/s, env_step=2282496, len=34, n/ep=2, n/st=64, player_1/loss=522.625, player_2/loss=181.088, rew=1235.00]


Epoch #2229: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2230: 1025it [00:02, 368.29it/s, env_step=2283520, len=25, n/ep=2, n/st=64, player_1/loss=201.267, player_2/loss=445.619, rew=649.00]


Epoch #2230: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2231: 1025it [00:02, 370.42it/s, env_step=2284544, len=19, n/ep=4, n/st=64, player_1/loss=150.532, player_2/loss=487.280, rew=455.00]


Epoch #2231: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2232: 1025it [00:02, 368.43it/s, env_step=2285568, len=32, n/ep=2, n/st=64, player_1/loss=79.832, player_2/loss=358.231, rew=1117.00]


Epoch #2232: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2233: 1025it [00:02, 370.83it/s, env_step=2286592, len=17, n/ep=4, n/st=64, player_1/loss=202.577, player_2/loss=435.020, rew=407.00]


Epoch #2233: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2234: 1025it [00:02, 371.63it/s, env_step=2287616, len=23, n/ep=2, n/st=64, player_1/loss=356.240, player_2/loss=496.754, rew=694.00]


Epoch #2234: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2235: 1025it [00:02, 368.03it/s, env_step=2288640, len=21, n/ep=3, n/st=64, player_1/loss=331.406, player_2/loss=430.190, rew=490.67]


Epoch #2235: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2236: 1025it [00:02, 369.89it/s, env_step=2289664, len=29, n/ep=3, n/st=64, player_1/loss=500.808, player_2/loss=565.055, rew=976.67]


Epoch #2236: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2237: 1025it [00:02, 368.56it/s, env_step=2290688, len=14, n/ep=4, n/st=64, player_1/loss=312.937, player_2/loss=550.635, rew=209.00]


Epoch #2237: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2238: 1025it [00:02, 372.17it/s, env_step=2291712, len=23, n/ep=3, n/st=64, player_1/loss=239.734, player_2/loss=297.220, rew=628.67]


Epoch #2238: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2239: 1025it [00:02, 369.89it/s, env_step=2292736, len=23, n/ep=2, n/st=64, player_1/loss=306.821, player_2/loss=325.189, rew=806.00]


Epoch #2239: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2240: 1025it [00:02, 368.69it/s, env_step=2293760, len=23, n/ep=3, n/st=64, player_1/loss=155.538, player_2/loss=260.376, rew=609.33]


Epoch #2240: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2241: 1025it [00:02, 372.31it/s, env_step=2294784, len=22, n/ep=3, n/st=64, player_1/loss=128.223, player_2/loss=221.903, rew=506.00]


Epoch #2241: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2242: 1025it [00:02, 369.76it/s, env_step=2295808, len=26, n/ep=3, n/st=64, player_1/loss=177.495, player_2/loss=206.924, rew=742.00]


Epoch #2242: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2243: 1025it [00:02, 364.63it/s, env_step=2296832, len=29, n/ep=2, n/st=64, player_1/loss=332.343, player_2/loss=179.304, rew=868.00]


Epoch #2243: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2244: 1025it [00:02, 371.90it/s, env_step=2297856, len=29, n/ep=2, n/st=64, player_1/loss=376.097, player_2/loss=383.328, rew=893.00]


Epoch #2244: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2245: 1025it [00:02, 368.96it/s, env_step=2298880, len=16, n/ep=3, n/st=64, player_1/loss=321.834, player_2/loss=473.298, rew=356.67]


Epoch #2245: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2246: 1025it [00:02, 369.76it/s, env_step=2299904, len=15, n/ep=4, n/st=64, player_1/loss=231.953, player_2/loss=496.891, rew=285.50]


Epoch #2246: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2247: 1025it [00:02, 365.93it/s, env_step=2300928, len=22, n/ep=2, n/st=64, player_1/loss=277.356, player_2/loss=539.736, rew=529.00]


Epoch #2247: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2248: 1025it [00:02, 369.49it/s, env_step=2301952, len=22, n/ep=3, n/st=64, player_1/loss=363.917, player_2/loss=487.768, rew=565.33]


Epoch #2248: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2249: 1025it [00:02, 370.02it/s, env_step=2302976, len=24, n/ep=2, n/st=64, player_1/loss=242.068, player_2/loss=786.660, rew=647.00]


Epoch #2249: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2250: 1025it [00:02, 355.89it/s, env_step=2304000, len=32, n/ep=2, n/st=64, player_1/loss=210.022, player_2/loss=454.257, rew=1090.00]


Epoch #2250: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2251: 1025it [00:02, 362.05it/s, env_step=2305024, len=25, n/ep=2, n/st=64, player_1/loss=305.569, player_2/loss=397.325, rew=730.00]


Epoch #2251: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2252: 1025it [00:02, 370.16it/s, env_step=2306048, len=23, n/ep=3, n/st=64, player_1/loss=309.229, player_2/loss=445.381, rew=566.00]


Epoch #2252: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2253: 1025it [00:02, 370.42it/s, env_step=2307072, len=22, n/ep=3, n/st=64, player_1/loss=239.196, player_2/loss=309.017, rew=586.00]


Epoch #2253: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2254: 1025it [00:02, 371.50it/s, env_step=2308096, len=16, n/ep=4, n/st=64, player_1/loss=218.578, player_2/loss=311.166, rew=273.00]


Epoch #2254: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2255: 1025it [00:02, 368.56it/s, env_step=2309120, len=23, n/ep=3, n/st=64, player_1/loss=353.112, player_2/loss=269.239, rew=586.00]


Epoch #2255: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2256: 1025it [00:02, 368.69it/s, env_step=2310144, len=22, n/ep=3, n/st=64, player_1/loss=196.170, player_2/loss=107.653, rew=536.00]


Epoch #2256: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2257: 1025it [00:02, 368.43it/s, env_step=2311168, len=38, n/ep=1, n/st=64, player_1/loss=65.901, player_2/loss=341.840, rew=1480.00]


Epoch #2257: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2258: 1025it [00:02, 371.63it/s, env_step=2312192, len=9, n/ep=7, n/st=64, player_1/loss=176.283, player_2/loss=421.926, rew=90.57]


Epoch #2258: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2259: 1025it [00:02, 369.22it/s, env_step=2313216, len=17, n/ep=3, n/st=64, player_1/loss=165.537, player_2/loss=214.097, rew=306.67]


Epoch #2259: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2260: 1025it [00:02, 369.89it/s, env_step=2314240, len=22, n/ep=4, n/st=64, player_1/loss=167.453, player_2/loss=341.745, rew=589.00]


Epoch #2260: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2261: 1025it [00:02, 366.85it/s, env_step=2315264, len=28, n/ep=3, n/st=64, player_1/loss=201.075, player_2/loss=780.931, rew=820.67]


Epoch #2261: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2262: 1025it [00:02, 368.96it/s, env_step=2316288, len=17, n/ep=3, n/st=64, player_1/loss=364.211, player_2/loss=521.064, rew=306.67]


Epoch #2262: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2263: 1025it [00:02, 368.03it/s, env_step=2317312, len=16, n/ep=4, n/st=64, player_1/loss=578.619, player_2/loss=202.450, rew=298.50]


Epoch #2263: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2264: 1025it [00:02, 369.89it/s, env_step=2318336, len=27, n/ep=2, n/st=64, player_1/loss=505.470, player_2/loss=181.262, rew=898.00]


Epoch #2264: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2265: 1025it [00:02, 370.42it/s, env_step=2319360, len=16, n/ep=3, n/st=64, player_1/loss=308.470, player_2/loss=133.374, rew=307.33]


Epoch #2265: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2266: 1025it [00:02, 366.71it/s, env_step=2320384, len=14, n/ep=4, n/st=64, player_1/loss=176.264, player_2/loss=140.873, rew=217.00]


Epoch #2266: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2267: 1025it [00:02, 369.49it/s, env_step=2321408, len=18, n/ep=3, n/st=64, player_1/loss=179.155, player_2/loss=54.380, rew=372.00]


Epoch #2267: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2268: 1025it [00:02, 369.89it/s, env_step=2322432, len=14, n/ep=4, n/st=64, player_1/loss=262.079, player_2/loss=290.844, rew=217.00]


Epoch #2268: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2269: 1025it [00:02, 367.77it/s, env_step=2323456, len=19, n/ep=3, n/st=64, player_1/loss=271.942, player_2/loss=542.266, rew=391.33]


Epoch #2269: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2270: 1025it [00:02, 371.50it/s, env_step=2324480, len=15, n/ep=4, n/st=64, player_1/loss=141.337, player_2/loss=518.676, rew=254.00]


Epoch #2270: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2271: 1025it [00:02, 370.16it/s, env_step=2325504, len=16, n/ep=4, n/st=64, player_1/loss=210.767, player_2/loss=390.351, rew=334.00]


Epoch #2271: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2272: 1025it [00:02, 370.82it/s, env_step=2326528, len=19, n/ep=4, n/st=64, player_1/loss=305.732, player_2/loss=186.690, rew=392.00]


Epoch #2272: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2273: 1025it [00:02, 368.03it/s, env_step=2327552, len=29, n/ep=2, n/st=64, player_1/loss=586.614, player_2/loss=309.872, rew=868.00]


Epoch #2273: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2274: 1025it [00:02, 370.29it/s, env_step=2328576, len=26, n/ep=3, n/st=64, player_1/loss=437.626, rew=702.67]


Epoch #2274: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2275: 1025it [00:02, 367.24it/s, env_step=2329600, len=17, n/ep=4, n/st=64, player_1/loss=96.999, player_2/loss=404.944, rew=422.00]


Epoch #2275: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2276: 1025it [00:02, 370.83it/s, env_step=2330624, len=17, n/ep=4, n/st=64, player_1/loss=33.849, player_2/loss=518.236, rew=316.00]


Epoch #2276: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2277: 1025it [00:02, 366.71it/s, env_step=2331648, len=25, n/ep=3, n/st=64, player_1/loss=221.588, player_2/loss=493.375, rew=660.67]


Epoch #2277: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2278: 1025it [00:02, 368.82it/s, env_step=2332672, len=10, n/ep=6, n/st=64, player_1/loss=180.575, player_2/loss=323.913, rew=133.00]


Epoch #2278: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2279: 1025it [00:02, 364.89it/s, env_step=2333696, len=15, n/ep=5, n/st=64, player_1/loss=158.280, player_2/loss=234.930, rew=259.60]


Epoch #2279: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2280: 1025it [00:02, 371.09it/s, env_step=2334720, len=23, n/ep=3, n/st=64, player_1/loss=201.571, player_2/loss=278.434, rew=716.00]


Epoch #2280: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2281: 1025it [00:02, 366.98it/s, env_step=2335744, len=29, n/ep=2, n/st=64, player_1/loss=151.652, player_2/loss=438.220, rew=904.00]


Epoch #2281: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2282: 1025it [00:02, 369.36it/s, env_step=2336768, len=24, n/ep=2, n/st=64, player_1/loss=69.565, player_2/loss=695.180, rew=623.00]


Epoch #2282: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2283: 1025it [00:02, 370.56it/s, env_step=2337792, len=23, n/ep=3, n/st=64, player_1/loss=19.617, player_2/loss=741.793, rew=602.00]


Epoch #2283: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2284: 1025it [00:02, 366.58it/s, env_step=2338816, len=20, n/ep=2, n/st=64, player_1/loss=106.186, player_2/loss=569.809, rew=595.00]


Epoch #2284: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2285: 1025it [00:02, 369.09it/s, env_step=2339840, len=30, n/ep=2, n/st=64, player_1/loss=128.561, player_2/loss=289.886, rew=959.00]


Epoch #2285: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2286: 1025it [00:02, 366.71it/s, env_step=2340864, len=14, n/ep=4, n/st=64, player_1/loss=260.735, player_2/loss=83.097, rew=212.50]


Epoch #2286: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2287: 1025it [00:02, 369.22it/s, env_step=2341888, len=22, n/ep=3, n/st=64, player_1/loss=229.269, player_2/loss=76.013, rew=520.00]


Epoch #2287: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2288: 1025it [00:02, 368.29it/s, env_step=2342912, len=18, n/ep=3, n/st=64, player_1/loss=227.522, player_2/loss=108.010, rew=422.00]


Epoch #2288: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2289: 1025it [00:02, 372.03it/s, env_step=2343936, len=24, n/ep=2, n/st=64, player_1/loss=182.452, player_2/loss=303.935, rew=805.00]


Epoch #2289: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2290: 1025it [00:02, 365.93it/s, env_step=2344960, len=32, n/ep=2, n/st=64, player_1/loss=254.155, player_2/loss=305.334, rew=1090.00]


Epoch #2290: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2291: 1025it [00:02, 371.77it/s, env_step=2345984, len=25, n/ep=3, n/st=64, player_1/loss=482.810, player_2/loss=144.799, rew=688.67]


Epoch #2291: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2292: 1025it [00:02, 365.28it/s, env_step=2347008, len=26, n/ep=2, n/st=64, player_1/loss=397.921, player_2/loss=90.009, rew=739.00]


Epoch #2292: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2293: 1025it [00:02, 367.24it/s, env_step=2348032, len=24, n/ep=3, n/st=64, player_1/loss=151.820, player_2/loss=129.440, rew=634.67]


Epoch #2293: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2294: 1025it [00:02, 370.56it/s, env_step=2349056, len=21, n/ep=3, n/st=64, player_1/loss=61.211, player_2/loss=278.412, rew=596.67]


Epoch #2294: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2295: 1025it [00:02, 365.54it/s, env_step=2350080, len=20, n/ep=3, n/st=64, player_1/loss=160.363, player_2/loss=247.538, rew=446.00]


Epoch #2295: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2296: 1025it [00:02, 370.16it/s, env_step=2351104, len=23, n/ep=2, n/st=64, player_1/loss=318.272, player_2/loss=262.273, rew=574.00]


Epoch #2296: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2297: 1025it [00:02, 369.49it/s, env_step=2352128, len=19, n/ep=3, n/st=64, player_1/loss=144.474, player_2/loss=210.984, rew=412.67]


Epoch #2297: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2298: 1025it [00:02, 371.09it/s, env_step=2353152, len=20, n/ep=3, n/st=64, player_1/loss=179.888, player_2/loss=295.907, rew=475.33]


Epoch #2298: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2299: 1025it [00:02, 371.50it/s, env_step=2354176, len=20, n/ep=3, n/st=64, player_1/loss=291.160, player_2/loss=194.300, rew=432.67]


Epoch #2299: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2300: 1025it [00:02, 371.09it/s, env_step=2355200, len=20, n/ep=3, n/st=64, player_1/loss=220.206, player_2/loss=127.000, rew=450.00]


Epoch #2300: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2301: 1025it [00:02, 373.53it/s, env_step=2356224, len=25, n/ep=2, n/st=64, player_1/loss=315.615, player_2/loss=42.896, rew=694.00]


Epoch #2301: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2302: 1025it [00:02, 371.09it/s, env_step=2357248, len=25, n/ep=3, n/st=64, player_1/loss=452.896, player_2/loss=171.891, rew=768.67]


Epoch #2302: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2303: 1025it [00:02, 372.58it/s, env_step=2358272, len=14, n/ep=5, n/st=64, player_1/loss=508.138, player_2/loss=220.979, rew=221.60]


Epoch #2303: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2304: 1025it [00:02, 367.11it/s, env_step=2359296, len=14, n/ep=5, n/st=64, player_1/loss=432.367, player_2/loss=242.074, rew=210.40]


Epoch #2304: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2305: 1025it [00:02, 370.29it/s, env_step=2360320, len=21, n/ep=3, n/st=64, player_1/loss=250.977, player_2/loss=305.361, rew=460.00]


Epoch #2305: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2306: 1025it [00:02, 368.56it/s, env_step=2361344, len=17, n/ep=4, n/st=64, player_1/loss=158.704, player_2/loss=415.638, rew=336.00]


Epoch #2306: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2307: 1025it [00:02, 370.69it/s, env_step=2362368, len=16, n/ep=4, n/st=64, player_1/loss=116.413, player_2/loss=612.151, rew=312.00]


Epoch #2307: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2308: 1025it [00:02, 368.03it/s, env_step=2363392, len=23, n/ep=2, n/st=64, player_1/loss=258.167, player_2/loss=700.991, rew=614.00]


Epoch #2308: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2309: 1025it [00:02, 370.96it/s, env_step=2364416, len=28, n/ep=2, n/st=64, player_1/loss=510.586, player_2/loss=606.494, rew=859.00]


Epoch #2309: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2310: 1025it [00:02, 368.43it/s, env_step=2365440, len=12, n/ep=5, n/st=64, player_1/loss=517.499, player_2/loss=588.756, rew=193.60]


Epoch #2310: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2311: 1025it [00:02, 367.24it/s, env_step=2366464, len=20, n/ep=3, n/st=64, player_1/loss=676.805, rew=492.67]


Epoch #2311: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2312: 1025it [00:02, 369.62it/s, env_step=2367488, len=27, n/ep=2, n/st=64, player_1/loss=358.139, player_2/loss=157.146, rew=892.00]


Epoch #2312: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2313: 1025it [00:02, 367.37it/s, env_step=2368512, len=15, n/ep=5, n/st=64, player_1/loss=369.802, player_2/loss=138.571, rew=240.80]


Epoch #2313: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2314: 1025it [00:02, 370.29it/s, env_step=2369536, len=14, n/ep=3, n/st=64, player_1/loss=560.322, player_2/loss=105.072, rew=286.00]


Epoch #2314: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2315: 1025it [00:02, 367.50it/s, env_step=2370560, len=21, n/ep=3, n/st=64, player_1/loss=383.419, player_2/loss=114.232, rew=546.67]


Epoch #2315: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2316: 1025it [00:02, 371.77it/s, env_step=2371584, len=25, n/ep=3, n/st=64, player_1/loss=404.158, player_2/loss=318.619, rew=752.67]


Epoch #2316: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2317: 1025it [00:02, 370.29it/s, env_step=2372608, len=38, n/ep=2, n/st=64, player_1/loss=373.762, player_2/loss=359.523, rew=1480.00]


Epoch #2317: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2318: 1025it [00:02, 369.36it/s, env_step=2373632, len=33, n/ep=2, n/st=64, player_1/loss=344.448, player_2/loss=321.243, rew=1121.00]


Epoch #2318: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2319: 1025it [00:02, 369.36it/s, env_step=2374656, len=27, n/ep=2, n/st=64, player_1/loss=739.878, player_2/loss=318.102, rew=898.00]


Epoch #2319: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2320: 1025it [00:02, 366.58it/s, env_step=2375680, len=8, n/ep=7, n/st=64, player_1/loss=693.645, player_2/loss=265.485, rew=76.86]


Epoch #2320: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2321: 1025it [00:02, 369.09it/s, env_step=2376704, len=25, n/ep=3, n/st=64, player_1/loss=507.363, player_2/loss=364.819, rew=650.67]


Epoch #2321: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2322: 1025it [00:02, 369.49it/s, env_step=2377728, len=21, n/ep=3, n/st=64, player_1/loss=634.080, player_2/loss=499.401, rew=478.00]


Epoch #2322: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2323: 1025it [00:02, 366.98it/s, env_step=2378752, len=33, n/ep=2, n/st=64, player_1/loss=391.235, player_2/loss=243.295, rew=1136.00]


Epoch #2323: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2324: 1025it [00:02, 369.76it/s, env_step=2379776, len=21, n/ep=3, n/st=64, player_1/loss=159.546, player_2/loss=380.856, rew=490.00]


Epoch #2324: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2325: 1025it [00:02, 369.49it/s, env_step=2380800, len=20, n/ep=4, n/st=64, player_1/loss=101.928, player_2/loss=384.003, rew=423.50]


Epoch #2325: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2326: 1025it [00:02, 366.06it/s, env_step=2381824, len=13, n/ep=3, n/st=64, player_1/loss=316.566, player_2/loss=290.030, rew=226.00]


Epoch #2326: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2327: 1025it [00:02, 374.21it/s, env_step=2382848, len=26, n/ep=2, n/st=64, player_1/loss=418.851, player_2/loss=109.556, rew=701.00]


Epoch #2327: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2328: 1025it [00:02, 370.83it/s, env_step=2383872, len=15, n/ep=3, n/st=64, player_1/loss=164.247, player_2/loss=96.062, rew=256.67]


Epoch #2328: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2329: 1025it [00:02, 365.80it/s, env_step=2384896, len=24, n/ep=3, n/st=64, player_1/loss=296.848, player_2/loss=211.547, rew=724.00]


Epoch #2329: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2330: 1025it [00:02, 371.36it/s, env_step=2385920, len=30, n/ep=2, n/st=64, player_1/loss=336.801, player_2/loss=179.844, rew=932.00]


Epoch #2330: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2331: 1025it [00:02, 367.11it/s, env_step=2386944, len=26, n/ep=3, n/st=64, player_1/loss=396.673, player_2/loss=86.083, rew=824.67]


Epoch #2331: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2332: 1025it [00:02, 367.77it/s, env_step=2387968, len=20, n/ep=3, n/st=64, player_1/loss=516.281, player_2/loss=208.749, rew=450.00]


Epoch #2332: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2333: 1025it [00:02, 366.45it/s, env_step=2388992, len=31, n/ep=2, n/st=64, player_1/loss=337.777, player_2/loss=213.016, rew=1094.00]


Epoch #2333: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2334: 1025it [00:02, 364.76it/s, env_step=2390016, len=16, n/ep=3, n/st=64, player_1/loss=213.522, player_2/loss=195.354, rew=356.67]


Epoch #2334: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2335: 1025it [00:02, 370.56it/s, env_step=2391040, len=21, n/ep=3, n/st=64, player_1/loss=161.302, player_2/loss=217.920, rew=462.67]


Epoch #2335: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2336: 1025it [00:02, 367.24it/s, env_step=2392064, len=29, n/ep=2, n/st=64, player_1/loss=141.257, player_2/loss=459.163, rew=970.00]


Epoch #2336: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2337: 1025it [00:02, 368.96it/s, env_step=2393088, len=34, n/ep=2, n/st=64, player_1/loss=148.131, player_2/loss=356.805, rew=1204.00]


Epoch #2337: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2338: 1025it [00:02, 368.16it/s, env_step=2394112, len=21, n/ep=3, n/st=64, player_1/loss=747.770, player_2/loss=651.086, rew=474.67]


Epoch #2338: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2339: 1025it [00:02, 370.29it/s, env_step=2395136, len=36, n/ep=2, n/st=64, player_1/loss=863.054, player_2/loss=854.238, rew=1369.00]


Epoch #2339: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2340: 1025it [00:02, 366.06it/s, env_step=2396160, len=26, n/ep=2, n/st=64, player_1/loss=448.281, player_2/loss=412.620, rew=982.00]


Epoch #2340: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2341: 1025it [00:02, 369.36it/s, env_step=2397184, len=18, n/ep=3, n/st=64, player_1/loss=308.297, player_2/loss=367.490, rew=469.33]


Epoch #2341: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2342: 1025it [00:02, 366.45it/s, env_step=2398208, len=8, n/ep=8, n/st=64, player_1/loss=274.177, player_2/loss=509.312, rew=72.00]


Epoch #2342: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2343: 1025it [00:02, 365.15it/s, env_step=2399232, len=9, n/ep=7, n/st=64, player_1/loss=129.560, player_2/loss=320.047, rew=92.57]


Epoch #2343: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2344: 1025it [00:02, 369.89it/s, env_step=2400256, len=18, n/ep=3, n/st=64, player_1/loss=282.646, player_2/loss=142.278, rew=394.67]


Epoch #2344: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2345: 1025it [00:02, 370.29it/s, env_step=2401280, len=23, n/ep=3, n/st=64, player_1/loss=312.557, player_2/loss=142.488, rew=599.33]


Epoch #2345: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2346: 1025it [00:02, 366.58it/s, env_step=2402304, len=28, n/ep=2, n/st=64, player_1/loss=330.413, player_2/loss=138.770, rew=891.00]


Epoch #2346: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2347: 1025it [00:02, 370.02it/s, env_step=2403328, len=26, n/ep=2, n/st=64, player_1/loss=643.161, player_2/loss=329.507, rew=736.00]


Epoch #2347: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2348: 1025it [00:02, 368.43it/s, env_step=2404352, len=22, n/ep=2, n/st=64, player_1/loss=647.363, player_2/loss=479.285, rew=553.00]


Epoch #2348: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2349: 1025it [00:02, 365.80it/s, env_step=2405376, len=15, n/ep=4, n/st=64, player_1/loss=288.209, player_2/loss=366.724, rew=290.00]


Epoch #2349: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2350: 1025it [00:02, 368.96it/s, env_step=2406400, len=16, n/ep=5, n/st=64, player_1/loss=287.606, player_2/loss=203.453, rew=309.60]


Epoch #2350: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2351: 1025it [00:02, 369.09it/s, env_step=2407424, len=34, n/ep=2, n/st=64, player_1/loss=430.459, player_2/loss=162.579, rew=1229.00]


Epoch #2351: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2352: 1025it [00:02, 367.11it/s, env_step=2408448, len=12, n/ep=5, n/st=64, player_1/loss=326.378, player_2/loss=148.559, rew=191.60]


Epoch #2352: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2353: 1025it [00:02, 370.02it/s, env_step=2409472, len=15, n/ep=4, n/st=64, player_1/loss=264.942, player_2/loss=168.999, rew=240.00]


Epoch #2353: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2354: 1025it [00:02, 366.45it/s, env_step=2410496, len=18, n/ep=4, n/st=64, player_1/loss=162.358, player_2/loss=102.017, rew=367.50]


Epoch #2354: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2355: 1025it [00:02, 370.42it/s, env_step=2411520, len=23, n/ep=3, n/st=64, player_1/loss=504.611, player_2/loss=111.046, rew=582.67]


Epoch #2355: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2356: 1025it [00:02, 365.28it/s, env_step=2412544, len=21, n/ep=3, n/st=64, player_1/loss=648.609, player_2/loss=154.950, rew=570.00]


Epoch #2356: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2357: 1025it [00:02, 367.24it/s, env_step=2413568, len=23, n/ep=2, n/st=64, player_1/loss=736.853, player_2/loss=285.818, rew=566.00]


Epoch #2357: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2358: 1025it [00:02, 367.11it/s, env_step=2414592, len=30, n/ep=2, n/st=64, player_1/loss=842.288, player_2/loss=577.513, rew=1009.00]


Epoch #2358: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2359: 1025it [00:02, 367.11it/s, env_step=2415616, len=20, n/ep=2, n/st=64, player_1/loss=948.855, player_2/loss=524.469, rew=427.00]


Epoch #2359: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2360: 1025it [00:02, 372.31it/s, env_step=2416640, len=22, n/ep=3, n/st=64, player_1/loss=722.593, player_2/loss=159.559, rew=537.33]


Epoch #2360: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2361: 1025it [00:02, 367.24it/s, env_step=2417664, len=18, n/ep=4, n/st=64, player_1/loss=218.918, player_2/loss=96.506, rew=384.00]


Epoch #2361: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2362: 1025it [00:02, 368.03it/s, env_step=2418688, len=18, n/ep=3, n/st=64, player_1/loss=261.698, player_2/loss=159.093, rew=370.00]


Epoch #2362: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2363: 1025it [00:02, 365.80it/s, env_step=2419712, len=20, n/ep=3, n/st=64, player_1/loss=310.464, player_2/loss=212.608, rew=446.00]


Epoch #2363: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2364: 1025it [00:02, 369.09it/s, env_step=2420736, len=19, n/ep=3, n/st=64, player_1/loss=311.444, player_2/loss=221.984, rew=382.67]


Epoch #2364: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2365: 1025it [00:02, 370.02it/s, env_step=2421760, len=23, n/ep=3, n/st=64, player_1/loss=424.331, player_2/loss=171.462, rew=568.67]


Epoch #2365: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2366: 1025it [00:02, 369.22it/s, env_step=2422784, len=21, n/ep=3, n/st=64, player_1/loss=532.757, player_2/loss=134.136, rew=500.67]


Epoch #2366: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2367: 1025it [00:02, 370.42it/s, env_step=2423808, len=20, n/ep=3, n/st=64, player_1/loss=253.507, player_2/loss=235.892, rew=459.33]


Epoch #2367: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2368: 1025it [00:02, 366.19it/s, env_step=2424832, len=30, n/ep=2, n/st=64, player_1/loss=308.819, player_2/loss=232.221, rew=1009.00]


Epoch #2368: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2369: 1025it [00:02, 369.36it/s, env_step=2425856, len=21, n/ep=4, n/st=64, player_1/loss=325.731, player_2/loss=141.202, rew=482.00]


Epoch #2369: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2370: 1025it [00:02, 365.54it/s, env_step=2426880, len=29, n/ep=3, n/st=64, player_1/loss=260.456, player_2/loss=318.696, rew=904.67]


Epoch #2370: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2371: 1025it [00:02, 371.23it/s, env_step=2427904, len=23, n/ep=3, n/st=64, player_1/loss=450.784, player_2/loss=355.045, rew=556.00]


Epoch #2371: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2372: 1025it [00:02, 367.37it/s, env_step=2428928, len=32, n/ep=2, n/st=64, player_1/loss=645.191, player_2/loss=289.332, rew=1070.00]


Epoch #2372: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2373: 1025it [00:02, 365.41it/s, env_step=2429952, len=21, n/ep=2, n/st=64, player_1/loss=608.310, player_2/loss=179.379, rew=604.00]


Epoch #2373: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2374: 1025it [00:02, 370.69it/s, env_step=2430976, len=40, n/ep=2, n/st=64, player_2/loss=283.539, rew=1638.00]


Epoch #2374: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2375: 1025it [00:02, 372.44it/s, env_step=2432000, len=29, n/ep=2, n/st=64, player_1/loss=265.997, player_2/loss=286.798, rew=932.00]


Epoch #2375: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2376: 1025it [00:02, 363.72it/s, env_step=2433024, len=20, n/ep=3, n/st=64, player_2/loss=573.078, rew=433.33]


Epoch #2376: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2377: 1025it [00:02, 369.22it/s, env_step=2434048, len=15, n/ep=4, n/st=64, player_1/loss=208.304, player_2/loss=493.207, rew=248.00]


Epoch #2377: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2378: 1025it [00:02, 363.85it/s, env_step=2435072, len=13, n/ep=4, n/st=64, player_2/loss=660.687, rew=184.00]


Epoch #2378: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2379: 1025it [00:02, 369.09it/s, env_step=2436096, len=30, n/ep=2, n/st=64, player_1/loss=344.407, player_2/loss=1034.331, rew=928.00]


Epoch #2379: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2380: 1025it [00:02, 366.85it/s, env_step=2437120, len=31, n/ep=2, n/st=64, player_1/loss=408.578, player_2/loss=694.668, rew=1022.00]


Epoch #2380: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2381: 1025it [00:02, 369.09it/s, env_step=2438144, len=32, n/ep=2, n/st=64, player_1/loss=161.064, player_2/loss=333.078, rew=1089.00]


Epoch #2381: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2382: 1025it [00:02, 370.96it/s, env_step=2439168, len=21, n/ep=2, n/st=64, player_1/loss=157.957, player_2/loss=150.344, rew=560.00]


Epoch #2382: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2383: 1025it [00:02, 366.98it/s, env_step=2440192, len=32, n/ep=2, n/st=64, player_1/loss=182.904, player_2/loss=283.747, rew=1117.00]


Epoch #2383: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2384: 1025it [00:02, 370.02it/s, env_step=2441216, len=9, n/ep=7, n/st=64, player_1/loss=86.809, player_2/loss=534.545, rew=106.00]


Epoch #2384: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2385: 1025it [00:02, 368.82it/s, env_step=2442240, len=19, n/ep=3, n/st=64, player_1/loss=46.428, player_2/loss=764.209, rew=394.67]


Epoch #2385: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2386: 1025it [00:02, 366.71it/s, env_step=2443264, len=20, n/ep=4, n/st=64, player_1/loss=130.064, player_2/loss=581.262, rew=421.50]


Epoch #2386: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2387: 1025it [00:02, 370.56it/s, env_step=2444288, len=24, n/ep=2, n/st=64, player_1/loss=210.266, player_2/loss=570.264, rew=623.00]


Epoch #2387: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2388: 1025it [00:02, 366.32it/s, env_step=2445312, len=15, n/ep=5, n/st=64, player_1/loss=181.180, player_2/loss=380.216, rew=262.40]


Epoch #2388: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2389: 1025it [00:02, 372.31it/s, env_step=2446336, len=25, n/ep=2, n/st=64, player_1/loss=123.055, player_2/loss=323.987, rew=676.00]


Epoch #2389: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2390: 1025it [00:02, 367.37it/s, env_step=2447360, len=20, n/ep=3, n/st=64, player_1/loss=186.109, player_2/loss=498.485, rew=446.00]


Epoch #2390: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2391: 1025it [00:02, 368.82it/s, env_step=2448384, len=16, n/ep=5, n/st=64, player_1/loss=467.925, player_2/loss=269.126, rew=310.00]


Epoch #2391: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2392: 1025it [00:02, 368.82it/s, env_step=2449408, len=34, n/ep=2, n/st=64, player_1/loss=471.750, player_2/loss=270.988, rew=1204.00]


Epoch #2392: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2393: 1025it [00:02, 364.11it/s, env_step=2450432, len=16, n/ep=4, n/st=64, player_1/loss=435.229, player_2/loss=202.553, rew=329.50]


Epoch #2393: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2394: 1025it [00:02, 372.98it/s, env_step=2451456, len=28, n/ep=2, n/st=64, player_1/loss=481.016, player_2/loss=133.226, rew=841.00]


Epoch #2394: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2395: 1025it [00:02, 369.09it/s, env_step=2452480, len=27, n/ep=3, n/st=64, player_1/loss=240.279, player_2/loss=133.504, rew=762.67]


Epoch #2395: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2396: 1025it [00:02, 369.36it/s, env_step=2453504, len=9, n/ep=7, n/st=64, player_1/loss=103.192, player_2/loss=126.633, rew=90.29]


Epoch #2396: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2397: 1025it [00:02, 371.63it/s, env_step=2454528, len=28, n/ep=2, n/st=64, player_1/loss=153.189, rew=846.00]


Epoch #2397: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2398: 1025it [00:02, 369.09it/s, env_step=2455552, len=10, n/ep=6, n/st=64, player_1/loss=335.064, player_2/loss=209.688, rew=117.33]


Epoch #2398: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2399: 1025it [00:02, 373.94it/s, env_step=2456576, len=33, n/ep=2, n/st=64, player_1/loss=640.053, player_2/loss=98.824, rew=1160.00]


Epoch #2399: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2400: 1025it [00:02, 365.41it/s, env_step=2457600, len=22, n/ep=3, n/st=64, player_1/loss=532.494, player_2/loss=130.241, rew=576.00]


Epoch #2400: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2401: 1025it [00:02, 371.36it/s, env_step=2458624, len=25, n/ep=2, n/st=64, player_1/loss=262.715, player_2/loss=136.950, rew=792.00]


Epoch #2401: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2402: 1025it [00:02, 370.42it/s, env_step=2459648, len=11, n/ep=5, n/st=64, player_1/loss=296.876, player_2/loss=119.729, rew=164.40]


Epoch #2402: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2403: 1025it [00:02, 366.45it/s, env_step=2460672, len=21, n/ep=3, n/st=64, player_1/loss=190.762, player_2/loss=254.727, rew=662.67]


Epoch #2403: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2404: 1025it [00:02, 367.90it/s, env_step=2461696, len=22, n/ep=3, n/st=64, player_1/loss=88.721, player_2/loss=365.140, rew=516.67]


Epoch #2404: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2405: 1025it [00:02, 365.93it/s, env_step=2462720, len=19, n/ep=3, n/st=64, player_1/loss=56.169, player_2/loss=293.414, rew=419.33]


Epoch #2405: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2406: 1025it [00:02, 369.76it/s, env_step=2463744, len=18, n/ep=3, n/st=64, player_1/loss=316.418, player_2/loss=342.224, rew=358.67]


Epoch #2406: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2407: 1025it [00:02, 366.85it/s, env_step=2464768, len=20, n/ep=4, n/st=64, player_1/loss=412.875, player_2/loss=160.376, rew=419.50]


Epoch #2407: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2408: 1025it [00:02, 367.63it/s, env_step=2465792, len=36, n/ep=2, n/st=64, player_2/loss=75.014, rew=1339.00]


Epoch #2408: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2409: 1025it [00:02, 372.04it/s, env_step=2466816, len=19, n/ep=4, n/st=64, player_1/loss=304.399, player_2/loss=77.772, rew=383.00]


Epoch #2409: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2410: 1025it [00:02, 362.95it/s, env_step=2467840, len=22, n/ep=3, n/st=64, player_1/loss=444.110, player_2/loss=61.084, rew=520.00]


Epoch #2410: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2411: 1025it [00:02, 367.50it/s, env_step=2468864, len=24, n/ep=3, n/st=64, player_1/loss=305.245, player_2/loss=98.124, rew=634.67]


Epoch #2411: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2412: 1025it [00:02, 368.43it/s, env_step=2469888, len=18, n/ep=4, n/st=64, player_1/loss=279.716, player_2/loss=240.890, rew=377.00]


Epoch #2412: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2413: 1025it [00:02, 370.29it/s, env_step=2470912, len=8, n/ep=8, n/st=64, player_1/loss=498.367, player_2/loss=203.668, rew=86.75]


Epoch #2413: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2414: 1025it [00:02, 372.98it/s, env_step=2471936, len=15, n/ep=4, n/st=64, player_1/loss=590.551, player_2/loss=64.230, rew=277.50]


Epoch #2414: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2415: 1025it [00:02, 369.76it/s, env_step=2472960, len=17, n/ep=4, n/st=64, player_1/loss=293.650, player_2/loss=149.418, rew=324.50]


Epoch #2415: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2416: 1025it [00:02, 369.22it/s, env_step=2473984, len=28, n/ep=3, n/st=64, player_1/loss=62.360, player_2/loss=414.441, rew=829.33]


Epoch #2416: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2417: 1025it [00:02, 370.69it/s, env_step=2475008, len=33, n/ep=2, n/st=64, player_1/loss=64.885, player_2/loss=454.060, rew=1184.00]


Epoch #2417: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2418: 1025it [00:02, 369.62it/s, env_step=2476032, len=20, n/ep=4, n/st=64, player_1/loss=62.433, player_2/loss=374.772, rew=567.00]


Epoch #2418: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2419: 1025it [00:02, 371.23it/s, env_step=2477056, len=23, n/ep=2, n/st=64, player_1/loss=82.347, player_2/loss=450.294, rew=554.00]


Epoch #2419: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2420: 1025it [00:02, 365.67it/s, env_step=2478080, len=20, n/ep=3, n/st=64, player_1/loss=264.121, player_2/loss=690.065, rew=689.33]


Epoch #2420: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2421: 1025it [00:02, 370.83it/s, env_step=2479104, len=15, n/ep=4, n/st=64, player_1/loss=454.809, player_2/loss=415.472, rew=427.50]


Epoch #2421: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2422: 1025it [00:02, 368.16it/s, env_step=2480128, len=15, n/ep=4, n/st=64, player_1/loss=463.404, player_2/loss=275.709, rew=270.50]


Epoch #2422: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2423: 1025it [00:02, 371.50it/s, env_step=2481152, len=16, n/ep=4, n/st=64, player_1/loss=493.897, player_2/loss=117.679, rew=327.00]


Epoch #2423: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2424: 1025it [00:02, 368.03it/s, env_step=2482176, len=24, n/ep=2, n/st=64, player_1/loss=571.495, player_2/loss=115.133, rew=805.00]


Epoch #2424: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2425: 1025it [00:02, 368.29it/s, env_step=2483200, len=31, n/ep=2, n/st=64, player_1/loss=482.671, player_2/loss=108.086, rew=999.00]


Epoch #2425: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2426: 1025it [00:02, 370.42it/s, env_step=2484224, len=19, n/ep=3, n/st=64, player_1/loss=213.576, player_2/loss=280.742, rew=448.67]


Epoch #2426: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2427: 1025it [00:02, 368.03it/s, env_step=2485248, len=20, n/ep=3, n/st=64, player_1/loss=380.300, player_2/loss=345.604, rew=420.00]


Epoch #2427: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2428: 1025it [00:02, 368.82it/s, env_step=2486272, len=32, n/ep=2, n/st=64, player_1/loss=330.930, player_2/loss=185.748, rew=1055.00]


Epoch #2428: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2429: 1025it [00:02, 367.90it/s, env_step=2487296, len=26, n/ep=2, n/st=64, player_1/loss=365.164, player_2/loss=351.854, rew=709.00]


Epoch #2429: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2430: 1025it [00:02, 368.43it/s, env_step=2488320, len=25, n/ep=2, n/st=64, player_1/loss=468.128, player_2/loss=297.740, rew=664.00]


Epoch #2430: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2431: 1025it [00:02, 370.29it/s, env_step=2489344, len=25, n/ep=3, n/st=64, player_1/loss=346.515, player_2/loss=333.702, rew=673.33]


Epoch #2431: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2432: 1025it [00:02, 369.09it/s, env_step=2490368, len=20, n/ep=3, n/st=64, player_1/loss=372.734, player_2/loss=204.139, rew=432.00]


Epoch #2432: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2433: 1025it [00:02, 370.42it/s, env_step=2491392, len=22, n/ep=3, n/st=64, player_1/loss=290.495, player_2/loss=145.046, rew=522.67]


Epoch #2433: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2434: 1025it [00:02, 370.69it/s, env_step=2492416, len=20, n/ep=3, n/st=64, player_1/loss=275.850, player_2/loss=222.963, rew=450.00]


Epoch #2434: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2435: 1025it [00:02, 365.02it/s, env_step=2493440, len=25, n/ep=2, n/st=64, player_1/loss=263.513, player_2/loss=611.042, rew=684.00]


Epoch #2435: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2436: 1025it [00:02, 368.96it/s, env_step=2494464, len=20, n/ep=4, n/st=64, player_1/loss=344.281, player_2/loss=493.650, rew=421.50]


Epoch #2436: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2437: 1025it [00:02, 368.16it/s, env_step=2495488, len=23, n/ep=2, n/st=64, player_1/loss=248.048, player_2/loss=444.929, rew=554.00]


Epoch #2437: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2438: 1025it [00:02, 368.16it/s, env_step=2496512, len=28, n/ep=3, n/st=64, player_1/loss=262.815, player_2/loss=745.982, rew=966.67]


Epoch #2438: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2439: 1025it [00:02, 369.09it/s, env_step=2497536, len=40, n/ep=2, n/st=64, player_1/loss=459.066, player_2/loss=591.802, rew=1657.00]


Epoch #2439: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2440: 1025it [00:02, 368.16it/s, env_step=2498560, len=23, n/ep=3, n/st=64, player_1/loss=335.489, player_2/loss=317.715, rew=623.33]


Epoch #2440: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2441: 1025it [00:02, 371.63it/s, env_step=2499584, len=26, n/ep=2, n/st=64, player_1/loss=101.783, player_2/loss=206.698, rew=701.00]


Epoch #2441: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2442: 1025it [00:02, 370.02it/s, env_step=2500608, len=36, n/ep=2, n/st=64, player_1/loss=370.345, player_2/loss=237.252, rew=1373.00]


Epoch #2442: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2443: 1025it [00:02, 371.09it/s, env_step=2501632, len=20, n/ep=4, n/st=64, player_1/loss=183.035, player_2/loss=404.001, rew=454.50]


Epoch #2443: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2444: 1025it [00:02, 365.93it/s, env_step=2502656, len=33, n/ep=2, n/st=64, player_1/loss=196.771, player_2/loss=630.472, rew=1136.00]


Epoch #2444: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2445: 1025it [00:02, 369.76it/s, env_step=2503680, len=21, n/ep=3, n/st=64, player_1/loss=275.976, player_2/loss=626.245, rew=493.33]


Epoch #2445: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2446: 1025it [00:02, 368.56it/s, env_step=2504704, len=12, n/ep=5, n/st=64, player_1/loss=374.591, player_2/loss=466.075, rew=298.00]


Epoch #2446: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2447: 1025it [00:02, 366.19it/s, env_step=2505728, len=31, n/ep=3, n/st=64, player_1/loss=243.792, player_2/loss=321.837, rew=1006.67]


Epoch #2447: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2448: 1025it [00:02, 368.43it/s, env_step=2506752, len=18, n/ep=2, n/st=64, player_1/loss=60.106, player_2/loss=254.024, rew=341.00]


Epoch #2448: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2449: 1025it [00:02, 366.32it/s, env_step=2507776, len=26, n/ep=2, n/st=64, player_1/loss=162.672, player_2/loss=90.982, rew=747.00]


Epoch #2449: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2450: 1025it [00:02, 370.56it/s, env_step=2508800, len=28, n/ep=2, n/st=64, player_1/loss=389.067, player_2/loss=256.288, rew=895.00]


Epoch #2450: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2451: 1025it [00:02, 365.41it/s, env_step=2509824, len=34, n/ep=2, n/st=64, player_1/loss=458.297, player_2/loss=206.281, rew=1235.00]


Epoch #2451: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2452: 1025it [00:02, 373.12it/s, env_step=2510848, len=38, n/ep=1, n/st=64, player_1/loss=433.453, player_2/loss=57.381, rew=1480.00]


Epoch #2452: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2453: 1025it [00:02, 369.76it/s, env_step=2511872, len=37, n/ep=2, n/st=64, player_1/loss=309.561, player_2/loss=261.875, rew=1404.00]


Epoch #2453: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2454: 1025it [00:02, 367.37it/s, env_step=2512896, len=20, n/ep=3, n/st=64, player_1/loss=678.599, player_2/loss=298.936, rew=478.67]


Epoch #2454: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2455: 1025it [00:02, 369.09it/s, env_step=2513920, len=30, n/ep=2, n/st=64, player_1/loss=1152.924, player_2/loss=326.959, rew=937.00]


Epoch #2455: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2456: 1025it [00:02, 367.11it/s, env_step=2514944, len=11, n/ep=6, n/st=64, player_1/loss=530.148, player_2/loss=465.018, rew=150.00]


Epoch #2456: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2457: 1025it [00:02, 369.62it/s, env_step=2515968, len=28, n/ep=2, n/st=64, player_1/loss=192.781, player_2/loss=356.457, rew=811.00]


Epoch #2457: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2458: 1025it [00:02, 366.32it/s, env_step=2516992, len=19, n/ep=3, n/st=64, player_1/loss=388.169, player_2/loss=392.788, rew=504.00]


Epoch #2458: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2459: 1025it [00:02, 368.69it/s, env_step=2518016, len=12, n/ep=4, n/st=64, player_1/loss=307.410, player_2/loss=590.978, rew=164.50]


Epoch #2459: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2460: 1025it [00:02, 369.36it/s, env_step=2519040, len=32, n/ep=2, n/st=64, player_1/loss=542.097, player_2/loss=707.811, rew=1087.00]


Epoch #2460: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2461: 1025it [00:02, 368.16it/s, env_step=2520064, len=25, n/ep=2, n/st=64, player_1/loss=516.418, player_2/loss=760.157, rew=730.00]


Epoch #2461: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2462: 1025it [00:02, 363.08it/s, env_step=2521088, len=21, n/ep=3, n/st=64, player_1/loss=248.617, player_2/loss=643.751, rew=662.67]


Epoch #2462: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2463: 1025it [00:02, 366.32it/s, env_step=2522112, len=33, n/ep=2, n/st=64, player_1/loss=463.982, player_2/loss=778.135, rew=1184.00]


Epoch #2463: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2464: 1025it [00:02, 368.29it/s, env_step=2523136, len=15, n/ep=5, n/st=64, player_1/loss=401.869, player_2/loss=521.087, rew=374.80]


Epoch #2464: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2465: 1025it [00:02, 366.98it/s, env_step=2524160, len=15, n/ep=4, n/st=64, player_1/loss=126.790, player_2/loss=162.323, rew=284.50]


Epoch #2465: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2466: 1025it [00:02, 369.89it/s, env_step=2525184, len=14, n/ep=4, n/st=64, player_1/loss=254.606, player_2/loss=122.992, rew=229.50]


Epoch #2466: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2467: 1025it [00:02, 363.98it/s, env_step=2526208, len=25, n/ep=3, n/st=64, player_1/loss=292.824, player_2/loss=297.661, rew=821.33]


Epoch #2467: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2468: 1025it [00:02, 369.36it/s, env_step=2527232, len=27, n/ep=2, n/st=64, player_1/loss=248.780, player_2/loss=322.457, rew=788.00]


Epoch #2468: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2469: 1025it [00:02, 366.19it/s, env_step=2528256, len=20, n/ep=3, n/st=64, player_1/loss=296.194, player_2/loss=608.072, rew=466.67]


Epoch #2469: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2470: 1025it [00:02, 365.54it/s, env_step=2529280, len=30, n/ep=2, n/st=64, player_1/loss=440.614, player_2/loss=872.613, rew=944.00]


Epoch #2470: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2471: 1025it [00:02, 368.56it/s, env_step=2530304, len=10, n/ep=3, n/st=64, player_1/loss=316.994, player_2/loss=597.208, rew=118.67]


Epoch #2471: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2472: 1025it [00:02, 357.63it/s, env_step=2531328, len=30, n/ep=3, n/st=64, player_1/loss=309.235, player_2/loss=364.148, rew=952.67]


Epoch #2472: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2473: 1025it [00:02, 362.56it/s, env_step=2532352, len=27, n/ep=3, n/st=64, player_1/loss=439.735, player_2/loss=336.915, rew=774.00]


Epoch #2473: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2474: 1025it [00:02, 367.37it/s, env_step=2533376, len=20, n/ep=3, n/st=64, player_1/loss=529.083, player_2/loss=209.959, rew=540.67]


Epoch #2474: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2475: 1025it [00:02, 371.77it/s, env_step=2534400, len=18, n/ep=4, n/st=64, player_1/loss=424.006, player_2/loss=420.443, rew=422.00]


Epoch #2475: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2476: 1025it [00:02, 367.77it/s, env_step=2535424, len=24, n/ep=3, n/st=64, player_1/loss=416.891, player_2/loss=525.030, rew=786.67]


Epoch #2476: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2477: 1025it [00:02, 365.14it/s, env_step=2536448, len=16, n/ep=4, n/st=64, player_1/loss=290.312, player_2/loss=529.154, rew=295.50]


Epoch #2477: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2478: 1025it [00:02, 370.69it/s, env_step=2537472, len=26, n/ep=3, n/st=64, player_1/loss=292.420, player_2/loss=432.919, rew=786.00]


Epoch #2478: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2479: 1025it [00:02, 366.85it/s, env_step=2538496, len=30, n/ep=1, n/st=64, player_1/loss=640.802, player_2/loss=664.981, rew=928.00]


Epoch #2479: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2480: 1025it [00:02, 370.83it/s, env_step=2539520, len=38, n/ep=1, n/st=64, player_1/loss=734.748, player_2/loss=750.630, rew=1480.00]


Epoch #2480: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2481: 1025it [00:02, 366.58it/s, env_step=2540544, len=21, n/ep=4, n/st=64, player_1/loss=570.694, player_2/loss=252.322, rew=475.50]


Epoch #2481: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2482: 1025it [00:02, 368.82it/s, env_step=2541568, len=14, n/ep=5, n/st=64, player_1/loss=253.849, player_2/loss=340.945, rew=238.40]


Epoch #2482: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2483: 1025it [00:02, 364.63it/s, env_step=2542592, len=17, n/ep=3, n/st=64, player_1/loss=222.307, player_2/loss=389.265, rew=336.00]


Epoch #2483: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2484: 1025it [00:02, 369.49it/s, env_step=2543616, len=19, n/ep=3, n/st=64, player_1/loss=156.218, player_2/loss=701.277, rew=474.00]


Epoch #2484: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2485: 1025it [00:02, 367.77it/s, env_step=2544640, len=18, n/ep=2, n/st=64, player_1/loss=235.014, player_2/loss=691.752, rew=349.00]


Epoch #2485: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2486: 1025it [00:02, 367.63it/s, env_step=2545664, len=31, n/ep=2, n/st=64, player_1/loss=280.086, player_2/loss=358.170, rew=1064.00]


Epoch #2486: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2487: 1025it [00:02, 368.43it/s, env_step=2546688, len=21, n/ep=3, n/st=64, player_1/loss=336.012, player_2/loss=258.985, rew=494.67]


Epoch #2487: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2488: 1025it [00:02, 366.58it/s, env_step=2547712, len=25, n/ep=3, n/st=64, player_1/loss=413.164, player_2/loss=408.575, rew=688.67]


Epoch #2488: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2489: 1025it [00:02, 366.71it/s, env_step=2548736, len=15, n/ep=4, n/st=64, player_1/loss=370.252, player_2/loss=322.638, rew=260.00]


Epoch #2489: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2490: 1025it [00:02, 369.36it/s, env_step=2549760, len=22, n/ep=3, n/st=64, player_1/loss=373.466, player_2/loss=350.456, rew=535.33]


Epoch #2490: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2491: 1025it [00:02, 368.43it/s, env_step=2550784, len=16, n/ep=4, n/st=64, player_1/loss=481.213, player_2/loss=326.736, rew=313.50]


Epoch #2491: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2492: 1025it [00:02, 370.42it/s, env_step=2551808, len=29, n/ep=2, n/st=64, player_1/loss=776.559, player_2/loss=230.265, rew=910.00]


Epoch #2492: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2493: 1025it [00:02, 370.16it/s, env_step=2552832, len=11, n/ep=5, n/st=64, player_1/loss=679.423, player_2/loss=124.218, rew=160.00]


Epoch #2493: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2494: 1025it [00:02, 365.67it/s, env_step=2553856, len=20, n/ep=2, n/st=64, player_1/loss=288.346, player_2/loss=461.214, rew=499.00]


Epoch #2494: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2495: 1025it [00:02, 369.22it/s, env_step=2554880, len=30, n/ep=3, n/st=64, player_1/loss=375.320, player_2/loss=548.899, rew=994.00]


Epoch #2495: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2496: 1025it [00:02, 366.85it/s, env_step=2555904, len=27, n/ep=3, n/st=64, player_1/loss=289.172, player_2/loss=441.365, rew=862.67]


Epoch #2496: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2497: 1025it [00:02, 370.56it/s, env_step=2556928, len=24, n/ep=2, n/st=64, player_1/loss=96.905, player_2/loss=700.308, rew=719.00]


Epoch #2497: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2498: 1025it [00:02, 371.90it/s, env_step=2557952, len=20, n/ep=2, n/st=64, player_1/loss=181.600, player_2/loss=757.023, rew=481.00]


Epoch #2498: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2499: 1025it [00:02, 362.95it/s, env_step=2558976, len=16, n/ep=6, n/st=64, player_1/loss=229.994, player_2/loss=410.349, rew=367.00]


Epoch #2499: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2500: 1025it [00:02, 369.09it/s, env_step=2560000, len=13, n/ep=5, n/st=64, player_1/loss=341.340, player_2/loss=200.378, rew=206.40]


Epoch #2500: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2501: 1025it [00:02, 365.41it/s, env_step=2561024, len=13, n/ep=5, n/st=64, player_1/loss=298.732, player_2/loss=232.234, rew=187.20]


Epoch #2501: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2502: 1025it [00:02, 369.62it/s, env_step=2562048, len=19, n/ep=4, n/st=64, player_1/loss=321.367, player_2/loss=80.481, rew=417.00]


Epoch #2502: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2503: 1025it [00:02, 367.90it/s, env_step=2563072, len=20, n/ep=3, n/st=64, player_1/loss=350.657, player_2/loss=34.443, rew=430.67]


Epoch #2503: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2504: 1025it [00:02, 365.02it/s, env_step=2564096, len=28, n/ep=3, n/st=64, player_1/loss=482.756, player_2/loss=543.397, rew=950.00]


Epoch #2504: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2505: 1025it [00:02, 363.98it/s, env_step=2565120, len=25, n/ep=2, n/st=64, player_1/loss=605.511, player_2/loss=597.032, rew=652.00]


Epoch #2505: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2506: 1025it [00:02, 369.22it/s, env_step=2566144, len=24, n/ep=3, n/st=64, player_1/loss=645.986, rew=602.67]


Epoch #2506: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2507: 1025it [00:02, 367.24it/s, env_step=2567168, len=35, n/ep=2, n/st=64, player_1/loss=421.322, player_2/loss=664.340, rew=1300.00]


Epoch #2507: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2508: 1025it [00:02, 367.90it/s, env_step=2568192, len=38, n/ep=2, n/st=64, player_1/loss=428.721, player_2/loss=790.762, rew=1519.00]


Epoch #2508: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2509: 1025it [00:02, 366.85it/s, env_step=2569216, len=14, n/ep=3, n/st=64, player_1/loss=460.490, player_2/loss=426.311, rew=221.33]


Epoch #2509: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2510: 1025it [00:02, 367.37it/s, env_step=2570240, len=26, n/ep=2, n/st=64, player_1/loss=132.897, player_2/loss=700.667, rew=909.00]


Epoch #2510: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2511: 1025it [00:02, 363.21it/s, env_step=2571264, len=23, n/ep=2, n/st=64, player_1/loss=237.845, player_2/loss=711.345, rew=746.00]


Epoch #2511: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2512: 1025it [00:02, 370.56it/s, env_step=2572288, len=16, n/ep=4, n/st=64, player_1/loss=541.433, player_2/loss=334.648, rew=275.00]


Epoch #2512: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2513: 1025it [00:02, 367.50it/s, env_step=2573312, len=15, n/ep=5, n/st=64, player_1/loss=387.342, player_2/loss=315.740, rew=252.40]


Epoch #2513: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2514: 1025it [00:02, 364.37it/s, env_step=2574336, len=27, n/ep=3, n/st=64, player_1/loss=161.437, player_2/loss=414.115, rew=783.33]


Epoch #2514: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2515: 1025it [00:02, 367.11it/s, env_step=2575360, len=12, n/ep=5, n/st=64, player_1/loss=144.115, player_2/loss=435.408, rew=168.00]


Epoch #2515: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2516: 1025it [00:02, 367.11it/s, env_step=2576384, len=12, n/ep=5, n/st=64, player_1/loss=78.446, player_2/loss=285.149, rew=176.40]


Epoch #2516: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2517: 1025it [00:02, 370.02it/s, env_step=2577408, len=16, n/ep=5, n/st=64, player_1/loss=145.936, player_2/loss=66.855, rew=279.60]


Epoch #2517: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2518: 1025it [00:02, 368.96it/s, env_step=2578432, len=19, n/ep=5, n/st=64, player_1/loss=231.907, player_2/loss=451.035, rew=496.80]


Epoch #2518: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2519: 1025it [00:02, 367.24it/s, env_step=2579456, len=11, n/ep=5, n/st=64, player_1/loss=341.649, player_2/loss=479.203, rew=134.80]


Epoch #2519: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2520: 1025it [00:02, 368.16it/s, env_step=2580480, len=13, n/ep=6, n/st=64, player_1/loss=427.820, player_2/loss=80.884, rew=190.67]


Epoch #2520: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2521: 1025it [00:02, 367.77it/s, env_step=2581504, len=21, n/ep=3, n/st=64, player_1/loss=611.380, player_2/loss=302.095, rew=468.00]


Epoch #2521: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2522: 1025it [00:02, 369.09it/s, env_step=2582528, len=25, n/ep=3, n/st=64, player_1/loss=859.511, player_2/loss=406.850, rew=666.67]


Epoch #2522: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2523: 1025it [00:02, 368.56it/s, env_step=2583552, len=16, n/ep=3, n/st=64, player_1/loss=564.455, player_2/loss=689.125, rew=292.67]


Epoch #2523: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2524: 1025it [00:02, 368.96it/s, env_step=2584576, len=26, n/ep=2, n/st=64, player_2/loss=482.386, rew=727.00]


Epoch #2524: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2525: 1025it [00:02, 364.63it/s, env_step=2585600, len=22, n/ep=4, n/st=64, player_1/loss=326.579, player_2/loss=776.673, rew=633.00]


Epoch #2525: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2526: 1025it [00:02, 368.43it/s, env_step=2586624, len=21, n/ep=3, n/st=64, player_1/loss=593.524, player_2/loss=580.571, rew=542.67]


Epoch #2526: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2527: 1025it [00:02, 369.62it/s, env_step=2587648, len=9, n/ep=7, n/st=64, player_1/loss=709.221, player_2/loss=81.938, rew=97.43]


Epoch #2527: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2528: 1025it [00:02, 368.03it/s, env_step=2588672, len=12, n/ep=7, n/st=64, player_1/loss=749.068, player_2/loss=375.455, rew=256.29]


Epoch #2528: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2529: 1025it [00:02, 368.56it/s, env_step=2589696, len=20, n/ep=3, n/st=64, player_1/loss=501.497, player_2/loss=599.210, rew=420.00]


Epoch #2529: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2530: 1025it [00:02, 365.54it/s, env_step=2590720, len=30, n/ep=3, n/st=64, player_1/loss=661.485, player_2/loss=435.118, rew=949.33]


Epoch #2530: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2531: 1025it [00:02, 367.37it/s, env_step=2591744, len=21, n/ep=3, n/st=64, player_1/loss=609.413, player_2/loss=466.550, rew=584.00]


Epoch #2531: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2532: 1025it [00:02, 366.32it/s, env_step=2592768, len=28, n/ep=3, n/st=64, player_1/loss=668.552, player_2/loss=397.267, rew=924.00]


Epoch #2532: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2533: 1025it [00:02, 367.77it/s, env_step=2593792, len=26, n/ep=3, n/st=64, player_1/loss=514.191, player_2/loss=97.582, rew=891.33]


Epoch #2533: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2534: 1025it [00:02, 366.71it/s, env_step=2594816, len=19, n/ep=4, n/st=64, player_1/loss=146.043, player_2/loss=216.023, rew=387.00]


Epoch #2534: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2535: 1025it [00:02, 368.43it/s, env_step=2595840, len=7, n/ep=6, n/st=64, player_1/loss=413.300, player_2/loss=549.290, rew=62.33]


Epoch #2535: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2536: 1025it [00:02, 368.29it/s, env_step=2596864, len=19, n/ep=3, n/st=64, player_1/loss=513.166, player_2/loss=915.108, rew=416.00]


Epoch #2536: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2537: 1025it [00:02, 368.16it/s, env_step=2597888, len=21, n/ep=3, n/st=64, player_1/loss=536.117, player_2/loss=580.637, rew=492.67]


Epoch #2537: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2538: 1025it [00:02, 370.16it/s, env_step=2598912, len=21, n/ep=3, n/st=64, player_1/loss=636.125, player_2/loss=484.012, rew=474.67]


Epoch #2538: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2539: 1025it [00:02, 365.80it/s, env_step=2599936, len=17, n/ep=3, n/st=64, player_1/loss=496.102, player_2/loss=931.472, rew=322.00]


Epoch #2539: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2540: 1025it [00:02, 369.22it/s, env_step=2600960, len=9, n/ep=8, n/st=64, player_1/loss=79.891, player_2/loss=733.061, rew=96.00]


Epoch #2540: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2541: 1025it [00:02, 367.63it/s, env_step=2601984, len=28, n/ep=2, n/st=64, player_2/loss=188.637, rew=841.00]


Epoch #2541: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2542: 1025it [00:02, 368.29it/s, env_step=2603008, len=19, n/ep=4, n/st=64, player_1/loss=493.381, player_2/loss=113.642, rew=428.00]


Epoch #2542: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2543: 1025it [00:02, 367.50it/s, env_step=2604032, len=17, n/ep=3, n/st=64, player_1/loss=506.124, player_2/loss=113.017, rew=352.67]


Epoch #2543: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2544: 1025it [00:02, 368.03it/s, env_step=2605056, len=29, n/ep=3, n/st=64, player_1/loss=315.613, player_2/loss=94.925, rew=874.00]


Epoch #2544: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2545: 1025it [00:02, 369.36it/s, env_step=2606080, len=23, n/ep=3, n/st=64, player_1/loss=437.636, player_2/loss=321.092, rew=672.67]


Epoch #2545: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2546: 1025it [00:02, 365.02it/s, env_step=2607104, len=19, n/ep=3, n/st=64, player_1/loss=408.871, player_2/loss=407.108, rew=384.00]


Epoch #2546: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2547: 1025it [00:02, 365.28it/s, env_step=2608128, len=7, n/ep=9, n/st=64, player_1/loss=337.931, player_2/loss=282.020, rew=55.78]


Epoch #2547: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2548: 1025it [00:02, 365.41it/s, env_step=2609152, len=31, n/ep=2, n/st=64, player_1/loss=318.406, player_2/loss=224.653, rew=1006.00]


Epoch #2548: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2549: 1025it [00:02, 369.09it/s, env_step=2610176, len=38, n/ep=2, n/st=64, player_1/loss=243.465, player_2/loss=585.797, rew=1481.00]


Epoch #2549: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2550: 1025it [00:02, 366.58it/s, env_step=2611200, len=27, n/ep=2, n/st=64, player_1/loss=581.341, player_2/loss=518.625, rew=754.00]


Epoch #2550: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2551: 1025it [00:02, 366.85it/s, env_step=2612224, len=18, n/ep=3, n/st=64, player_1/loss=597.434, player_2/loss=365.531, rew=358.00]


Epoch #2551: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2552: 1025it [00:02, 367.77it/s, env_step=2613248, len=30, n/ep=2, n/st=64, player_1/loss=212.025, player_2/loss=587.673, rew=961.00]


Epoch #2552: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2553: 1025it [00:02, 370.42it/s, env_step=2614272, len=24, n/ep=3, n/st=64, player_1/loss=327.663, player_2/loss=492.385, rew=647.33]


Epoch #2553: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2554: 1025it [00:02, 364.11it/s, env_step=2615296, len=38, n/ep=2, n/st=64, player_1/loss=333.813, player_2/loss=174.674, rew=1519.00]


Epoch #2554: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2555: 1025it [00:02, 370.02it/s, env_step=2616320, len=38, n/ep=1, n/st=64, player_1/loss=674.981, player_2/loss=186.530, rew=1480.00]


Epoch #2555: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2556: 1025it [00:02, 368.82it/s, env_step=2617344, len=28, n/ep=2, n/st=64, player_1/loss=736.766, player_2/loss=391.273, rew=811.00]


Epoch #2556: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2557: 1025it [00:02, 371.50it/s, env_step=2618368, len=26, n/ep=3, n/st=64, player_1/loss=595.073, player_2/loss=702.438, rew=708.67]


Epoch #2557: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2558: 1025it [00:02, 365.28it/s, env_step=2619392, len=10, n/ep=8, n/st=64, player_1/loss=201.326, player_2/loss=517.170, rew=124.25]


Epoch #2558: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2559: 1025it [00:02, 369.62it/s, env_step=2620416, len=14, n/ep=4, n/st=64, player_1/loss=381.798, player_2/loss=714.823, rew=289.00]


Epoch #2559: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2560: 1025it [00:02, 365.80it/s, env_step=2621440, len=23, n/ep=3, n/st=64, player_1/loss=624.809, player_2/loss=794.709, rew=673.33]


Epoch #2560: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2561: 1025it [00:02, 368.29it/s, env_step=2622464, len=34, n/ep=2, n/st=64, player_1/loss=484.391, player_2/loss=575.599, rew=1189.00]


Epoch #2561: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2562: 1025it [00:02, 365.67it/s, env_step=2623488, len=21, n/ep=2, n/st=64, player_1/loss=343.656, player_2/loss=499.466, rew=460.00]


Epoch #2562: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2563: 1025it [00:02, 371.63it/s, env_step=2624512, len=37, n/ep=2, n/st=64, player_1/loss=191.693, player_2/loss=503.089, rew=1442.00]


Epoch #2563: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2564: 1025it [00:02, 364.24it/s, env_step=2625536, len=19, n/ep=3, n/st=64, player_1/loss=60.062, player_2/loss=247.114, rew=395.33]


Epoch #2564: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2565: 1025it [00:02, 368.56it/s, env_step=2626560, len=37, n/ep=2, n/st=64, player_1/loss=1176.490, player_2/loss=135.102, rew=1448.00]


Epoch #2565: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2566: 1025it [00:02, 364.50it/s, env_step=2627584, len=38, n/ep=1, n/st=64, player_1/loss=1472.706, player_2/loss=139.634, rew=1480.00]


Epoch #2566: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2567: 1025it [00:02, 371.36it/s, env_step=2628608, len=40, n/ep=2, n/st=64, player_1/loss=478.097, player_2/loss=109.383, rew=1639.00]


Epoch #2567: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2568: 1025it [00:02, 359.39it/s, env_step=2629632, len=35, n/ep=2, n/st=64, player_1/loss=308.414, player_2/loss=70.991, rew=1262.00]


Epoch #2568: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2569: 1025it [00:02, 371.63it/s, env_step=2630656, len=26, n/ep=2, n/st=64, player_1/loss=509.903, player_2/loss=461.147, rew=700.00]


Epoch #2569: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2570: 1025it [00:02, 366.32it/s, env_step=2631680, len=29, n/ep=3, n/st=64, player_1/loss=520.030, player_2/loss=486.698, rew=982.67]


Epoch #2570: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2571: 1025it [00:02, 368.82it/s, env_step=2632704, len=37, n/ep=2, n/st=64, player_1/loss=720.951, player_2/loss=81.142, rew=1404.00]


Epoch #2571: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2572: 1025it [00:02, 364.63it/s, env_step=2633728, len=33, n/ep=2, n/st=64, player_1/loss=572.625, player_2/loss=343.392, rew=1154.00]


Epoch #2572: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2573: 1025it [00:02, 366.45it/s, env_step=2634752, len=38, n/ep=2, n/st=64, player_1/loss=366.056, player_2/loss=440.083, rew=1484.00]


Epoch #2573: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2574: 1025it [00:02, 366.45it/s, env_step=2635776, len=29, n/ep=2, n/st=64, player_1/loss=246.766, player_2/loss=268.843, rew=940.00]


Epoch #2574: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2575: 1025it [00:02, 367.63it/s, env_step=2636800, len=36, n/ep=2, n/st=64, player_1/loss=495.709, player_2/loss=392.500, rew=1334.00]


Epoch #2575: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2576: 1025it [00:02, 366.58it/s, env_step=2637824, len=37, n/ep=2, n/st=64, player_1/loss=726.650, player_2/loss=502.725, rew=1442.00]


Epoch #2576: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2577: 1025it [00:02, 371.50it/s, env_step=2638848, len=25, n/ep=2, n/st=64, player_1/loss=472.513, player_2/loss=252.187, rew=676.00]


Epoch #2577: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2578: 1025it [00:02, 366.58it/s, env_step=2639872, len=28, n/ep=3, n/st=64, player_1/loss=747.983, rew=814.67]


Epoch #2578: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2579: 1025it [00:02, 368.56it/s, env_step=2640896, len=11, n/ep=6, n/st=64, player_1/loss=954.090, player_2/loss=517.243, rew=165.33]


Epoch #2579: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2580: 1025it [00:02, 368.43it/s, env_step=2641920, len=29, n/ep=3, n/st=64, player_1/loss=605.029, player_2/loss=335.408, rew=870.00]


Epoch #2580: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2581: 1025it [00:02, 368.96it/s, env_step=2642944, len=26, n/ep=2, n/st=64, player_1/loss=561.303, player_2/loss=252.269, rew=727.00]


Epoch #2581: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2582: 1025it [00:02, 368.43it/s, env_step=2643968, len=19, n/ep=3, n/st=64, player_1/loss=301.769, player_2/loss=356.591, rew=414.00]


Epoch #2582: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2583: 1025it [00:02, 368.56it/s, env_step=2644992, len=29, n/ep=3, n/st=64, player_1/loss=146.341, player_2/loss=169.637, rew=889.33]


Epoch #2583: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2584: 1025it [00:02, 363.98it/s, env_step=2646016, len=30, n/ep=2, n/st=64, player_1/loss=528.196, player_2/loss=77.659, rew=977.00]


Epoch #2584: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2585: 1025it [00:02, 367.63it/s, env_step=2647040, len=28, n/ep=2, n/st=64, player_1/loss=637.165, player_2/loss=156.447, rew=826.00]


Epoch #2585: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2586: 1025it [00:02, 367.77it/s, env_step=2648064, len=28, n/ep=2, n/st=64, player_1/loss=776.549, player_2/loss=247.719, rew=839.00]


Epoch #2586: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2587: 1025it [00:02, 368.96it/s, env_step=2649088, len=29, n/ep=2, n/st=64, player_1/loss=591.640, player_2/loss=395.822, rew=904.00]


Epoch #2587: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2588: 1025it [00:02, 367.77it/s, env_step=2650112, len=36, n/ep=2, n/st=64, player_1/loss=591.669, player_2/loss=932.306, rew=1334.00]


Epoch #2588: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2589: 1025it [00:02, 372.04it/s, env_step=2651136, len=21, n/ep=3, n/st=64, player_1/loss=482.530, player_2/loss=685.050, rew=537.33]


Epoch #2589: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2590: 1025it [00:02, 363.34it/s, env_step=2652160, len=28, n/ep=3, n/st=64, player_1/loss=499.018, player_2/loss=395.710, rew=898.67]


Epoch #2590: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2591: 1025it [00:02, 371.77it/s, env_step=2653184, len=26, n/ep=3, n/st=64, player_1/loss=581.226, player_2/loss=636.967, rew=742.67]


Epoch #2591: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2592: 1025it [00:02, 365.54it/s, env_step=2654208, len=32, n/ep=2, n/st=64, player_1/loss=480.704, player_2/loss=653.983, rew=1107.00]


Epoch #2592: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2593: 1025it [00:02, 367.50it/s, env_step=2655232, len=27, n/ep=2, n/st=64, player_1/loss=409.896, player_2/loss=602.831, rew=782.00]


Epoch #2593: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2594: 1025it [00:02, 365.93it/s, env_step=2656256, len=33, n/ep=2, n/st=64, player_1/loss=377.363, player_2/loss=609.346, rew=1156.00]


Epoch #2594: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2595: 1025it [00:02, 371.50it/s, env_step=2657280, len=29, n/ep=2, n/st=64, player_1/loss=542.964, player_2/loss=102.955, rew=869.00]


Epoch #2595: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2596: 1025it [00:02, 368.96it/s, env_step=2658304, len=36, n/ep=2, n/st=64, player_1/loss=426.032, player_2/loss=182.828, rew=1331.00]


Epoch #2596: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2597: 1025it [00:02, 369.49it/s, env_step=2659328, len=29, n/ep=2, n/st=64, player_1/loss=167.204, player_2/loss=211.966, rew=884.00]


Epoch #2597: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2598: 1025it [00:02, 366.58it/s, env_step=2660352, len=36, n/ep=2, n/st=64, player_1/loss=142.852, player_2/loss=224.076, rew=1331.00]


Epoch #2598: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2599: 1025it [00:02, 369.76it/s, env_step=2661376, len=28, n/ep=2, n/st=64, player_1/loss=261.949, player_2/loss=203.810, rew=881.00]


Epoch #2599: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2600: 1025it [00:02, 365.93it/s, env_step=2662400, len=38, n/ep=1, n/st=64, player_1/loss=257.366, player_2/loss=326.176, rew=1480.00]


Epoch #2600: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2601: 1025it [00:02, 369.09it/s, env_step=2663424, len=25, n/ep=2, n/st=64, player_1/loss=504.048, player_2/loss=299.744, rew=674.00]


Epoch #2601: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2602: 1025it [00:02, 371.36it/s, env_step=2664448, len=34, n/ep=2, n/st=64, player_1/loss=1064.526, player_2/loss=279.361, rew=1213.00]


Epoch #2602: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2603: 1025it [00:02, 368.69it/s, env_step=2665472, len=38, n/ep=1, n/st=64, player_1/loss=683.386, player_2/loss=290.386, rew=1480.00]


Epoch #2603: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2604: 1025it [00:02, 369.62it/s, env_step=2666496, len=25, n/ep=2, n/st=64, player_1/loss=264.111, player_2/loss=620.123, rew=730.00]


Epoch #2604: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2605: 1025it [00:02, 366.71it/s, env_step=2667520, len=23, n/ep=2, n/st=64, player_1/loss=269.350, player_2/loss=577.294, rew=806.00]


Epoch #2605: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2606: 1025it [00:02, 368.03it/s, env_step=2668544, len=32, n/ep=2, n/st=64, player_1/loss=240.338, rew=1079.00]


Epoch #2606: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2607: 1025it [00:02, 366.32it/s, env_step=2669568, len=35, n/ep=1, n/st=64, player_1/loss=103.960, player_2/loss=401.817, rew=1258.00]


Epoch #2607: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2608: 1025it [00:02, 368.16it/s, env_step=2670592, len=21, n/ep=2, n/st=64, player_1/loss=140.368, player_2/loss=541.568, rew=496.00]


Epoch #2608: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2609: 1025it [00:02, 366.85it/s, env_step=2671616, len=27, n/ep=3, n/st=64, player_1/loss=305.211, player_2/loss=577.392, rew=853.33]


Epoch #2609: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2610: 1025it [00:02, 370.42it/s, env_step=2672640, len=28, n/ep=2, n/st=64, player_1/loss=267.731, player_2/loss=719.662, rew=851.00]


Epoch #2610: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2611: 1025it [00:02, 367.77it/s, env_step=2673664, len=28, n/ep=2, n/st=64, player_1/loss=216.281, player_2/loss=437.568, rew=851.00]


Epoch #2611: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2612: 1025it [00:02, 368.69it/s, env_step=2674688, len=27, n/ep=3, n/st=64, player_1/loss=206.835, player_2/loss=118.001, rew=772.67]


Epoch #2612: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2613: 1025it [00:02, 368.82it/s, env_step=2675712, len=36, n/ep=2, n/st=64, player_1/loss=230.038, player_2/loss=265.910, rew=1339.00]


Epoch #2613: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2614: 1025it [00:02, 368.82it/s, env_step=2676736, len=40, n/ep=1, n/st=64, player_1/loss=251.812, player_2/loss=348.887, rew=1638.00]


Epoch #2614: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2615: 1025it [00:02, 364.89it/s, env_step=2677760, len=31, n/ep=2, n/st=64, player_1/loss=461.180, player_2/loss=747.195, rew=1078.00]


Epoch #2615: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2616: 1025it [00:02, 370.69it/s, env_step=2678784, len=33, n/ep=2, n/st=64, player_1/loss=475.828, player_2/loss=689.379, rew=1166.00]


Epoch #2616: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2617: 1025it [00:02, 367.24it/s, env_step=2679808, len=32, n/ep=2, n/st=64, player_1/loss=305.503, player_2/loss=524.226, rew=1054.00]


Epoch #2617: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2618: 1025it [00:02, 371.50it/s, env_step=2680832, len=24, n/ep=3, n/st=64, player_1/loss=876.702, player_2/loss=413.293, rew=720.67]


Epoch #2618: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2619: 1025it [00:02, 364.50it/s, env_step=2681856, len=35, n/ep=2, n/st=64, player_1/loss=1146.817, player_2/loss=228.292, rew=1294.00]


Epoch #2619: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2620: 1025it [00:02, 367.50it/s, env_step=2682880, len=27, n/ep=3, n/st=64, player_1/loss=707.328, player_2/loss=326.577, rew=754.67]


Epoch #2620: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2621: 1025it [00:02, 363.98it/s, env_step=2683904, len=20, n/ep=3, n/st=64, player_1/loss=478.142, player_2/loss=476.382, rew=512.67]


Epoch #2621: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2622: 1025it [00:02, 368.96it/s, env_step=2684928, len=35, n/ep=2, n/st=64, player_1/loss=299.227, player_2/loss=568.179, rew=1267.00]


Epoch #2622: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2623: 1025it [00:02, 366.98it/s, env_step=2685952, len=29, n/ep=2, n/st=64, player_1/loss=650.527, player_2/loss=428.519, rew=869.00]


Epoch #2623: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2624: 1025it [00:02, 370.16it/s, env_step=2686976, len=28, n/ep=2, n/st=64, player_1/loss=803.107, player_2/loss=790.361, rew=811.00]


Epoch #2624: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2625: 1025it [00:02, 367.77it/s, env_step=2688000, len=38, n/ep=1, n/st=64, player_1/loss=428.133, player_2/loss=1057.418, rew=1480.00]


Epoch #2625: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2626: 1025it [00:02, 368.29it/s, env_step=2689024, len=42, n/ep=1, n/st=64, player_1/loss=183.919, player_2/loss=786.743, rew=1834.00]


Epoch #2626: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2627: 1025it [00:02, 365.02it/s, env_step=2690048, len=29, n/ep=2, n/st=64, player_1/loss=86.271, player_2/loss=802.643, rew=904.00]


Epoch #2627: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2628: 1025it [00:02, 369.09it/s, env_step=2691072, len=32, n/ep=2, n/st=64, player_1/loss=81.330, player_2/loss=408.741, rew=1079.00]


Epoch #2628: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2629: 1025it [00:02, 366.58it/s, env_step=2692096, len=38, n/ep=1, n/st=64, player_1/loss=211.022, player_2/loss=123.675, rew=1480.00]


Epoch #2629: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2630: 1025it [00:02, 371.77it/s, env_step=2693120, len=39, n/ep=2, n/st=64, player_1/loss=201.316, player_2/loss=130.466, rew=1562.00]


Epoch #2630: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2631: 1025it [00:02, 368.43it/s, env_step=2694144, len=33, n/ep=2, n/st=64, player_1/loss=459.304, player_2/loss=120.056, rew=1145.00]


Epoch #2631: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2632: 1025it [00:02, 362.95it/s, env_step=2695168, len=19, n/ep=3, n/st=64, player_1/loss=810.639, player_2/loss=296.459, rew=410.67]


Epoch #2632: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2633: 1025it [00:02, 369.09it/s, env_step=2696192, len=20, n/ep=3, n/st=64, player_1/loss=506.223, player_2/loss=297.692, rew=447.33]


Epoch #2633: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2634: 1025it [00:02, 366.32it/s, env_step=2697216, len=23, n/ep=4, n/st=64, player_1/loss=288.334, player_2/loss=614.031, rew=644.50]


Epoch #2634: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2635: 1025it [00:02, 368.43it/s, env_step=2698240, len=33, n/ep=2, n/st=64, player_1/loss=610.407, player_2/loss=805.352, rew=1154.00]


Epoch #2635: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2636: 1025it [00:02, 365.80it/s, env_step=2699264, len=26, n/ep=3, n/st=64, player_1/loss=528.794, player_2/loss=862.478, rew=719.33]


Epoch #2636: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2637: 1025it [00:02, 369.62it/s, env_step=2700288, len=20, n/ep=3, n/st=64, player_1/loss=366.614, player_2/loss=1256.461, rew=450.67]


Epoch #2637: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2638: 1025it [00:02, 364.50it/s, env_step=2701312, len=22, n/ep=3, n/st=64, player_1/loss=387.290, player_2/loss=1038.700, rew=602.67]


Epoch #2638: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2639: 1025it [00:02, 368.96it/s, env_step=2702336, len=23, n/ep=3, n/st=64, player_1/loss=327.088, player_2/loss=514.106, rew=552.67]


Epoch #2639: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2640: 1025it [00:02, 367.63it/s, env_step=2703360, len=30, n/ep=2, n/st=64, player_1/loss=399.113, player_2/loss=800.598, rew=929.00]


Epoch #2640: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2641: 1025it [00:02, 370.02it/s, env_step=2704384, len=35, n/ep=2, n/st=64, player_1/loss=329.641, player_2/loss=468.041, rew=1300.00]


Epoch #2641: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2642: 1025it [00:02, 367.37it/s, env_step=2705408, len=31, n/ep=2, n/st=64, player_1/loss=468.687, player_2/loss=258.508, rew=1026.00]


Epoch #2642: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2643: 1025it [00:02, 369.76it/s, env_step=2706432, len=30, n/ep=2, n/st=64, player_1/loss=417.122, player_2/loss=267.436, rew=929.00]


Epoch #2643: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2644: 1025it [00:02, 368.16it/s, env_step=2707456, len=24, n/ep=3, n/st=64, player_1/loss=55.135, player_2/loss=125.637, rew=651.33]


Epoch #2644: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2645: 1025it [00:02, 368.56it/s, env_step=2708480, len=30, n/ep=2, n/st=64, player_1/loss=41.342, player_2/loss=484.491, rew=1001.00]


Epoch #2645: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2646: 1025it [00:02, 366.06it/s, env_step=2709504, len=22, n/ep=3, n/st=64, player_1/loss=325.829, player_2/loss=451.178, rew=602.67]


Epoch #2646: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2647: 1025it [00:02, 368.96it/s, env_step=2710528, len=29, n/ep=2, n/st=64, player_1/loss=334.950, player_2/loss=310.448, rew=898.00]


Epoch #2647: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2648: 1025it [00:02, 367.90it/s, env_step=2711552, len=31, n/ep=2, n/st=64, player_1/loss=247.983, player_2/loss=355.949, rew=1006.00]


Epoch #2648: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2649: 1025it [00:02, 371.50it/s, env_step=2712576, len=28, n/ep=2, n/st=64, player_1/loss=493.271, player_2/loss=123.894, rew=869.00]


Epoch #2649: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2650: 1025it [00:02, 363.21it/s, env_step=2713600, len=30, n/ep=2, n/st=64, player_1/loss=651.852, player_2/loss=88.356, rew=937.00]


Epoch #2650: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2651: 1025it [00:02, 371.23it/s, env_step=2714624, len=23, n/ep=2, n/st=64, player_1/loss=872.850, player_2/loss=210.206, rew=551.00]


Epoch #2651: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2652: 1025it [00:02, 368.29it/s, env_step=2715648, len=36, n/ep=2, n/st=64, player_1/loss=913.453, player_2/loss=511.201, rew=1334.00]


Epoch #2652: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2653: 1025it [00:02, 369.76it/s, env_step=2716672, len=25, n/ep=2, n/st=64, player_1/loss=640.582, player_2/loss=844.846, rew=686.00]


Epoch #2653: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2654: 1025it [00:02, 367.11it/s, env_step=2717696, len=32, n/ep=2, n/st=64, player_1/loss=680.736, player_2/loss=473.314, rew=1093.00]


Epoch #2654: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2655: 1025it [00:02, 363.59it/s, env_step=2718720, len=30, n/ep=2, n/st=64, player_1/loss=772.390, player_2/loss=155.558, rew=929.00]


Epoch #2655: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2656: 1025it [00:02, 367.77it/s, env_step=2719744, len=32, n/ep=2, n/st=64, player_1/loss=987.598, player_2/loss=550.882, rew=1058.00]


Epoch #2656: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2657: 1025it [00:02, 371.23it/s, env_step=2720768, len=34, n/ep=2, n/st=64, player_1/loss=1040.563, player_2/loss=855.367, rew=1213.00]


Epoch #2657: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2658: 1025it [00:02, 368.82it/s, env_step=2721792, len=29, n/ep=2, n/st=64, player_1/loss=854.387, player_2/loss=867.755, rew=918.00]


Epoch #2658: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2659: 1025it [00:02, 367.90it/s, env_step=2722816, len=29, n/ep=3, n/st=64, player_1/loss=375.810, rew=954.67]


Epoch #2659: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2660: 1025it [00:02, 365.02it/s, env_step=2723840, len=34, n/ep=2, n/st=64, player_1/loss=124.522, player_2/loss=387.427, rew=1213.00]


Epoch #2660: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2661: 1025it [00:02, 367.77it/s, env_step=2724864, len=35, n/ep=2, n/st=64, player_1/loss=172.967, player_2/loss=263.969, rew=1283.00]


Epoch #2661: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2662: 1025it [00:02, 369.09it/s, env_step=2725888, len=25, n/ep=3, n/st=64, player_1/loss=665.716, player_2/loss=235.537, rew=704.00]


Epoch #2662: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2663: 1025it [00:02, 368.16it/s, env_step=2726912, len=20, n/ep=2, n/st=64, player_1/loss=624.708, player_2/loss=412.127, rew=549.00]


Epoch #2663: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2664: 1025it [00:02, 368.16it/s, env_step=2727936, len=21, n/ep=3, n/st=64, player_1/loss=291.861, player_2/loss=392.926, rew=460.67]


Epoch #2664: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2665: 1025it [00:02, 369.49it/s, env_step=2728960, len=9, n/ep=7, n/st=64, player_1/loss=632.873, player_2/loss=409.485, rew=127.14]


Epoch #2665: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2666: 1025it [00:02, 362.69it/s, env_step=2729984, len=10, n/ep=8, n/st=64, player_1/loss=788.211, player_2/loss=672.286, rew=159.50]


Epoch #2666: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2667: 1025it [00:02, 368.56it/s, env_step=2731008, len=15, n/ep=3, n/st=64, player_1/loss=780.666, player_2/loss=845.153, rew=335.33]


Epoch #2667: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2668: 1025it [00:02, 364.50it/s, env_step=2732032, len=21, n/ep=3, n/st=64, player_1/loss=584.349, player_2/loss=651.654, rew=481.33]


Epoch #2668: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2669: 1025it [00:02, 370.69it/s, env_step=2733056, len=25, n/ep=2, n/st=64, player_1/loss=265.864, player_2/loss=720.815, rew=746.00]


Epoch #2669: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2670: 1025it [00:02, 366.58it/s, env_step=2734080, len=21, n/ep=3, n/st=64, player_1/loss=186.129, player_2/loss=651.282, rew=478.67]


Epoch #2670: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2671: 1025it [00:02, 368.69it/s, env_step=2735104, len=36, n/ep=2, n/st=64, player_1/loss=312.855, player_2/loss=195.966, rew=1373.00]


Epoch #2671: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2672: 1025it [00:02, 367.37it/s, env_step=2736128, len=39, n/ep=1, n/st=64, player_1/loss=691.563, player_2/loss=69.609, rew=1558.00]


Epoch #2672: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2673: 1025it [00:02, 360.65it/s, env_step=2737152, len=25, n/ep=3, n/st=64, player_1/loss=493.889, player_2/loss=182.322, rew=741.33]


Epoch #2673: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2674: 1025it [00:02, 357.13it/s, env_step=2738176, len=40, n/ep=1, n/st=64, player_1/loss=235.671, player_2/loss=527.316, rew=1638.00]


Epoch #2674: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2675: 1025it [00:02, 359.39it/s, env_step=2739200, len=22, n/ep=3, n/st=64, player_1/loss=223.120, player_2/loss=816.857, rew=658.67]


Epoch #2675: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2676: 1025it [00:02, 358.76it/s, env_step=2740224, len=29, n/ep=2, n/st=64, player_1/loss=582.202, player_2/loss=891.713, rew=877.00]


Epoch #2676: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2677: 1025it [00:02, 357.88it/s, env_step=2741248, len=31, n/ep=2, n/st=64, player_1/loss=478.325, player_2/loss=607.999, rew=994.00]


Epoch #2677: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2678: 1025it [00:02, 356.64it/s, env_step=2742272, len=33, n/ep=2, n/st=64, player_1/loss=205.189, player_2/loss=531.086, rew=1120.00]


Epoch #2678: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2679: 1025it [00:02, 356.02it/s, env_step=2743296, len=34, n/ep=2, n/st=64, player_1/loss=405.590, player_2/loss=365.740, rew=1223.00]


Epoch #2679: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2680: 1025it [00:02, 358.76it/s, env_step=2744320, len=27, n/ep=2, n/st=64, player_1/loss=367.827, player_2/loss=470.312, rew=754.00]


Epoch #2680: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2681: 1025it [00:02, 357.76it/s, env_step=2745344, len=24, n/ep=3, n/st=64, player_1/loss=862.032, player_2/loss=360.701, rew=626.00]


Epoch #2681: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2682: 1025it [00:02, 362.18it/s, env_step=2746368, len=25, n/ep=2, n/st=64, player_1/loss=881.469, player_2/loss=333.841, rew=648.00]


Epoch #2682: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2683: 1025it [00:02, 361.54it/s, env_step=2747392, len=35, n/ep=2, n/st=64, player_1/loss=395.250, player_2/loss=228.085, rew=1258.00]


Epoch #2683: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2684: 1025it [00:02, 363.21it/s, env_step=2748416, len=33, n/ep=2, n/st=64, player_1/loss=643.127, player_2/loss=117.173, rew=1124.00]


Epoch #2684: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2685: 1025it [00:02, 358.51it/s, env_step=2749440, len=29, n/ep=2, n/st=64, player_1/loss=493.791, player_2/loss=142.587, rew=918.00]


Epoch #2685: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2686: 1025it [00:02, 362.18it/s, env_step=2750464, len=31, n/ep=1, n/st=64, player_1/loss=360.838, player_2/loss=105.805, rew=990.00]


Epoch #2686: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2687: 1025it [00:02, 360.91it/s, env_step=2751488, len=15, n/ep=4, n/st=64, player_1/loss=219.619, player_2/loss=343.592, rew=319.00]


Epoch #2687: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2688: 1025it [00:02, 359.01it/s, env_step=2752512, len=39, n/ep=1, n/st=64, player_1/loss=295.993, player_2/loss=660.583, rew=1558.00]


Epoch #2688: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2689: 1025it [00:02, 358.63it/s, env_step=2753536, len=30, n/ep=2, n/st=64, player_1/loss=387.860, player_2/loss=405.099, rew=1009.00]


Epoch #2689: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2690: 1025it [00:02, 364.11it/s, env_step=2754560, len=34, n/ep=2, n/st=64, player_1/loss=208.343, player_2/loss=96.354, rew=1235.00]


Epoch #2690: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2691: 1025it [00:02, 358.01it/s, env_step=2755584, len=27, n/ep=2, n/st=64, player_1/loss=221.637, player_2/loss=173.789, rew=824.00]


Epoch #2691: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2692: 1025it [00:02, 360.52it/s, env_step=2756608, len=38, n/ep=2, n/st=64, player_1/loss=216.824, player_2/loss=185.106, rew=1546.00]


Epoch #2692: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2693: 1025it [00:02, 362.82it/s, env_step=2757632, len=32, n/ep=2, n/st=64, player_1/loss=176.283, player_2/loss=340.939, rew=1058.00]


Epoch #2693: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2694: 1025it [00:02, 360.14it/s, env_step=2758656, len=27, n/ep=2, n/st=64, player_1/loss=160.168, player_2/loss=844.730, rew=758.00]


Epoch #2694: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2695: 1025it [00:02, 363.08it/s, env_step=2759680, len=25, n/ep=3, n/st=64, player_1/loss=612.184, player_2/loss=1066.858, rew=669.33]


Epoch #2695: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2696: 1025it [00:02, 357.26it/s, env_step=2760704, len=24, n/ep=3, n/st=64, player_1/loss=585.070, player_2/loss=885.864, rew=686.67]


Epoch #2696: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2697: 1025it [00:02, 362.69it/s, env_step=2761728, len=39, n/ep=2, n/st=64, player_1/loss=214.945, player_2/loss=688.495, rew=1582.00]


Epoch #2697: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2698: 1025it [00:02, 359.77it/s, env_step=2762752, len=29, n/ep=3, n/st=64, player_1/loss=286.109, player_2/loss=1225.412, rew=1021.33]


Epoch #2698: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2699: 1025it [00:02, 360.40it/s, env_step=2763776, len=18, n/ep=4, n/st=64, player_1/loss=528.719, player_2/loss=1178.952, rew=359.00]


Epoch #2699: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2700: 1025it [00:02, 360.52it/s, env_step=2764800, len=21, n/ep=3, n/st=64, player_1/loss=395.586, player_2/loss=448.960, rew=494.67]


Epoch #2700: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2701: 1025it [00:02, 363.85it/s, env_step=2765824, len=27, n/ep=2, n/st=64, player_1/loss=266.009, player_2/loss=317.398, rew=758.00]


Epoch #2701: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2702: 1025it [00:02, 357.01it/s, env_step=2766848, len=33, n/ep=2, n/st=64, player_1/loss=478.056, player_2/loss=205.746, rew=1156.00]


Epoch #2702: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2703: 1025it [00:02, 361.67it/s, env_step=2767872, len=27, n/ep=3, n/st=64, player_1/loss=524.588, player_2/loss=588.608, rew=794.00]


Epoch #2703: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2704: 1025it [00:02, 358.13it/s, env_step=2768896, len=40, n/ep=2, n/st=64, player_1/loss=607.823, player_2/loss=798.197, rew=1657.00]


Epoch #2704: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2705: 1025it [00:02, 363.72it/s, env_step=2769920, len=36, n/ep=2, n/st=64, player_1/loss=496.860, player_2/loss=624.013, rew=1331.00]


Epoch #2705: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2706: 1025it [00:02, 356.64it/s, env_step=2770944, len=35, n/ep=2, n/st=64, player_1/loss=377.385, player_2/loss=466.103, rew=1283.00]


Epoch #2706: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2707: 1025it [00:02, 362.56it/s, env_step=2771968, len=24, n/ep=2, n/st=64, player_1/loss=473.359, player_2/loss=551.011, rew=599.00]


Epoch #2707: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2708: 1025it [00:02, 358.38it/s, env_step=2772992, len=26, n/ep=3, n/st=64, player_1/loss=474.381, player_2/loss=558.945, rew=737.33]


Epoch #2708: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2709: 1025it [00:02, 363.72it/s, env_step=2774016, len=26, n/ep=2, n/st=64, player_1/loss=241.534, player_2/loss=786.762, rew=700.00]


Epoch #2709: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2710: 1025it [00:02, 356.76it/s, env_step=2775040, len=25, n/ep=3, n/st=64, player_1/loss=315.586, player_2/loss=581.336, rew=704.00]


Epoch #2710: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2711: 1025it [00:02, 362.95it/s, env_step=2776064, len=28, n/ep=2, n/st=64, player_1/loss=375.020, player_2/loss=542.228, rew=929.00]


Epoch #2711: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2712: 1025it [00:02, 361.41it/s, env_step=2777088, len=26, n/ep=2, n/st=64, player_1/loss=225.573, player_2/loss=732.105, rew=716.00]


Epoch #2712: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2713: 1025it [00:02, 362.31it/s, env_step=2778112, len=37, n/ep=2, n/st=64, player_1/loss=170.394, player_2/loss=317.931, rew=1408.00]


Epoch #2713: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2714: 1025it [00:02, 359.77it/s, env_step=2779136, len=37, n/ep=2, n/st=64, player_1/loss=219.194, player_2/loss=119.678, rew=1444.00]


Epoch #2714: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2715: 1025it [00:02, 360.02it/s, env_step=2780160, len=22, n/ep=3, n/st=64, player_1/loss=313.496, player_2/loss=817.330, rew=534.67]


Epoch #2715: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2716: 1025it [00:02, 362.69it/s, env_step=2781184, len=23, n/ep=3, n/st=64, player_1/loss=396.589, player_2/loss=810.003, rew=574.67]


Epoch #2716: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2717: 1025it [00:02, 361.29it/s, env_step=2782208, len=20, n/ep=3, n/st=64, player_1/loss=448.965, player_2/loss=291.077, rew=418.67]


Epoch #2717: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2718: 1025it [00:02, 365.28it/s, env_step=2783232, len=35, n/ep=2, n/st=64, player_1/loss=441.771, player_2/loss=822.771, rew=1306.00]


Epoch #2718: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2719: 1025it [00:02, 362.56it/s, env_step=2784256, len=21, n/ep=3, n/st=64, player_1/loss=777.588, player_2/loss=822.312, rew=548.00]


Epoch #2719: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2720: 1025it [00:02, 362.44it/s, env_step=2785280, len=29, n/ep=3, n/st=64, player_1/loss=760.346, player_2/loss=167.258, rew=919.33]


Epoch #2720: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2721: 1025it [00:02, 362.44it/s, env_step=2786304, len=31, n/ep=2, n/st=64, player_1/loss=398.193, player_2/loss=508.265, rew=1024.00]


Epoch #2721: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2722: 1025it [00:02, 367.77it/s, env_step=2787328, len=24, n/ep=3, n/st=64, player_1/loss=333.615, player_2/loss=892.094, rew=608.67]


Epoch #2722: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2723: 1025it [00:02, 364.76it/s, env_step=2788352, len=25, n/ep=3, n/st=64, player_1/loss=580.617, player_2/loss=969.821, rew=656.00]


Epoch #2723: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2724: 1025it [00:02, 365.41it/s, env_step=2789376, len=19, n/ep=3, n/st=64, player_1/loss=666.646, player_2/loss=990.593, rew=408.67]


Epoch #2724: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2725: 1025it [00:02, 362.82it/s, env_step=2790400, len=23, n/ep=3, n/st=64, player_1/loss=467.093, player_2/loss=833.772, rew=554.67]


Epoch #2725: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2726: 1025it [00:02, 368.96it/s, env_step=2791424, len=33, n/ep=2, n/st=64, player_1/loss=353.530, player_2/loss=509.232, rew=1136.00]


Epoch #2726: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2727: 1025it [00:02, 362.95it/s, env_step=2792448, len=22, n/ep=3, n/st=64, player_1/loss=309.157, player_2/loss=373.859, rew=596.00]


Epoch #2727: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2728: 1025it [00:02, 366.71it/s, env_step=2793472, len=26, n/ep=3, n/st=64, player_1/loss=238.152, player_2/loss=370.982, rew=828.67]


Epoch #2728: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2729: 1025it [00:02, 364.50it/s, env_step=2794496, len=22, n/ep=3, n/st=64, player_1/loss=267.530, player_2/loss=56.949, rew=512.67]


Epoch #2729: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2730: 1025it [00:02, 368.43it/s, env_step=2795520, len=31, n/ep=2, n/st=64, player_1/loss=236.982, player_2/loss=169.295, rew=1015.00]


Epoch #2730: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2731: 1025it [00:02, 362.69it/s, env_step=2796544, len=22, n/ep=3, n/st=64, player_1/loss=508.853, player_2/loss=420.576, rew=522.00]


Epoch #2731: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2732: 1025it [00:02, 366.45it/s, env_step=2797568, len=38, n/ep=2, n/st=64, player_1/loss=528.723, player_2/loss=430.177, rew=1489.00]


Epoch #2732: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2733: 1025it [00:02, 364.50it/s, env_step=2798592, len=33, n/ep=2, n/st=64, player_1/loss=340.214, player_2/loss=57.134, rew=1174.00]


Epoch #2733: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2734: 1025it [00:02, 367.37it/s, env_step=2799616, len=20, n/ep=3, n/st=64, player_1/loss=365.672, player_2/loss=332.253, rew=434.67]


Epoch #2734: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2735: 1025it [00:02, 366.32it/s, env_step=2800640, len=19, n/ep=3, n/st=64, player_1/loss=332.752, player_2/loss=392.126, rew=380.67]


Epoch #2735: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2736: 1025it [00:02, 368.29it/s, env_step=2801664, len=20, n/ep=4, n/st=64, player_1/loss=342.264, player_2/loss=260.434, rew=562.50]


Epoch #2736: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2737: 1025it [00:02, 365.80it/s, env_step=2802688, len=16, n/ep=4, n/st=64, player_1/loss=284.984, player_2/loss=441.336, rew=337.50]


Epoch #2737: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2738: 1025it [00:02, 366.19it/s, env_step=2803712, len=21, n/ep=3, n/st=64, player_1/loss=260.820, player_2/loss=317.281, rew=460.67]


Epoch #2738: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2739: 1025it [00:02, 365.54it/s, env_step=2804736, len=16, n/ep=4, n/st=64, player_1/loss=247.155, player_2/loss=278.256, rew=306.50]


Epoch #2739: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2740: 1025it [00:02, 367.24it/s, env_step=2805760, len=15, n/ep=4, n/st=64, player_1/loss=748.504, player_2/loss=289.021, rew=240.00]


Epoch #2740: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2741: 1025it [00:02, 366.19it/s, env_step=2806784, len=22, n/ep=3, n/st=64, player_1/loss=787.214, player_2/loss=626.509, rew=527.33]


Epoch #2741: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2742: 1025it [00:02, 369.09it/s, env_step=2807808, len=24, n/ep=2, n/st=64, player_1/loss=331.263, player_2/loss=599.363, rew=599.00]


Epoch #2742: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2743: 1025it [00:02, 364.11it/s, env_step=2808832, len=34, n/ep=2, n/st=64, player_1/loss=496.021, player_2/loss=393.866, rew=1213.00]


Epoch #2743: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2744: 1025it [00:02, 367.63it/s, env_step=2809856, len=38, n/ep=2, n/st=64, player_1/loss=554.860, player_2/loss=632.381, rew=1481.00]


Epoch #2744: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2745: 1025it [00:02, 362.69it/s, env_step=2810880, len=34, n/ep=2, n/st=64, player_1/loss=284.815, player_2/loss=513.115, rew=1192.00]


Epoch #2745: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2746: 1025it [00:02, 367.24it/s, env_step=2811904, len=27, n/ep=2, n/st=64, player_1/loss=111.858, player_2/loss=267.343, rew=824.00]


Epoch #2746: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2747: 1025it [00:02, 365.54it/s, env_step=2812928, len=38, n/ep=2, n/st=64, player_1/loss=110.677, player_2/loss=68.203, rew=1480.00]


Epoch #2747: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2748: 1025it [00:02, 368.16it/s, env_step=2813952, len=21, n/ep=2, n/st=64, player_1/loss=239.398, player_2/loss=114.440, rew=554.00]


Epoch #2748: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2749: 1025it [00:02, 364.50it/s, env_step=2814976, len=37, n/ep=2, n/st=64, player_1/loss=505.866, player_2/loss=117.760, rew=1408.00]


Epoch #2749: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2750: 1025it [00:02, 368.29it/s, env_step=2816000, len=24, n/ep=2, n/st=64, player_1/loss=388.924, player_2/loss=378.696, rew=599.00]


Epoch #2750: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2751: 1025it [00:02, 364.37it/s, env_step=2817024, len=19, n/ep=3, n/st=64, player_1/loss=79.865, player_2/loss=614.987, rew=458.00]


Epoch #2751: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2752: 1025it [00:02, 368.82it/s, env_step=2818048, len=35, n/ep=2, n/st=64, player_1/loss=111.729, player_2/loss=625.699, rew=1300.00]


Epoch #2752: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2753: 1025it [00:02, 364.89it/s, env_step=2819072, len=29, n/ep=2, n/st=64, player_1/loss=133.479, player_2/loss=681.190, rew=869.00]


Epoch #2753: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2754: 1025it [00:02, 365.67it/s, env_step=2820096, len=32, n/ep=2, n/st=64, player_1/loss=145.733, player_2/loss=436.772, rew=1079.00]


Epoch #2754: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2755: 1025it [00:02, 362.82it/s, env_step=2821120, len=14, n/ep=4, n/st=64, player_1/loss=197.823, player_2/loss=276.485, rew=230.50]


Epoch #2755: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2756: 1025it [00:02, 362.44it/s, env_step=2822144, len=14, n/ep=5, n/st=64, player_1/loss=36.510, player_2/loss=632.430, rew=236.80]


Epoch #2756: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2757: 1025it [00:02, 365.54it/s, env_step=2823168, len=30, n/ep=3, n/st=64, player_1/loss=33.659, player_2/loss=830.344, rew=936.00]


Epoch #2757: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2758: 1025it [00:02, 363.46it/s, env_step=2824192, len=23, n/ep=3, n/st=64, player_1/loss=464.963, player_2/loss=716.221, rew=552.67]


Epoch #2758: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2759: 1025it [00:02, 366.32it/s, env_step=2825216, len=17, n/ep=3, n/st=64, player_1/loss=648.002, player_2/loss=656.848, rew=318.67]


Epoch #2759: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2760: 1025it [00:02, 364.11it/s, env_step=2826240, len=15, n/ep=4, n/st=64, player_1/loss=382.011, player_2/loss=382.220, rew=240.00]


Epoch #2760: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2761: 1025it [00:02, 369.36it/s, env_step=2827264, len=16, n/ep=4, n/st=64, player_1/loss=324.067, player_2/loss=63.825, rew=296.50]


Epoch #2761: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2762: 1025it [00:02, 363.46it/s, env_step=2828288, len=26, n/ep=3, n/st=64, player_1/loss=258.968, player_2/loss=289.638, rew=732.00]


Epoch #2762: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2763: 1025it [00:02, 363.46it/s, env_step=2829312, len=25, n/ep=3, n/st=64, player_1/loss=177.485, player_2/loss=439.944, rew=688.67]


Epoch #2763: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2764: 1025it [00:02, 367.24it/s, env_step=2830336, len=32, n/ep=2, n/st=64, player_1/loss=234.990, player_2/loss=302.339, rew=1087.00]


Epoch #2764: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2765: 1025it [00:02, 365.28it/s, env_step=2831360, len=29, n/ep=2, n/st=64, player_1/loss=323.645, player_2/loss=287.841, rew=949.00]


Epoch #2765: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2766: 1025it [00:02, 365.02it/s, env_step=2832384, len=27, n/ep=2, n/st=64, player_1/loss=195.139, player_2/loss=242.228, rew=788.00]


Epoch #2766: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2767: 1025it [00:02, 363.98it/s, env_step=2833408, len=28, n/ep=3, n/st=64, player_1/loss=319.107, player_2/loss=269.737, rew=936.67]


Epoch #2767: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2768: 1025it [00:02, 366.71it/s, env_step=2834432, len=28, n/ep=2, n/st=64, player_1/loss=383.663, player_2/loss=992.195, rew=826.00]


Epoch #2768: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2769: 1025it [00:02, 366.19it/s, env_step=2835456, len=27, n/ep=2, n/st=64, player_1/loss=453.275, player_2/loss=1137.173, rew=854.00]


Epoch #2769: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2770: 1025it [00:02, 368.29it/s, env_step=2836480, len=29, n/ep=2, n/st=64, player_1/loss=725.705, player_2/loss=425.682, rew=893.00]


Epoch #2770: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2771: 1025it [00:02, 364.89it/s, env_step=2837504, len=26, n/ep=2, n/st=64, player_1/loss=817.148, player_2/loss=695.617, rew=701.00]


Epoch #2771: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2772: 1025it [00:02, 365.67it/s, env_step=2838528, len=25, n/ep=3, n/st=64, player_1/loss=605.763, player_2/loss=299.616, rew=760.00]


Epoch #2772: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2773: 1025it [00:02, 365.80it/s, env_step=2839552, len=27, n/ep=3, n/st=64, player_1/loss=504.907, player_2/loss=104.405, rew=796.00]


Epoch #2773: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2774: 1025it [00:02, 365.54it/s, env_step=2840576, len=14, n/ep=3, n/st=64, player_1/loss=321.507, player_2/loss=202.736, rew=218.00]


Epoch #2774: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2775: 1025it [00:02, 365.67it/s, env_step=2841600, len=26, n/ep=2, n/st=64, player_1/loss=157.284, player_2/loss=248.210, rew=733.00]


Epoch #2775: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2776: 1025it [00:02, 364.89it/s, env_step=2842624, len=19, n/ep=3, n/st=64, player_1/loss=253.214, player_2/loss=394.505, rew=456.00]


Epoch #2776: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2777: 1025it [00:02, 362.69it/s, env_step=2843648, len=26, n/ep=2, n/st=64, player_1/loss=255.263, player_2/loss=563.743, rew=727.00]


Epoch #2777: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2778: 1025it [00:02, 363.59it/s, env_step=2844672, len=32, n/ep=2, n/st=64, player_1/loss=303.617, player_2/loss=574.057, rew=1103.00]


Epoch #2778: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2779: 1025it [00:02, 361.67it/s, env_step=2845696, len=36, n/ep=1, n/st=64, player_1/loss=254.048, player_2/loss=263.447, rew=1330.00]


Epoch #2779: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2780: 1025it [00:02, 367.63it/s, env_step=2846720, len=35, n/ep=2, n/st=64, player_1/loss=280.131, player_2/loss=533.232, rew=1262.00]


Epoch #2780: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2781: 1025it [00:02, 365.54it/s, env_step=2847744, len=38, n/ep=1, n/st=64, player_1/loss=474.487, player_2/loss=755.670, rew=1480.00]


Epoch #2781: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2782: 1025it [00:02, 365.54it/s, env_step=2848768, len=31, n/ep=2, n/st=64, player_1/loss=415.019, player_2/loss=740.531, rew=1147.00]


Epoch #2782: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2783: 1025it [00:02, 367.77it/s, env_step=2849792, len=10, n/ep=7, n/st=64, player_1/loss=352.296, player_2/loss=409.146, rew=149.71]


Epoch #2783: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2784: 1025it [00:02, 365.54it/s, env_step=2850816, len=27, n/ep=2, n/st=64, player_1/loss=504.659, player_2/loss=337.292, rew=938.00]


Epoch #2784: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2785: 1025it [00:02, 366.45it/s, env_step=2851840, len=27, n/ep=2, n/st=64, player_1/loss=285.300, player_2/loss=430.200, rew=794.00]


Epoch #2785: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2786: 1025it [00:02, 368.56it/s, env_step=2852864, len=42, n/ep=1, n/st=64, player_1/loss=232.734, player_2/loss=457.319, rew=1834.00]


Epoch #2786: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2787: 1025it [00:02, 360.78it/s, env_step=2853888, len=32, n/ep=2, n/st=64, player_1/loss=367.853, player_2/loss=359.882, rew=1079.00]


Epoch #2787: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2788: 1025it [00:02, 354.30it/s, env_step=2854912, len=37, n/ep=1, n/st=64, player_1/loss=621.461, player_2/loss=402.326, rew=1404.00]


Epoch #2788: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2789: 1025it [00:02, 361.03it/s, env_step=2855936, len=29, n/ep=2, n/st=64, player_1/loss=482.159, player_2/loss=611.686, rew=877.00]


Epoch #2789: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2790: 1025it [00:02, 370.02it/s, env_step=2856960, len=27, n/ep=2, n/st=64, player_1/loss=844.734, player_2/loss=584.446, rew=784.00]


Epoch #2790: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2791: 1025it [00:02, 363.33it/s, env_step=2857984, len=33, n/ep=2, n/st=64, player_1/loss=536.534, player_2/loss=545.301, rew=1145.00]


Epoch #2791: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2792: 1025it [00:02, 367.77it/s, env_step=2859008, len=26, n/ep=3, n/st=64, player_1/loss=190.514, player_2/loss=608.343, rew=806.00]


Epoch #2792: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2793: 1025it [00:02, 365.80it/s, env_step=2860032, len=26, n/ep=3, n/st=64, player_1/loss=311.220, player_2/loss=486.500, rew=719.33]


Epoch #2793: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2794: 1025it [00:02, 366.98it/s, env_step=2861056, len=31, n/ep=2, n/st=64, player_1/loss=410.896, player_2/loss=675.423, rew=1054.00]


Epoch #2794: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2795: 1025it [00:02, 366.45it/s, env_step=2862080, len=23, n/ep=3, n/st=64, player_1/loss=491.687, player_2/loss=503.212, rew=666.67]


Epoch #2795: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2796: 1025it [00:02, 367.77it/s, env_step=2863104, len=36, n/ep=2, n/st=64, player_1/loss=399.110, player_2/loss=119.233, rew=1381.00]


Epoch #2796: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2797: 1025it [00:02, 364.89it/s, env_step=2864128, len=36, n/ep=1, n/st=64, player_1/loss=167.017, player_2/loss=230.336, rew=1330.00]


Epoch #2797: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2798: 1025it [00:02, 361.54it/s, env_step=2865152, len=30, n/ep=2, n/st=64, player_1/loss=541.768, player_2/loss=213.342, rew=929.00]


Epoch #2798: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2799: 1025it [00:02, 367.11it/s, env_step=2866176, len=33, n/ep=2, n/st=64, player_1/loss=640.632, player_2/loss=165.623, rew=1120.00]


Epoch #2799: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2800: 1025it [00:02, 362.69it/s, env_step=2867200, len=30, n/ep=2, n/st=64, player_1/loss=609.475, player_2/loss=278.148, rew=961.00]


Epoch #2800: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2801: 1025it [00:02, 366.19it/s, env_step=2868224, len=22, n/ep=3, n/st=64, player_1/loss=705.077, rew=624.00]


Epoch #2801: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2802: 1025it [00:02, 367.24it/s, env_step=2869248, len=23, n/ep=3, n/st=64, player_1/loss=187.296, player_2/loss=591.841, rew=646.00]


Epoch #2802: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2803: 1025it [00:02, 367.63it/s, env_step=2870272, len=32, n/ep=2, n/st=64, player_1/loss=246.298, player_2/loss=1335.868, rew=1129.00]


Epoch #2803: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2804: 1025it [00:02, 366.06it/s, env_step=2871296, len=30, n/ep=2, n/st=64, player_1/loss=329.990, player_2/loss=1308.618, rew=1001.00]


Epoch #2804: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2805: 1025it [00:02, 368.43it/s, env_step=2872320, len=16, n/ep=4, n/st=64, player_1/loss=649.307, player_2/loss=971.848, rew=271.00]


Epoch #2805: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2806: 1025it [00:02, 364.76it/s, env_step=2873344, len=38, n/ep=1, n/st=64, player_1/loss=788.012, player_2/loss=1338.613, rew=1480.00]


Epoch #2806: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2807: 1025it [00:02, 367.77it/s, env_step=2874368, len=16, n/ep=4, n/st=64, player_1/loss=573.515, player_2/loss=1341.587, rew=427.00]


Epoch #2807: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2808: 1025it [00:02, 363.59it/s, env_step=2875392, len=9, n/ep=6, n/st=64, player_1/loss=411.058, player_2/loss=1360.703, rew=95.33]


Epoch #2808: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2809: 1025it [00:02, 367.24it/s, env_step=2876416, len=15, n/ep=4, n/st=64, player_1/loss=194.520, player_2/loss=737.121, rew=240.00]


Epoch #2809: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2810: 1025it [00:02, 363.21it/s, env_step=2877440, len=15, n/ep=4, n/st=64, player_1/loss=371.609, player_2/loss=471.117, rew=265.00]


Epoch #2810: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2811: 1025it [00:02, 366.06it/s, env_step=2878464, len=33, n/ep=2, n/st=64, player_1/loss=653.376, player_2/loss=311.732, rew=1156.00]


Epoch #2811: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2812: 1025it [00:02, 363.59it/s, env_step=2879488, len=20, n/ep=3, n/st=64, player_1/loss=756.324, player_2/loss=413.938, rew=522.00]


Epoch #2812: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2813: 1025it [00:02, 367.37it/s, env_step=2880512, len=30, n/ep=2, n/st=64, player_1/loss=469.387, player_2/loss=554.807, rew=1015.00]


Epoch #2813: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2814: 1025it [00:02, 366.45it/s, env_step=2881536, len=30, n/ep=2, n/st=64, player_1/loss=455.073, player_2/loss=822.640, rew=965.00]


Epoch #2814: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2815: 1025it [00:02, 367.90it/s, env_step=2882560, len=34, n/ep=2, n/st=64, player_1/loss=263.693, player_2/loss=1237.898, rew=1188.00]


Epoch #2815: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2816: 1025it [00:02, 365.93it/s, env_step=2883584, len=16, n/ep=4, n/st=64, player_1/loss=228.870, rew=291.00]


Epoch #2816: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2817: 1025it [00:02, 368.03it/s, env_step=2884608, len=33, n/ep=2, n/st=64, player_1/loss=212.217, player_2/loss=317.643, rew=1121.00]


Epoch #2817: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2818: 1025it [00:02, 365.41it/s, env_step=2885632, len=37, n/ep=1, n/st=64, player_1/loss=263.010, player_2/loss=501.802, rew=1404.00]


Epoch #2818: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2819: 1025it [00:02, 368.96it/s, env_step=2886656, len=15, n/ep=4, n/st=64, player_1/loss=328.465, player_2/loss=536.481, rew=240.00]


Epoch #2819: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2820: 1025it [00:02, 365.80it/s, env_step=2887680, len=35, n/ep=2, n/st=64, player_1/loss=233.993, player_2/loss=712.379, rew=1300.00]


Epoch #2820: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2821: 1025it [00:02, 366.32it/s, env_step=2888704, len=22, n/ep=3, n/st=64, player_1/loss=271.930, player_2/loss=787.383, rew=554.67]


Epoch #2821: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2822: 1025it [00:02, 365.67it/s, env_step=2889728, len=39, n/ep=1, n/st=64, player_1/loss=209.432, player_2/loss=755.828, rew=1558.00]


Epoch #2822: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2823: 1025it [00:02, 366.85it/s, env_step=2890752, len=15, n/ep=4, n/st=64, player_2/loss=896.594, rew=256.00]


Epoch #2823: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2824: 1025it [00:02, 364.76it/s, env_step=2891776, len=14, n/ep=4, n/st=64, player_1/loss=418.260, player_2/loss=533.040, rew=223.00]


Epoch #2824: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2825: 1025it [00:02, 367.37it/s, env_step=2892800, len=23, n/ep=3, n/st=64, player_1/loss=595.699, player_2/loss=364.382, rew=586.00]


Epoch #2825: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2826: 1025it [00:02, 362.31it/s, env_step=2893824, len=34, n/ep=2, n/st=64, player_1/loss=692.790, player_2/loss=679.218, rew=1204.00]


Epoch #2826: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2827: 1025it [00:02, 366.71it/s, env_step=2894848, len=36, n/ep=2, n/st=64, player_1/loss=393.329, player_2/loss=808.620, rew=1387.00]


Epoch #2827: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2828: 1025it [00:02, 366.19it/s, env_step=2895872, len=21, n/ep=3, n/st=64, player_1/loss=137.594, player_2/loss=354.241, rew=622.67]


Epoch #2828: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2829: 1025it [00:02, 369.49it/s, env_step=2896896, len=38, n/ep=2, n/st=64, player_1/loss=435.757, player_2/loss=165.008, rew=1519.00]


Epoch #2829: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2830: 1025it [00:02, 361.29it/s, env_step=2897920, len=34, n/ep=2, n/st=64, player_1/loss=226.875, player_2/loss=458.550, rew=1229.00]


Epoch #2830: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2831: 1025it [00:02, 368.82it/s, env_step=2898944, len=18, n/ep=2, n/st=64, player_1/loss=89.111, player_2/loss=476.023, rew=371.00]


Epoch #2831: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2832: 1025it [00:02, 364.63it/s, env_step=2899968, len=17, n/ep=3, n/st=64, player_1/loss=153.996, player_2/loss=98.510, rew=326.67]


Epoch #2832: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2833: 1025it [00:02, 367.76it/s, env_step=2900992, len=20, n/ep=3, n/st=64, player_1/loss=360.361, player_2/loss=584.218, rew=432.67]


Epoch #2833: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2834: 1025it [00:02, 363.21it/s, env_step=2902016, len=26, n/ep=3, n/st=64, player_1/loss=330.609, player_2/loss=707.175, rew=738.67]


Epoch #2834: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2835: 1025it [00:02, 366.45it/s, env_step=2903040, len=32, n/ep=2, n/st=64, player_1/loss=121.742, player_2/loss=389.471, rew=1054.00]


Epoch #2835: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2836: 1025it [00:02, 364.76it/s, env_step=2904064, len=39, n/ep=1, n/st=64, player_1/loss=248.715, player_2/loss=180.145, rew=1558.00]


Epoch #2836: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2837: 1025it [00:02, 364.89it/s, env_step=2905088, len=35, n/ep=1, n/st=64, player_1/loss=302.243, player_2/loss=218.381, rew=1258.00]


Epoch #2837: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2838: 1025it [00:02, 364.89it/s, env_step=2906112, len=22, n/ep=3, n/st=64, player_1/loss=205.586, player_2/loss=569.803, rew=506.00]


Epoch #2838: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2839: 1025it [00:02, 365.80it/s, env_step=2907136, len=32, n/ep=2, n/st=64, player_1/loss=262.322, player_2/loss=626.056, rew=1099.00]


Epoch #2839: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2840: 1025it [00:02, 364.11it/s, env_step=2908160, len=15, n/ep=4, n/st=64, player_1/loss=298.416, player_2/loss=538.112, rew=256.00]


Epoch #2840: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2841: 1025it [00:02, 365.15it/s, env_step=2909184, len=20, n/ep=4, n/st=64, player_1/loss=230.254, player_2/loss=308.760, rew=439.00]


Epoch #2841: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2842: 1025it [00:02, 364.89it/s, env_step=2910208, len=28, n/ep=2, n/st=64, player_1/loss=237.576, player_2/loss=307.609, rew=814.00]


Epoch #2842: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2843: 1025it [00:02, 369.36it/s, env_step=2911232, len=22, n/ep=3, n/st=64, player_1/loss=205.294, player_2/loss=271.736, rew=506.00]


Epoch #2843: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2844: 1025it [00:02, 362.56it/s, env_step=2912256, len=28, n/ep=2, n/st=64, player_1/loss=339.369, rew=929.00]


Epoch #2844: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2845: 1025it [00:02, 367.24it/s, env_step=2913280, len=18, n/ep=4, n/st=64, player_1/loss=317.003, player_2/loss=225.484, rew=385.50]


Epoch #2845: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2846: 1025it [00:02, 367.50it/s, env_step=2914304, len=12, n/ep=6, n/st=64, player_1/loss=214.246, player_2/loss=583.344, rew=173.33]


Epoch #2846: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2847: 1025it [00:02, 362.31it/s, env_step=2915328, len=15, n/ep=4, n/st=64, player_2/loss=554.752, rew=263.50]


Epoch #2847: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2848: 1025it [00:02, 366.45it/s, env_step=2916352, len=24, n/ep=2, n/st=64, player_1/loss=304.903, player_2/loss=493.208, rew=713.00]


Epoch #2848: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2849: 1025it [00:02, 364.89it/s, env_step=2917376, len=23, n/ep=3, n/st=64, player_1/loss=243.237, player_2/loss=617.818, rew=596.67]


Epoch #2849: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2850: 1025it [00:02, 363.46it/s, env_step=2918400, len=32, n/ep=2, n/st=64, player_1/loss=123.329, player_2/loss=205.846, rew=1103.00]


Epoch #2850: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2851: 1025it [00:02, 367.50it/s, env_step=2919424, len=25, n/ep=3, n/st=64, player_1/loss=96.755, player_2/loss=76.896, rew=668.00]


Epoch #2851: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2852: 1025it [00:02, 363.72it/s, env_step=2920448, len=23, n/ep=2, n/st=64, player_1/loss=153.897, player_2/loss=175.557, rew=646.00]


Epoch #2852: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2853: 1025it [00:02, 366.98it/s, env_step=2921472, len=37, n/ep=2, n/st=64, player_1/loss=174.344, player_2/loss=198.631, rew=1408.00]


Epoch #2853: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2854: 1025it [00:02, 363.72it/s, env_step=2922496, len=38, n/ep=2, n/st=64, player_1/loss=110.047, player_2/loss=451.227, rew=1481.00]


Epoch #2854: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2855: 1025it [00:02, 366.98it/s, env_step=2923520, len=39, n/ep=1, n/st=64, player_1/loss=318.522, player_2/loss=851.353, rew=1558.00]


Epoch #2855: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2856: 1025it [00:02, 364.37it/s, env_step=2924544, len=25, n/ep=3, n/st=64, player_1/loss=523.153, player_2/loss=701.541, rew=652.67]


Epoch #2856: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2857: 1025it [00:02, 366.98it/s, env_step=2925568, len=26, n/ep=3, n/st=64, player_1/loss=858.793, player_2/loss=314.359, rew=910.67]


Epoch #2857: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2858: 1025it [00:02, 362.82it/s, env_step=2926592, len=15, n/ep=5, n/st=64, player_1/loss=710.622, player_2/loss=459.226, rew=340.80]


Epoch #2858: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2859: 1025it [00:02, 365.67it/s, env_step=2927616, len=21, n/ep=3, n/st=64, player_1/loss=366.920, player_2/loss=539.172, rew=534.67]


Epoch #2859: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2860: 1025it [00:02, 365.41it/s, env_step=2928640, len=12, n/ep=8, n/st=64, player_1/loss=252.913, player_2/loss=422.881, rew=250.50]


Epoch #2860: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2861: 1025it [00:02, 368.82it/s, env_step=2929664, len=32, n/ep=1, n/st=64, player_1/loss=198.248, player_2/loss=245.810, rew=1054.00]


Epoch #2861: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2862: 1025it [00:02, 363.46it/s, env_step=2930688, len=12, n/ep=7, n/st=64, player_1/loss=438.399, player_2/loss=383.439, rew=284.86]


Epoch #2862: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2863: 1025it [00:02, 368.29it/s, env_step=2931712, len=23, n/ep=3, n/st=64, player_1/loss=399.859, player_2/loss=914.795, rew=674.00]


Epoch #2863: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2864: 1025it [00:02, 364.63it/s, env_step=2932736, len=21, n/ep=3, n/st=64, player_1/loss=362.918, player_2/loss=647.969, rew=532.67]


Epoch #2864: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2865: 1025it [00:02, 367.37it/s, env_step=2933760, len=27, n/ep=2, n/st=64, player_1/loss=254.192, player_2/loss=329.739, rew=782.00]


Epoch #2865: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2866: 1025it [00:02, 363.21it/s, env_step=2934784, len=15, n/ep=3, n/st=64, player_1/loss=235.676, player_2/loss=426.715, rew=302.67]


Epoch #2866: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2867: 1025it [00:02, 366.06it/s, env_step=2935808, len=13, n/ep=5, n/st=64, player_1/loss=401.807, player_2/loss=414.126, rew=196.40]


Epoch #2867: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2868: 1025it [00:02, 368.82it/s, env_step=2936832, len=10, n/ep=8, n/st=64, player_1/loss=438.942, player_2/loss=198.900, rew=170.75]


Epoch #2868: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2869: 1025it [00:02, 367.37it/s, env_step=2937856, len=32, n/ep=2, n/st=64, player_1/loss=388.940, player_2/loss=253.608, rew=1103.00]


Epoch #2869: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2870: 1025it [00:02, 366.19it/s, env_step=2938880, len=8, n/ep=8, n/st=64, player_1/loss=490.724, player_2/loss=330.886, rew=77.00]


Epoch #2870: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2871: 1025it [00:02, 365.02it/s, env_step=2939904, len=14, n/ep=5, n/st=64, player_1/loss=514.484, player_2/loss=283.508, rew=226.00]


Epoch #2871: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2872: 1025it [00:02, 365.15it/s, env_step=2940928, len=15, n/ep=3, n/st=64, player_1/loss=177.830, player_2/loss=350.220, rew=249.33]


Epoch #2872: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2873: 1025it [00:02, 364.50it/s, env_step=2941952, len=19, n/ep=3, n/st=64, player_1/loss=238.079, player_2/loss=357.290, rew=411.33]


Epoch #2873: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2874: 1025it [00:02, 364.89it/s, env_step=2942976, len=27, n/ep=2, n/st=64, player_1/loss=234.908, player_2/loss=148.595, rew=779.00]


Epoch #2874: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2875: 1025it [00:02, 365.15it/s, env_step=2944000, len=27, n/ep=2, n/st=64, player_1/loss=370.698, player_2/loss=203.828, rew=763.00]


Epoch #2875: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2876: 1025it [00:02, 367.11it/s, env_step=2945024, len=17, n/ep=3, n/st=64, player_1/loss=466.184, player_2/loss=381.310, rew=354.67]


Epoch #2876: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2877: 1025it [00:02, 363.08it/s, env_step=2946048, len=31, n/ep=2, n/st=64, player_1/loss=413.731, player_2/loss=428.065, rew=1028.00]


Epoch #2877: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2878: 1025it [00:02, 367.24it/s, env_step=2947072, len=20, n/ep=4, n/st=64, player_1/loss=236.826, player_2/loss=415.024, rew=420.50]


Epoch #2878: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2879: 1025it [00:02, 364.50it/s, env_step=2948096, len=21, n/ep=3, n/st=64, player_1/loss=417.176, player_2/loss=257.767, rew=460.67]


Epoch #2879: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2880: 1025it [00:02, 366.06it/s, env_step=2949120, len=29, n/ep=2, n/st=64, player_1/loss=763.647, player_2/loss=310.714, rew=970.00]


Epoch #2880: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2881: 1025it [00:02, 365.80it/s, env_step=2950144, len=37, n/ep=2, n/st=64, player_1/loss=847.172, player_2/loss=456.884, rew=1442.00]


Epoch #2881: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2882: 1025it [00:02, 356.39it/s, env_step=2951168, len=39, n/ep=1, n/st=64, player_1/loss=526.419, player_2/loss=608.210, rew=1558.00]


Epoch #2882: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2883: 1025it [00:02, 362.18it/s, env_step=2952192, len=19, n/ep=3, n/st=64, player_1/loss=340.838, player_2/loss=494.325, rew=396.00]


Epoch #2883: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2884: 1025it [00:02, 364.37it/s, env_step=2953216, len=33, n/ep=3, n/st=64, player_1/loss=272.641, player_2/loss=250.941, rew=1218.67]


Epoch #2884: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2885: 1025it [00:02, 366.45it/s, env_step=2954240, len=12, n/ep=5, n/st=64, player_1/loss=178.228, player_2/loss=569.376, rew=189.60]


Epoch #2885: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2886: 1025it [00:02, 361.67it/s, env_step=2955264, len=25, n/ep=3, n/st=64, player_1/loss=174.887, player_2/loss=759.324, rew=781.33]


Epoch #2886: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2887: 1025it [00:02, 367.77it/s, env_step=2956288, len=28, n/ep=3, n/st=64, player_1/loss=261.028, player_2/loss=678.817, rew=920.00]


Epoch #2887: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2888: 1025it [00:02, 363.08it/s, env_step=2957312, len=40, n/ep=1, n/st=64, player_1/loss=234.792, player_2/loss=324.849, rew=1638.00]


Epoch #2888: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2889: 1025it [00:02, 369.89it/s, env_step=2958336, len=32, n/ep=2, n/st=64, player_1/loss=222.270, player_2/loss=302.113, rew=1058.00]


Epoch #2889: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2890: 1025it [00:02, 364.37it/s, env_step=2959360, len=18, n/ep=3, n/st=64, player_1/loss=192.875, player_2/loss=318.829, rew=364.67]


Epoch #2890: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2891: 1025it [00:02, 368.16it/s, env_step=2960384, len=39, n/ep=1, n/st=64, player_1/loss=416.177, player_2/loss=464.200, rew=1558.00]


Epoch #2891: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2892: 1025it [00:02, 363.21it/s, env_step=2961408, len=33, n/ep=1, n/st=64, player_1/loss=608.278, player_2/loss=461.756, rew=1120.00]


Epoch #2892: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2893: 1025it [00:02, 363.72it/s, env_step=2962432, len=21, n/ep=3, n/st=64, player_1/loss=445.059, player_2/loss=705.860, rew=464.67]


Epoch #2893: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2894: 1025it [00:02, 365.02it/s, env_step=2963456, len=36, n/ep=2, n/st=64, player_1/loss=343.312, player_2/loss=1288.533, rew=1334.00]


Epoch #2894: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2895: 1025it [00:02, 367.63it/s, env_step=2964480, len=30, n/ep=2, n/st=64, player_1/loss=301.420, player_2/loss=840.591, rew=929.00]


Epoch #2895: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2896: 1025it [00:02, 363.59it/s, env_step=2965504, len=29, n/ep=2, n/st=64, player_1/loss=93.722, player_2/loss=549.186, rew=872.00]


Epoch #2896: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2897: 1025it [00:02, 365.41it/s, env_step=2966528, len=14, n/ep=5, n/st=64, player_1/loss=286.198, player_2/loss=387.209, rew=384.40]


Epoch #2897: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2898: 1025it [00:02, 362.69it/s, env_step=2967552, len=14, n/ep=4, n/st=64, player_1/loss=455.504, player_2/loss=468.488, rew=225.50]


Epoch #2898: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2899: 1025it [00:02, 368.03it/s, env_step=2968576, len=14, n/ep=4, n/st=64, player_1/loss=272.689, player_2/loss=295.734, rew=223.00]


Epoch #2899: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2900: 1025it [00:02, 363.59it/s, env_step=2969600, len=16, n/ep=4, n/st=64, player_1/loss=399.494, player_2/loss=68.190, rew=296.50]


Epoch #2900: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2901: 1025it [00:02, 364.11it/s, env_step=2970624, len=21, n/ep=2, n/st=64, player_1/loss=188.800, player_2/loss=164.883, rew=482.00]


Epoch #2901: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2902: 1025it [00:02, 366.19it/s, env_step=2971648, len=15, n/ep=4, n/st=64, player_1/loss=248.651, player_2/loss=369.227, rew=326.00]


Epoch #2902: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2903: 1025it [00:02, 366.45it/s, env_step=2972672, len=28, n/ep=2, n/st=64, player_1/loss=167.372, player_2/loss=307.440, rew=895.00]


Epoch #2903: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2904: 1025it [00:02, 364.89it/s, env_step=2973696, len=25, n/ep=3, n/st=64, player_1/loss=344.263, player_2/loss=413.345, rew=694.00]


Epoch #2904: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2905: 1025it [00:02, 365.02it/s, env_step=2974720, len=25, n/ep=3, n/st=64, player_1/loss=415.400, player_2/loss=349.669, rew=684.00]


Epoch #2905: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2906: 1025it [00:02, 365.80it/s, env_step=2975744, len=42, n/ep=1, n/st=64, player_1/loss=325.666, player_2/loss=414.670, rew=1834.00]


Epoch #2906: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2907: 1025it [00:02, 363.72it/s, env_step=2976768, len=32, n/ep=2, n/st=64, player_1/loss=87.047, player_2/loss=536.152, rew=1143.00]


Epoch #2907: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2908: 1025it [00:02, 365.28it/s, env_step=2977792, len=38, n/ep=2, n/st=64, player_1/loss=190.090, player_2/loss=429.043, rew=1480.00]


Epoch #2908: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2909: 1025it [00:02, 363.08it/s, env_step=2978816, len=30, n/ep=2, n/st=64, player_1/loss=181.598, player_2/loss=220.786, rew=971.00]


Epoch #2909: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2910: 1025it [00:02, 367.11it/s, env_step=2979840, len=21, n/ep=3, n/st=64, player_1/loss=676.002, player_2/loss=615.965, rew=644.00]


Epoch #2910: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2911: 1025it [00:02, 361.80it/s, env_step=2980864, len=30, n/ep=2, n/st=64, player_1/loss=872.116, player_2/loss=1014.587, rew=961.00]


Epoch #2911: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2912: 1025it [00:02, 365.93it/s, env_step=2981888, len=31, n/ep=2, n/st=64, player_1/loss=639.406, player_2/loss=504.480, rew=1078.00]


Epoch #2912: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2913: 1025it [00:02, 365.02it/s, env_step=2982912, len=32, n/ep=2, n/st=64, player_1/loss=578.045, player_2/loss=302.955, rew=1107.00]


Epoch #2913: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2914: 1025it [00:02, 367.63it/s, env_step=2983936, len=38, n/ep=1, n/st=64, player_1/loss=227.310, player_2/loss=369.382, rew=1480.00]


Epoch #2914: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2915: 1025it [00:02, 364.37it/s, env_step=2984960, len=39, n/ep=2, n/st=64, player_1/loss=113.872, player_2/loss=275.081, rew=1582.00]


Epoch #2915: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2916: 1025it [00:02, 369.76it/s, env_step=2985984, len=29, n/ep=2, n/st=64, player_1/loss=189.737, player_2/loss=466.743, rew=900.00]


Epoch #2916: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2917: 1025it [00:02, 364.50it/s, env_step=2987008, len=30, n/ep=3, n/st=64, player_1/loss=392.086, player_2/loss=506.600, rew=983.33]


Epoch #2917: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2918: 1025it [00:02, 368.69it/s, env_step=2988032, len=26, n/ep=2, n/st=64, player_1/loss=329.432, player_2/loss=568.413, rew=821.00]


Epoch #2918: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2919: 1025it [00:02, 359.77it/s, env_step=2989056, len=22, n/ep=3, n/st=64, player_1/loss=160.252, player_2/loss=611.780, rew=506.00]


Epoch #2919: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2920: 1025it [00:02, 365.15it/s, env_step=2990080, len=20, n/ep=3, n/st=64, player_1/loss=372.537, player_2/loss=653.672, rew=451.33]


Epoch #2920: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2921: 1025it [00:02, 365.15it/s, env_step=2991104, len=39, n/ep=1, n/st=64, player_1/loss=416.381, player_2/loss=890.338, rew=1558.00]


Epoch #2921: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2922: 1025it [00:02, 363.85it/s, env_step=2992128, len=32, n/ep=2, n/st=64, player_1/loss=462.162, player_2/loss=651.925, rew=1103.00]


Epoch #2922: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2923: 1025it [00:02, 363.72it/s, env_step=2993152, len=35, n/ep=2, n/st=64, player_1/loss=624.817, player_2/loss=1025.292, rew=1306.00]


Epoch #2923: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2924: 1025it [00:02, 366.06it/s, env_step=2994176, len=36, n/ep=2, n/st=64, player_1/loss=642.030, player_2/loss=496.372, rew=1346.00]


Epoch #2924: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2925: 1025it [00:02, 365.15it/s, env_step=2995200, len=29, n/ep=2, n/st=64, player_1/loss=310.598, player_2/loss=169.336, rew=949.00]


Epoch #2925: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2926: 1025it [00:02, 365.28it/s, env_step=2996224, len=24, n/ep=2, n/st=64, player_1/loss=52.123, player_2/loss=910.014, rew=614.00]


Epoch #2926: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2927: 1025it [00:02, 362.69it/s, env_step=2997248, len=34, n/ep=2, n/st=64, player_1/loss=371.620, player_2/loss=870.619, rew=1224.00]


Epoch #2927: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2928: 1025it [00:02, 368.56it/s, env_step=2998272, len=28, n/ep=3, n/st=64, player_1/loss=470.094, player_2/loss=136.532, rew=812.00]


Epoch #2928: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2929: 1025it [00:02, 366.58it/s, env_step=2999296, len=39, n/ep=1, n/st=64, player_1/loss=224.964, player_2/loss=272.815, rew=1558.00]


Epoch #2929: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2930: 1025it [00:02, 368.43it/s, env_step=3000320, len=30, n/ep=2, n/st=64, player_1/loss=214.431, player_2/loss=596.965, rew=961.00]


Epoch #2930: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2931: 1025it [00:02, 363.21it/s, env_step=3001344, len=37, n/ep=2, n/st=64, player_1/loss=127.395, player_2/loss=668.453, rew=1442.00]


Epoch #2931: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2932: 1025it [00:02, 365.67it/s, env_step=3002368, len=29, n/ep=3, n/st=64, player_1/loss=195.201, player_2/loss=576.175, rew=1067.33]


Epoch #2932: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2933: 1025it [00:02, 363.72it/s, env_step=3003392, len=37, n/ep=2, n/st=64, player_1/loss=427.513, player_2/loss=525.894, rew=1442.00]


Epoch #2933: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2934: 1025it [00:02, 367.90it/s, env_step=3004416, len=37, n/ep=2, n/st=64, player_1/loss=608.532, player_2/loss=261.149, rew=1477.00]


Epoch #2934: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2935: 1025it [00:02, 363.46it/s, env_step=3005440, len=14, n/ep=4, n/st=64, player_1/loss=432.387, player_2/loss=77.276, rew=223.00]


Epoch #2935: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2936: 1025it [00:02, 367.90it/s, env_step=3006464, len=18, n/ep=4, n/st=64, player_1/loss=646.619, rew=367.00]


Epoch #2936: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2937: 1025it [00:02, 363.08it/s, env_step=3007488, len=15, n/ep=4, n/st=64, player_1/loss=624.019, player_2/loss=918.100, rew=263.50]


Epoch #2937: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2938: 1025it [00:02, 365.02it/s, env_step=3008512, len=21, n/ep=3, n/st=64, player_1/loss=555.654, player_2/loss=1045.362, rew=506.00]


Epoch #2938: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2939: 1025it [00:02, 363.34it/s, env_step=3009536, len=21, n/ep=3, n/st=64, player_1/loss=539.280, player_2/loss=959.747, rew=476.00]


Epoch #2939: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2940: 1025it [00:02, 367.24it/s, env_step=3010560, len=19, n/ep=3, n/st=64, player_1/loss=352.613, player_2/loss=910.314, rew=427.33]


Epoch #2940: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2941: 1025it [00:02, 365.93it/s, env_step=3011584, len=23, n/ep=2, n/st=64, player_1/loss=460.469, player_2/loss=282.313, rew=576.00]


Epoch #2941: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2942: 1025it [00:02, 366.58it/s, env_step=3012608, len=31, n/ep=3, n/st=64, player_1/loss=492.068, player_2/loss=528.890, rew=1024.67]


Epoch #2942: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2943: 1025it [00:02, 364.24it/s, env_step=3013632, len=26, n/ep=2, n/st=64, player_1/loss=451.685, player_2/loss=1021.868, rew=727.00]


Epoch #2943: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2944: 1025it [00:02, 364.24it/s, env_step=3014656, len=26, n/ep=2, n/st=64, player_1/loss=529.358, player_2/loss=941.212, rew=727.00]


Epoch #2944: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2945: 1025it [00:02, 367.37it/s, env_step=3015680, len=15, n/ep=4, n/st=64, player_1/loss=485.645, player_2/loss=1168.665, rew=240.50]


Epoch #2945: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2946: 1025it [00:02, 361.92it/s, env_step=3016704, len=13, n/ep=5, n/st=64, player_1/loss=339.491, player_2/loss=1374.052, rew=199.20]


Epoch #2946: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2947: 1025it [00:02, 369.89it/s, env_step=3017728, len=15, n/ep=5, n/st=64, player_2/loss=1387.500, rew=242.40]


Epoch #2947: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2948: 1025it [00:02, 366.06it/s, env_step=3018752, len=23, n/ep=3, n/st=64, player_1/loss=156.330, player_2/loss=1224.666, rew=575.33]


Epoch #2948: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2949: 1025it [00:02, 370.02it/s, env_step=3019776, len=24, n/ep=3, n/st=64, player_1/loss=233.139, player_2/loss=1344.573, rew=620.67]


Epoch #2949: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2950: 1025it [00:02, 368.56it/s, env_step=3020800, len=27, n/ep=2, n/st=64, player_1/loss=355.735, player_2/loss=1090.590, rew=892.00]


Epoch #2950: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2951: 1025it [00:02, 362.95it/s, env_step=3021824, len=16, n/ep=4, n/st=64, player_1/loss=291.164, player_2/loss=289.868, rew=271.50]


Epoch #2951: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2952: 1025it [00:02, 366.98it/s, env_step=3022848, len=29, n/ep=2, n/st=64, player_1/loss=207.980, player_2/loss=342.524, rew=898.00]


Epoch #2952: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2953: 1025it [00:02, 363.72it/s, env_step=3023872, len=33, n/ep=2, n/st=64, player_1/loss=379.428, player_2/loss=371.937, rew=1145.00]


Epoch #2953: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2954: 1025it [00:02, 369.09it/s, env_step=3024896, len=27, n/ep=2, n/st=64, player_1/loss=372.152, player_2/loss=388.475, rew=854.00]


Epoch #2954: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2955: 1025it [00:02, 364.24it/s, env_step=3025920, len=27, n/ep=2, n/st=64, player_1/loss=253.282, player_2/loss=718.888, rew=754.00]


Epoch #2955: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2956: 1025it [00:02, 368.16it/s, env_step=3026944, len=37, n/ep=1, n/st=64, player_1/loss=322.745, player_2/loss=587.996, rew=1404.00]


Epoch #2956: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2957: 1025it [00:02, 363.34it/s, env_step=3027968, len=34, n/ep=2, n/st=64, player_1/loss=303.231, player_2/loss=242.716, rew=1267.00]


Epoch #2957: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2958: 1025it [00:02, 365.80it/s, env_step=3028992, len=36, n/ep=2, n/st=64, player_1/loss=89.390, player_2/loss=251.139, rew=1334.00]


Epoch #2958: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2959: 1025it [00:02, 363.98it/s, env_step=3030016, len=34, n/ep=2, n/st=64, player_1/loss=428.090, player_2/loss=129.823, rew=1243.00]


Epoch #2959: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2960: 1025it [00:02, 367.90it/s, env_step=3031040, len=7, n/ep=8, n/st=64, player_1/loss=458.331, player_2/loss=313.845, rew=68.50]


Epoch #2960: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2961: 1025it [00:02, 361.29it/s, env_step=3032064, len=15, n/ep=3, n/st=64, player_1/loss=104.605, player_2/loss=804.952, rew=238.67]


Epoch #2961: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2962: 1025it [00:02, 366.06it/s, env_step=3033088, len=21, n/ep=4, n/st=64, player_1/loss=189.352, player_2/loss=1498.014, rew=592.50]


Epoch #2962: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2963: 1025it [00:02, 362.69it/s, env_step=3034112, len=39, n/ep=2, n/st=64, player_1/loss=178.033, player_2/loss=1008.210, rew=1558.00]


Epoch #2963: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2964: 1025it [00:02, 365.67it/s, env_step=3035136, len=40, n/ep=2, n/st=64, player_1/loss=374.623, player_2/loss=485.519, rew=1696.00]


Epoch #2964: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2965: 1025it [00:02, 366.45it/s, env_step=3036160, len=25, n/ep=2, n/st=64, player_1/loss=444.827, player_2/loss=436.061, rew=730.00]


Epoch #2965: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2966: 1025it [00:02, 369.49it/s, env_step=3037184, len=23, n/ep=2, n/st=64, player_1/loss=411.745, player_2/loss=101.282, rew=646.00]


Epoch #2966: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2967: 1025it [00:02, 367.63it/s, env_step=3038208, len=23, n/ep=3, n/st=64, player_1/loss=290.784, player_2/loss=211.491, rew=562.67]


Epoch #2967: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2968: 1025it [00:02, 364.11it/s, env_step=3039232, len=21, n/ep=3, n/st=64, player_1/loss=91.605, player_2/loss=380.536, rew=656.00]


Epoch #2968: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2969: 1025it [00:02, 365.80it/s, env_step=3040256, len=26, n/ep=2, n/st=64, player_1/loss=162.212, player_2/loss=393.170, rew=701.00]


Epoch #2969: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2970: 1025it [00:02, 364.11it/s, env_step=3041280, len=28, n/ep=2, n/st=64, player_1/loss=555.591, player_2/loss=190.457, rew=929.00]


Epoch #2970: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2971: 1025it [00:02, 365.93it/s, env_step=3042304, len=15, n/ep=4, n/st=64, player_1/loss=571.102, player_2/loss=538.953, rew=239.00]


Epoch #2971: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2972: 1025it [00:02, 364.24it/s, env_step=3043328, len=26, n/ep=2, n/st=64, player_1/loss=677.184, player_2/loss=995.989, rew=709.00]


Epoch #2972: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2973: 1025it [00:02, 365.93it/s, env_step=3044352, len=31, n/ep=2, n/st=64, player_1/loss=602.262, player_2/loss=1069.302, rew=1078.00]


Epoch #2973: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2974: 1025it [00:02, 364.24it/s, env_step=3045376, len=31, n/ep=2, n/st=64, player_1/loss=422.282, player_2/loss=593.512, rew=1034.00]


Epoch #2974: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2975: 1025it [00:02, 368.69it/s, env_step=3046400, len=13, n/ep=5, n/st=64, player_1/loss=256.163, player_2/loss=790.851, rew=240.40]


Epoch #2975: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2976: 1025it [00:02, 364.76it/s, env_step=3047424, len=16, n/ep=4, n/st=64, player_1/loss=280.461, player_2/loss=1737.319, rew=279.50]


Epoch #2976: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2977: 1025it [00:02, 366.71it/s, env_step=3048448, len=28, n/ep=2, n/st=64, player_1/loss=152.920, player_2/loss=1714.741, rew=931.00]


Epoch #2977: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2978: 1025it [00:02, 364.63it/s, env_step=3049472, len=17, n/ep=3, n/st=64, player_1/loss=165.814, player_2/loss=857.568, rew=377.33]


Epoch #2978: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2979: 1025it [00:02, 366.06it/s, env_step=3050496, len=13, n/ep=5, n/st=64, player_1/loss=250.807, player_2/loss=588.405, rew=253.60]


Epoch #2979: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2980: 1025it [00:02, 364.89it/s, env_step=3051520, len=26, n/ep=2, n/st=64, player_1/loss=317.309, player_2/loss=377.739, rew=701.00]


Epoch #2980: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2981: 1025it [00:02, 368.56it/s, env_step=3052544, len=25, n/ep=3, n/st=64, player_1/loss=282.827, player_2/loss=599.089, rew=774.00]


Epoch #2981: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2982: 1025it [00:02, 361.41it/s, env_step=3053568, len=9, n/ep=7, n/st=64, player_1/loss=592.133, player_2/loss=685.924, rew=106.86]


Epoch #2982: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2983: 1025it [00:02, 367.90it/s, env_step=3054592, len=28, n/ep=2, n/st=64, player_1/loss=566.174, player_2/loss=887.730, rew=819.00]


Epoch #2983: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2984: 1025it [00:02, 366.32it/s, env_step=3055616, len=20, n/ep=2, n/st=64, player_1/loss=224.582, player_2/loss=845.967, rew=499.00]


Epoch #2984: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2985: 1025it [00:02, 364.63it/s, env_step=3056640, len=23, n/ep=3, n/st=64, player_1/loss=270.616, player_2/loss=519.811, rew=594.00]


Epoch #2985: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2986: 1025it [00:02, 366.19it/s, env_step=3057664, len=19, n/ep=3, n/st=64, player_1/loss=369.740, player_2/loss=542.519, rew=412.67]


Epoch #2986: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2987: 1025it [00:02, 356.51it/s, env_step=3058688, len=14, n/ep=4, n/st=64, player_1/loss=212.176, player_2/loss=1082.811, rew=229.50]


Epoch #2987: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2988: 1025it [00:02, 367.63it/s, env_step=3059712, len=21, n/ep=3, n/st=64, player_1/loss=452.772, player_2/loss=1023.851, rew=494.67]


Epoch #2988: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2989: 1025it [00:02, 362.18it/s, env_step=3060736, len=24, n/ep=3, n/st=64, player_1/loss=410.463, player_2/loss=760.299, rew=671.33]


Epoch #2989: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2990: 1025it [00:02, 365.15it/s, env_step=3061760, len=32, n/ep=2, n/st=64, player_1/loss=408.785, player_2/loss=785.105, rew=1055.00]


Epoch #2990: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2991: 1025it [00:02, 363.34it/s, env_step=3062784, len=31, n/ep=2, n/st=64, player_1/loss=229.269, player_2/loss=520.372, rew=999.00]


Epoch #2991: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2992: 1025it [00:02, 363.85it/s, env_step=3063808, len=16, n/ep=4, n/st=64, player_1/loss=403.537, player_2/loss=247.747, rew=302.00]


Epoch #2992: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2993: 1025it [00:02, 366.45it/s, env_step=3064832, len=42, n/ep=1, n/st=64, player_1/loss=437.756, player_2/loss=434.303, rew=1804.00]


Epoch #2993: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2994: 1025it [00:02, 362.18it/s, env_step=3065856, len=33, n/ep=2, n/st=64, player_1/loss=222.210, player_2/loss=317.957, rew=1121.00]


Epoch #2994: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2995: 1025it [00:02, 367.50it/s, env_step=3066880, len=17, n/ep=4, n/st=64, player_1/loss=258.882, player_2/loss=86.426, rew=352.00]


Epoch #2995: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2996: 1025it [00:02, 364.89it/s, env_step=3067904, len=27, n/ep=2, n/st=64, player_1/loss=371.381, player_2/loss=202.257, rew=803.00]


Epoch #2996: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2997: 1025it [00:02, 366.98it/s, env_step=3068928, len=21, n/ep=3, n/st=64, player_1/loss=352.385, player_2/loss=446.798, rew=485.33]


Epoch #2997: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2998: 1025it [00:02, 362.95it/s, env_step=3069952, len=38, n/ep=1, n/st=64, player_1/loss=447.281, player_2/loss=564.915, rew=1480.00]


Epoch #2998: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #2999: 1025it [00:02, 364.89it/s, env_step=3070976, len=15, n/ep=4, n/st=64, player_1/loss=357.707, player_2/loss=662.368, rew=249.00]


Epoch #2999: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3000: 1025it [00:02, 364.76it/s, env_step=3072000, len=15, n/ep=5, n/st=64, player_1/loss=301.523, player_2/loss=694.282, rew=251.60]


Epoch #3000: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3001: 1025it [00:02, 364.89it/s, env_step=3073024, len=16, n/ep=4, n/st=64, player_1/loss=450.521, player_2/loss=887.699, rew=337.50]


Epoch #3001: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3002: 1025it [00:02, 363.98it/s, env_step=3074048, len=17, n/ep=3, n/st=64, player_1/loss=386.360, player_2/loss=484.238, rew=312.00]


Epoch #3002: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3003: 1025it [00:02, 366.19it/s, env_step=3075072, len=12, n/ep=5, n/st=64, player_1/loss=255.865, player_2/loss=101.454, rew=209.60]


Epoch #3003: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3004: 1025it [00:02, 367.63it/s, env_step=3076096, len=20, n/ep=3, n/st=64, player_1/loss=355.100, player_2/loss=597.151, rew=428.67]


Epoch #3004: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3005: 1025it [00:02, 368.29it/s, env_step=3077120, len=26, n/ep=2, n/st=64, player_1/loss=264.718, player_2/loss=977.308, rew=700.00]


Epoch #3005: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3006: 1025it [00:02, 367.76it/s, env_step=3078144, len=19, n/ep=4, n/st=64, player_1/loss=180.314, player_2/loss=462.684, rew=411.00]


Epoch #3006: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3007: 1025it [00:02, 361.54it/s, env_step=3079168, len=34, n/ep=2, n/st=64, player_1/loss=167.209, player_2/loss=57.810, rew=1189.00]


Epoch #3007: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3008: 1025it [00:02, 362.31it/s, env_step=3080192, len=14, n/ep=4, n/st=64, player_1/loss=249.003, player_2/loss=141.538, rew=217.00]


Epoch #3008: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3009: 1025it [00:02, 366.32it/s, env_step=3081216, len=20, n/ep=2, n/st=64, player_1/loss=247.358, player_2/loss=807.939, rew=469.00]


Epoch #3009: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3010: 1025it [00:02, 365.41it/s, env_step=3082240, len=32, n/ep=2, n/st=64, player_1/loss=40.359, player_2/loss=889.865, rew=1089.00]


Epoch #3010: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3011: 1025it [00:02, 362.69it/s, env_step=3083264, len=22, n/ep=2, n/st=64, player_1/loss=334.076, player_2/loss=383.162, rew=533.00]


Epoch #3011: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3012: 1025it [00:02, 365.80it/s, env_step=3084288, len=30, n/ep=2, n/st=64, player_1/loss=626.186, player_2/loss=241.741, rew=953.00]


Epoch #3012: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3013: 1025it [00:02, 363.72it/s, env_step=3085312, len=28, n/ep=2, n/st=64, player_1/loss=367.003, player_2/loss=217.410, rew=845.00]


Epoch #3013: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3014: 1025it [00:02, 365.28it/s, env_step=3086336, len=29, n/ep=2, n/st=64, player_1/loss=384.224, player_2/loss=523.388, rew=904.00]


Epoch #3014: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3015: 1025it [00:02, 366.06it/s, env_step=3087360, len=12, n/ep=3, n/st=64, player_1/loss=636.172, player_2/loss=426.955, rew=158.67]


Epoch #3015: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3016: 1025it [00:02, 366.58it/s, env_step=3088384, len=30, n/ep=2, n/st=64, player_1/loss=487.351, player_2/loss=293.391, rew=971.00]


Epoch #3016: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3017: 1025it [00:02, 365.28it/s, env_step=3089408, len=33, n/ep=2, n/st=64, player_1/loss=115.571, player_2/loss=303.268, rew=1120.00]


Epoch #3017: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3018: 1025it [00:02, 365.15it/s, env_step=3090432, len=10, n/ep=7, n/st=64, player_1/loss=242.594, player_2/loss=336.652, rew=130.29]


Epoch #3018: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3019: 1025it [00:02, 362.82it/s, env_step=3091456, len=7, n/ep=8, n/st=64, player_1/loss=250.401, player_2/loss=390.920, rew=66.50]


Epoch #3019: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3020: 1025it [00:02, 366.58it/s, env_step=3092480, len=26, n/ep=2, n/st=64, player_1/loss=628.171, player_2/loss=278.800, rew=837.00]


Epoch #3020: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3021: 1025it [00:02, 363.34it/s, env_step=3093504, len=31, n/ep=2, n/st=64, player_1/loss=672.654, player_2/loss=169.890, rew=1054.00]


Epoch #3021: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3022: 1025it [00:02, 367.50it/s, env_step=3094528, len=39, n/ep=2, n/st=64, player_2/loss=258.269, rew=1559.00]


Epoch #3022: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3023: 1025it [00:02, 364.50it/s, env_step=3095552, len=26, n/ep=2, n/st=64, player_1/loss=746.996, player_2/loss=172.072, rew=837.00]


Epoch #3023: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3024: 1025it [00:02, 362.95it/s, env_step=3096576, len=20, n/ep=3, n/st=64, player_1/loss=335.721, player_2/loss=387.567, rew=432.67]


Epoch #3024: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3025: 1025it [00:02, 364.89it/s, env_step=3097600, len=20, n/ep=3, n/st=64, player_1/loss=238.209, rew=440.00]


Epoch #3025: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3026: 1025it [00:02, 365.28it/s, env_step=3098624, len=22, n/ep=3, n/st=64, player_1/loss=216.625, player_2/loss=285.791, rew=504.67]


Epoch #3026: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3027: 1025it [00:02, 368.82it/s, env_step=3099648, len=12, n/ep=4, n/st=64, player_1/loss=172.127, player_2/loss=469.540, rew=178.50]


Epoch #3027: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3028: 1025it [00:02, 363.46it/s, env_step=3100672, len=36, n/ep=2, n/st=64, player_1/loss=281.706, player_2/loss=517.862, rew=1330.00]


Epoch #3028: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3029: 1025it [00:02, 365.67it/s, env_step=3101696, len=28, n/ep=2, n/st=64, player_1/loss=348.017, player_2/loss=461.099, rew=881.00]


Epoch #3029: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3030: 1025it [00:02, 364.89it/s, env_step=3102720, len=27, n/ep=2, n/st=64, player_1/loss=307.794, player_2/loss=379.494, rew=758.00]


Epoch #3030: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3031: 1025it [00:02, 366.06it/s, env_step=3103744, len=30, n/ep=2, n/st=64, player_1/loss=268.121, player_2/loss=374.490, rew=932.00]


Epoch #3031: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3032: 1025it [00:02, 366.32it/s, env_step=3104768, len=26, n/ep=2, n/st=64, player_1/loss=169.993, player_2/loss=237.178, rew=727.00]


Epoch #3032: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3033: 1025it [00:02, 362.69it/s, env_step=3105792, len=10, n/ep=7, n/st=64, player_1/loss=385.312, player_2/loss=370.613, rew=116.57]


Epoch #3033: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3034: 1025it [00:02, 366.85it/s, env_step=3106816, len=13, n/ep=5, n/st=64, player_1/loss=361.387, player_2/loss=673.311, rew=263.20]


Epoch #3034: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3035: 1025it [00:02, 364.37it/s, env_step=3107840, len=11, n/ep=6, n/st=64, player_1/loss=106.049, player_2/loss=699.622, rew=137.33]


Epoch #3035: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3036: 1025it [00:02, 368.03it/s, env_step=3108864, len=20, n/ep=3, n/st=64, player_1/loss=126.917, player_2/loss=441.156, rew=670.00]


Epoch #3036: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3037: 1025it [00:02, 364.63it/s, env_step=3109888, len=14, n/ep=5, n/st=64, player_1/loss=172.881, player_2/loss=600.979, rew=210.40]


Epoch #3037: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3038: 1025it [00:02, 367.24it/s, env_step=3110912, len=21, n/ep=3, n/st=64, player_1/loss=197.882, player_2/loss=1024.823, rew=476.00]


Epoch #3038: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3039: 1025it [00:02, 366.06it/s, env_step=3111936, len=21, n/ep=3, n/st=64, player_1/loss=97.415, player_2/loss=934.173, rew=460.67]


Epoch #3039: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3040: 1025it [00:02, 366.06it/s, env_step=3112960, len=19, n/ep=3, n/st=64, player_1/loss=85.717, player_2/loss=377.871, rew=406.00]


Epoch #3040: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3041: 1025it [00:02, 361.41it/s, env_step=3113984, len=25, n/ep=2, n/st=64, player_1/loss=163.764, player_2/loss=560.981, rew=680.00]


Epoch #3041: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3042: 1025it [00:02, 366.19it/s, env_step=3115008, len=7, n/ep=8, n/st=64, player_1/loss=305.376, player_2/loss=459.026, rew=62.50]


Epoch #3042: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3043: 1025it [00:02, 362.69it/s, env_step=3116032, len=21, n/ep=2, n/st=64, player_1/loss=275.972, player_2/loss=432.358, rew=638.00]


Epoch #3043: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3044: 1025it [00:02, 366.98it/s, env_step=3117056, len=32, n/ep=2, n/st=64, player_1/loss=175.972, player_2/loss=346.634, rew=1087.00]


Epoch #3044: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3045: 1025it [00:02, 363.72it/s, env_step=3118080, len=21, n/ep=3, n/st=64, player_1/loss=125.980, player_2/loss=335.499, rew=476.00]


Epoch #3045: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3046: 1025it [00:02, 363.98it/s, env_step=3119104, len=28, n/ep=2, n/st=64, player_1/loss=116.006, player_2/loss=317.251, rew=851.00]


Epoch #3046: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3047: 1025it [00:02, 363.72it/s, env_step=3120128, len=23, n/ep=3, n/st=64, player_1/loss=474.811, player_2/loss=440.802, rew=552.67]


Epoch #3047: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3048: 1025it [00:02, 368.43it/s, env_step=3121152, len=24, n/ep=3, n/st=64, player_1/loss=720.927, player_2/loss=505.232, rew=710.67]


Epoch #3048: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3049: 1025it [00:02, 362.05it/s, env_step=3122176, len=27, n/ep=3, n/st=64, player_1/loss=490.253, player_2/loss=457.941, rew=795.33]


Epoch #3049: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3050: 1025it [00:02, 360.27it/s, env_step=3123200, len=31, n/ep=2, n/st=64, player_1/loss=179.531, player_2/loss=484.828, rew=990.00]


Epoch #3050: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3051: 1025it [00:02, 366.19it/s, env_step=3124224, len=23, n/ep=3, n/st=64, player_1/loss=114.139, player_2/loss=517.924, rew=710.67]


Epoch #3051: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3052: 1025it [00:02, 361.03it/s, env_step=3125248, len=28, n/ep=2, n/st=64, player_1/loss=202.098, player_2/loss=502.947, rew=839.00]


Epoch #3052: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3053: 1025it [00:02, 365.15it/s, env_step=3126272, len=28, n/ep=3, n/st=64, player_1/loss=205.955, player_2/loss=488.960, rew=818.00]


Epoch #3053: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3054: 1025it [00:02, 366.45it/s, env_step=3127296, len=26, n/ep=3, n/st=64, player_1/loss=205.601, player_2/loss=206.254, rew=700.67]


Epoch #3054: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3055: 1025it [00:02, 363.46it/s, env_step=3128320, len=22, n/ep=3, n/st=64, player_1/loss=188.667, player_2/loss=100.355, rew=581.33]


Epoch #3055: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3056: 1025it [00:02, 368.29it/s, env_step=3129344, len=20, n/ep=3, n/st=64, player_1/loss=183.455, player_2/loss=225.460, rew=640.00]


Epoch #3056: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3057: 1025it [00:02, 362.44it/s, env_step=3130368, len=34, n/ep=2, n/st=64, player_1/loss=211.884, player_2/loss=71.993, rew=1204.00]


Epoch #3057: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3058: 1025it [00:02, 365.80it/s, env_step=3131392, len=22, n/ep=3, n/st=64, player_1/loss=220.163, player_2/loss=159.310, rew=652.67]


Epoch #3058: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3059: 1025it [00:02, 364.50it/s, env_step=3132416, len=13, n/ep=5, n/st=64, player_1/loss=356.650, player_2/loss=282.737, rew=274.00]


Epoch #3059: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3060: 1025it [00:02, 368.56it/s, env_step=3133440, len=30, n/ep=2, n/st=64, player_1/loss=306.322, rew=928.00]


Epoch #3060: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3061: 1025it [00:02, 365.15it/s, env_step=3134464, len=26, n/ep=2, n/st=64, player_1/loss=195.905, player_2/loss=171.244, rew=749.00]


Epoch #3061: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3062: 1025it [00:02, 361.54it/s, env_step=3135488, len=24, n/ep=3, n/st=64, player_1/loss=91.249, player_2/loss=225.154, rew=658.00]


Epoch #3062: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3063: 1025it [00:02, 365.80it/s, env_step=3136512, len=23, n/ep=3, n/st=64, player_1/loss=86.424, player_2/loss=405.573, rew=626.00]


Epoch #3063: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3064: 1025it [00:02, 362.82it/s, env_step=3137536, len=24, n/ep=3, n/st=64, player_1/loss=214.768, player_2/loss=544.449, rew=670.67]


Epoch #3064: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3065: 1025it [00:02, 366.06it/s, env_step=3138560, len=21, n/ep=3, n/st=64, player_1/loss=380.947, player_2/loss=281.593, rew=462.67]


Epoch #3065: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3066: 1025it [00:02, 361.29it/s, env_step=3139584, len=23, n/ep=3, n/st=64, player_1/loss=381.560, player_2/loss=253.204, rew=623.33]


Epoch #3066: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3067: 1025it [00:02, 365.93it/s, env_step=3140608, len=32, n/ep=2, n/st=64, player_1/loss=282.718, player_2/loss=313.771, rew=1087.00]


Epoch #3067: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3068: 1025it [00:02, 361.67it/s, env_step=3141632, len=16, n/ep=4, n/st=64, player_1/loss=300.026, player_2/loss=161.270, rew=293.50]


Epoch #3068: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3069: 1025it [00:02, 366.19it/s, env_step=3142656, len=20, n/ep=3, n/st=64, player_1/loss=328.934, player_2/loss=193.658, rew=563.33]


Epoch #3069: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3070: 1025it [00:02, 361.80it/s, env_step=3143680, len=24, n/ep=3, n/st=64, player_1/loss=247.047, player_2/loss=220.687, rew=600.00]


Epoch #3070: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3071: 1025it [00:02, 367.24it/s, env_step=3144704, len=33, n/ep=2, n/st=64, player_1/loss=205.583, player_2/loss=207.081, rew=1121.00]


Epoch #3071: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3072: 1025it [00:02, 365.67it/s, env_step=3145728, len=21, n/ep=3, n/st=64, player_1/loss=152.104, player_2/loss=459.014, rew=525.33]


Epoch #3072: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3073: 1025it [00:02, 363.85it/s, env_step=3146752, len=38, n/ep=1, n/st=64, player_1/loss=331.817, player_2/loss=522.670, rew=1480.00]


Epoch #3073: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3074: 1025it [00:02, 365.67it/s, env_step=3147776, len=22, n/ep=3, n/st=64, player_1/loss=337.931, player_2/loss=140.903, rew=549.33]


Epoch #3074: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3075: 1025it [00:02, 362.44it/s, env_step=3148800, len=24, n/ep=3, n/st=64, player_1/loss=113.909, player_2/loss=191.104, rew=696.00]


Epoch #3075: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3076: 1025it [00:02, 366.98it/s, env_step=3149824, len=26, n/ep=3, n/st=64, player_1/loss=193.673, player_2/loss=298.209, rew=727.33]


Epoch #3076: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3077: 1025it [00:02, 361.16it/s, env_step=3150848, len=21, n/ep=2, n/st=64, player_1/loss=336.825, player_2/loss=277.937, rew=524.00]


Epoch #3077: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3078: 1025it [00:02, 365.80it/s, env_step=3151872, len=18, n/ep=2, n/st=64, player_1/loss=319.480, player_2/loss=232.021, rew=349.00]


Epoch #3078: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3079: 1025it [00:02, 363.85it/s, env_step=3152896, len=30, n/ep=3, n/st=64, player_1/loss=196.781, player_2/loss=240.660, rew=1032.00]


Epoch #3079: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3080: 1025it [00:02, 366.71it/s, env_step=3153920, len=32, n/ep=2, n/st=64, player_1/loss=437.893, player_2/loss=236.005, rew=1079.00]


Epoch #3080: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3081: 1025it [00:02, 361.29it/s, env_step=3154944, len=21, n/ep=4, n/st=64, player_1/loss=557.384, player_2/loss=158.417, rew=513.50]


Epoch #3081: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3082: 1025it [00:02, 365.67it/s, env_step=3155968, len=21, n/ep=3, n/st=64, player_1/loss=413.823, player_2/loss=122.915, rew=509.33]


Epoch #3082: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3083: 1025it [00:02, 363.46it/s, env_step=3156992, len=42, n/ep=1, n/st=64, player_1/loss=307.855, player_2/loss=149.335, rew=1834.00]


Epoch #3083: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3084: 1025it [00:02, 365.54it/s, env_step=3158016, len=26, n/ep=2, n/st=64, player_1/loss=293.125, player_2/loss=124.657, rew=869.00]


Epoch #3084: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3085: 1025it [00:02, 365.15it/s, env_step=3159040, len=31, n/ep=2, n/st=64, player_1/loss=141.388, player_2/loss=167.363, rew=1034.00]


Epoch #3085: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3086: 1025it [00:02, 365.67it/s, env_step=3160064, len=21, n/ep=3, n/st=64, player_1/loss=106.339, player_2/loss=323.564, rew=462.67]


Epoch #3086: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3087: 1025it [00:02, 363.21it/s, env_step=3161088, len=26, n/ep=2, n/st=64, player_1/loss=308.827, player_2/loss=504.060, rew=709.00]


Epoch #3087: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3088: 1025it [00:02, 365.67it/s, env_step=3162112, len=28, n/ep=2, n/st=64, player_1/loss=407.526, player_2/loss=573.681, rew=910.00]


Epoch #3088: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3089: 1025it [00:02, 362.95it/s, env_step=3163136, len=30, n/ep=2, n/st=64, player_1/loss=150.447, player_2/loss=587.055, rew=928.00]


Epoch #3089: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3090: 1025it [00:02, 368.16it/s, env_step=3164160, len=34, n/ep=2, n/st=64, player_1/loss=380.148, rew=1189.00]


Epoch #3090: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3091: 1025it [00:02, 362.31it/s, env_step=3165184, len=17, n/ep=4, n/st=64, player_1/loss=504.308, player_2/loss=474.687, rew=331.00]


Epoch #3091: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3092: 1025it [00:02, 350.78it/s, env_step=3166208, len=32, n/ep=2, n/st=64, player_1/loss=249.017, player_2/loss=458.603, rew=1055.00]


Epoch #3092: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3093: 1025it [00:02, 363.21it/s, env_step=3167232, len=30, n/ep=2, n/st=64, player_1/loss=270.002, player_2/loss=274.098, rew=928.00]


Epoch #3093: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3094: 1025it [00:02, 362.05it/s, env_step=3168256, len=24, n/ep=3, n/st=64, player_1/loss=408.706, player_2/loss=590.757, rew=612.00]


Epoch #3094: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3095: 1025it [00:02, 368.03it/s, env_step=3169280, len=26, n/ep=3, n/st=64, player_1/loss=328.124, player_2/loss=583.067, rew=712.67]


Epoch #3095: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3096: 1025it [00:02, 364.89it/s, env_step=3170304, len=32, n/ep=2, n/st=64, player_1/loss=463.095, player_2/loss=629.948, rew=1103.00]


Epoch #3096: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3097: 1025it [00:02, 362.56it/s, env_step=3171328, len=24, n/ep=2, n/st=64, player_1/loss=547.063, rew=719.00]


Epoch #3097: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3098: 1025it [00:02, 368.03it/s, env_step=3172352, len=42, n/ep=1, n/st=64, player_1/loss=492.609, player_2/loss=113.344, rew=1834.00]


Epoch #3098: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3099: 1025it [00:02, 363.21it/s, env_step=3173376, len=16, n/ep=3, n/st=64, player_1/loss=269.321, player_2/loss=493.671, rew=282.67]


Epoch #3099: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3100: 1025it [00:02, 364.76it/s, env_step=3174400, len=38, n/ep=2, n/st=64, player_1/loss=75.428, player_2/loss=675.924, rew=1511.00]


Epoch #3100: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3101: 1025it [00:02, 362.05it/s, env_step=3175424, len=31, n/ep=2, n/st=64, player_1/loss=318.325, player_2/loss=469.914, rew=1039.00]


Epoch #3101: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3102: 1025it [00:02, 367.90it/s, env_step=3176448, len=33, n/ep=2, n/st=64, player_1/loss=474.338, player_2/loss=372.687, rew=1160.00]


Epoch #3102: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3103: 1025it [00:02, 355.40it/s, env_step=3177472, len=22, n/ep=3, n/st=64, player_1/loss=356.774, player_2/loss=189.090, rew=504.67]


Epoch #3103: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3104: 1025it [00:02, 356.76it/s, env_step=3178496, len=15, n/ep=4, n/st=64, player_1/loss=291.645, player_2/loss=574.908, rew=438.50]


Epoch #3104: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3105: 1025it [00:02, 352.23it/s, env_step=3179520, len=29, n/ep=3, n/st=64, player_1/loss=242.964, player_2/loss=928.946, rew=978.67]


Epoch #3105: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3106: 1025it [00:02, 358.88it/s, env_step=3180544, len=16, n/ep=3, n/st=64, player_1/loss=299.319, player_2/loss=648.175, rew=349.33]


Epoch #3106: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3107: 1025it [00:02, 352.95it/s, env_step=3181568, len=37, n/ep=1, n/st=64, player_1/loss=140.357, player_2/loss=513.474, rew=1404.00]


Epoch #3107: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3108: 1025it [00:02, 360.40it/s, env_step=3182592, len=30, n/ep=2, n/st=64, player_1/loss=112.480, player_2/loss=423.194, rew=1001.00]


Epoch #3108: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3109: 1025it [00:02, 357.51it/s, env_step=3183616, len=15, n/ep=4, n/st=64, player_1/loss=229.227, player_2/loss=530.548, rew=264.00]


Epoch #3109: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3110: 1025it [00:02, 355.03it/s, env_step=3184640, len=32, n/ep=2, n/st=64, player_1/loss=213.701, player_2/loss=685.523, rew=1103.00]


Epoch #3110: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3111: 1025it [00:02, 354.91it/s, env_step=3185664, len=28, n/ep=2, n/st=64, player_1/loss=159.305, player_2/loss=596.031, rew=846.00]


Epoch #3111: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3112: 1025it [00:02, 358.01it/s, env_step=3186688, len=29, n/ep=2, n/st=64, player_1/loss=220.946, player_2/loss=115.906, rew=928.00]


Epoch #3112: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3113: 1025it [00:02, 355.16it/s, env_step=3187712, len=33, n/ep=2, n/st=64, player_1/loss=169.767, player_2/loss=237.111, rew=1121.00]


Epoch #3113: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3114: 1025it [00:02, 360.02it/s, env_step=3188736, len=25, n/ep=3, n/st=64, player_1/loss=248.872, player_2/loss=271.040, rew=706.67]


Epoch #3114: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3115: 1025it [00:02, 356.89it/s, env_step=3189760, len=22, n/ep=3, n/st=64, player_1/loss=319.203, player_2/loss=194.157, rew=522.00]


Epoch #3115: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3116: 1025it [00:02, 358.38it/s, env_step=3190784, len=27, n/ep=2, n/st=64, player_1/loss=451.674, player_2/loss=320.577, rew=755.00]


Epoch #3116: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3117: 1025it [00:02, 354.05it/s, env_step=3191808, len=23, n/ep=2, n/st=64, player_1/loss=497.363, player_2/loss=428.707, rew=554.00]


Epoch #3117: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3118: 1025it [00:02, 358.76it/s, env_step=3192832, len=26, n/ep=2, n/st=64, player_1/loss=402.631, player_2/loss=878.394, rew=709.00]


Epoch #3118: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3119: 1025it [00:02, 360.02it/s, env_step=3193856, len=34, n/ep=2, n/st=64, player_1/loss=147.867, player_2/loss=1006.260, rew=1204.00]


Epoch #3119: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3120: 1025it [00:02, 354.91it/s, env_step=3194880, len=31, n/ep=2, n/st=64, player_1/loss=338.441, player_2/loss=697.525, rew=1006.00]


Epoch #3120: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3121: 1025it [00:02, 358.38it/s, env_step=3195904, len=22, n/ep=3, n/st=64, player_1/loss=630.055, player_2/loss=658.814, rew=516.67]


Epoch #3121: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3122: 1025it [00:02, 354.78it/s, env_step=3196928, len=24, n/ep=2, n/st=64, player_1/loss=586.819, player_2/loss=780.251, rew=599.00]


Epoch #3122: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3123: 1025it [00:02, 356.02it/s, env_step=3197952, len=24, n/ep=2, n/st=64, player_1/loss=358.878, player_2/loss=372.453, rew=635.00]


Epoch #3123: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3124: 1025it [00:02, 355.89it/s, env_step=3198976, len=29, n/ep=2, n/st=64, player_1/loss=377.009, player_2/loss=667.696, rew=872.00]


Epoch #3124: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3125: 1025it [00:02, 358.51it/s, env_step=3200000, len=31, n/ep=2, n/st=64, player_1/loss=342.849, player_2/loss=677.469, rew=1024.00]


Epoch #3125: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3126: 1025it [00:02, 355.28it/s, env_step=3201024, len=33, n/ep=2, n/st=64, player_1/loss=537.247, player_2/loss=314.111, rew=1124.00]


Epoch #3126: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3127: 1025it [00:02, 357.38it/s, env_step=3202048, len=9, n/ep=7, n/st=64, player_1/loss=649.935, player_2/loss=207.106, rew=98.29]


Epoch #3127: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3128: 1025it [00:02, 356.14it/s, env_step=3203072, len=25, n/ep=2, n/st=64, player_1/loss=319.646, player_2/loss=260.800, rew=704.00]


Epoch #3128: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3129: 1025it [00:02, 359.14it/s, env_step=3204096, len=8, n/ep=8, n/st=64, player_1/loss=161.810, player_2/loss=587.602, rew=71.00]


Epoch #3129: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3130: 1025it [00:02, 352.47it/s, env_step=3205120, len=19, n/ep=3, n/st=64, player_1/loss=396.753, player_2/loss=621.708, rew=493.33]


Epoch #3130: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3131: 1025it [00:02, 355.65it/s, env_step=3206144, len=35, n/ep=2, n/st=64, player_1/loss=743.566, player_2/loss=525.908, rew=1267.00]


Epoch #3131: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3132: 1025it [00:02, 356.39it/s, env_step=3207168, len=30, n/ep=2, n/st=64, player_1/loss=381.267, player_2/loss=280.836, rew=929.00]


Epoch #3132: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3133: 1025it [00:02, 353.81it/s, env_step=3208192, len=29, n/ep=3, n/st=64, player_1/loss=235.661, player_2/loss=439.408, rew=917.33]


Epoch #3133: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3134: 1025it [00:02, 359.77it/s, env_step=3209216, len=29, n/ep=2, n/st=64, player_1/loss=216.849, player_2/loss=1284.537, rew=884.00]


Epoch #3134: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3135: 1025it [00:02, 360.14it/s, env_step=3210240, len=23, n/ep=3, n/st=64, player_1/loss=292.355, player_2/loss=976.436, rew=680.67]


Epoch #3135: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3136: 1025it [00:02, 363.08it/s, env_step=3211264, len=21, n/ep=2, n/st=64, player_1/loss=292.838, player_2/loss=127.409, rew=476.00]


Epoch #3136: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3137: 1025it [00:02, 366.85it/s, env_step=3212288, len=24, n/ep=2, n/st=64, player_1/loss=84.557, player_2/loss=85.115, rew=623.00]


Epoch #3137: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3138: 1025it [00:02, 362.44it/s, env_step=3213312, len=16, n/ep=4, n/st=64, player_1/loss=383.937, player_2/loss=102.178, rew=271.50]


Epoch #3138: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3139: 1025it [00:02, 366.32it/s, env_step=3214336, len=33, n/ep=2, n/st=64, player_1/loss=391.167, player_2/loss=228.204, rew=1124.00]


Epoch #3139: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3140: 1025it [00:02, 361.41it/s, env_step=3215360, len=15, n/ep=5, n/st=64, player_1/loss=138.373, player_2/loss=271.872, rew=247.20]


Epoch #3140: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3141: 1025it [00:02, 365.54it/s, env_step=3216384, len=27, n/ep=2, n/st=64, player_1/loss=154.215, player_2/loss=309.351, rew=782.00]


Epoch #3141: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3142: 1025it [00:02, 363.98it/s, env_step=3217408, len=16, n/ep=4, n/st=64, player_1/loss=216.136, player_2/loss=244.624, rew=393.00]


Epoch #3142: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3143: 1025it [00:02, 367.24it/s, env_step=3218432, len=30, n/ep=3, n/st=64, player_1/loss=541.142, player_2/loss=558.967, rew=977.33]


Epoch #3143: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3144: 1025it [00:02, 364.11it/s, env_step=3219456, len=39, n/ep=2, n/st=64, player_1/loss=434.181, player_2/loss=1066.298, rew=1598.00]


Epoch #3144: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3145: 1025it [00:02, 366.71it/s, env_step=3220480, len=20, n/ep=3, n/st=64, player_1/loss=118.148, player_2/loss=863.103, rew=455.33]


Epoch #3145: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3146: 1025it [00:02, 362.82it/s, env_step=3221504, len=27, n/ep=2, n/st=64, player_1/loss=363.149, player_2/loss=544.602, rew=803.00]


Epoch #3146: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3147: 1025it [00:02, 363.59it/s, env_step=3222528, len=24, n/ep=2, n/st=64, player_1/loss=524.954, player_2/loss=468.156, rew=805.00]


Epoch #3147: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3148: 1025it [00:02, 362.05it/s, env_step=3223552, len=42, n/ep=1, n/st=64, player_1/loss=484.002, player_2/loss=958.105, rew=1834.00]


Epoch #3148: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3149: 1025it [00:02, 369.22it/s, env_step=3224576, len=8, n/ep=8, n/st=64, player_1/loss=406.156, player_2/loss=1130.238, rew=78.25]


Epoch #3149: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3150: 1025it [00:02, 361.29it/s, env_step=3225600, len=37, n/ep=2, n/st=64, player_1/loss=435.339, player_2/loss=293.408, rew=1444.00]


Epoch #3150: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3151: 1025it [00:02, 369.36it/s, env_step=3226624, len=32, n/ep=2, n/st=64, player_1/loss=186.999, player_2/loss=95.863, rew=1079.00]


Epoch #3151: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3152: 1025it [00:02, 361.67it/s, env_step=3227648, len=14, n/ep=5, n/st=64, player_1/loss=196.946, player_2/loss=76.786, rew=380.00]


Epoch #3152: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3153: 1025it [00:02, 366.71it/s, env_step=3228672, len=8, n/ep=7, n/st=64, player_1/loss=261.364, player_2/loss=66.544, rew=90.57]


Epoch #3153: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3154: 1025it [00:02, 364.76it/s, env_step=3229696, len=34, n/ep=2, n/st=64, player_1/loss=437.259, rew=1225.00]


Epoch #3154: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3155: 1025it [00:02, 361.16it/s, env_step=3230720, len=27, n/ep=3, n/st=64, player_1/loss=340.196, player_2/loss=77.610, rew=797.33]


Epoch #3155: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3156: 1025it [00:02, 366.58it/s, env_step=3231744, len=33, n/ep=2, n/st=64, player_1/loss=739.674, player_2/loss=73.282, rew=1154.00]


Epoch #3156: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3157: 1025it [00:02, 364.37it/s, env_step=3232768, len=28, n/ep=3, n/st=64, player_1/loss=828.486, player_2/loss=337.303, rew=855.33]


Epoch #3157: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3158: 1025it [00:02, 363.34it/s, env_step=3233792, len=10, n/ep=6, n/st=64, player_1/loss=341.651, player_2/loss=417.851, rew=141.33]


Epoch #3158: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3159: 1025it [00:02, 366.06it/s, env_step=3234816, len=21, n/ep=3, n/st=64, player_1/loss=187.551, player_2/loss=444.611, rew=522.00]


Epoch #3159: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3160: 1025it [00:02, 367.37it/s, env_step=3235840, len=28, n/ep=3, n/st=64, player_1/loss=139.785, player_2/loss=747.124, rew=824.00]


Epoch #3160: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3161: 1025it [00:02, 362.44it/s, env_step=3236864, len=13, n/ep=5, n/st=64, player_1/loss=350.289, player_2/loss=1036.522, rew=259.60]


Epoch #3161: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3162: 1025it [00:02, 365.93it/s, env_step=3237888, len=34, n/ep=2, n/st=64, player_1/loss=586.442, player_2/loss=1110.088, rew=1223.00]


Epoch #3162: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3163: 1025it [00:02, 365.54it/s, env_step=3238912, len=32, n/ep=2, n/st=64, player_1/loss=377.733, player_2/loss=1137.279, rew=1089.00]


Epoch #3163: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3164: 1025it [00:02, 364.50it/s, env_step=3239936, len=39, n/ep=1, n/st=64, player_1/loss=281.590, player_2/loss=834.682, rew=1558.00]


Epoch #3164: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3165: 1025it [00:02, 360.78it/s, env_step=3240960, len=18, n/ep=4, n/st=64, player_1/loss=336.913, player_2/loss=460.066, rew=405.00]


Epoch #3165: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3166: 1025it [00:02, 363.46it/s, env_step=3241984, len=22, n/ep=3, n/st=64, player_1/loss=210.645, player_2/loss=491.246, rew=535.33]


Epoch #3166: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3167: 1025it [00:02, 365.02it/s, env_step=3243008, len=33, n/ep=2, n/st=64, player_1/loss=337.606, player_2/loss=1129.570, rew=1154.00]


Epoch #3167: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3168: 1025it [00:02, 361.54it/s, env_step=3244032, len=16, n/ep=4, n/st=64, player_2/loss=1540.054, rew=278.50]


Epoch #3168: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3169: 1025it [00:02, 367.50it/s, env_step=3245056, len=19, n/ep=3, n/st=64, player_1/loss=565.127, player_2/loss=1147.797, rew=469.33]


Epoch #3169: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3170: 1025it [00:02, 361.54it/s, env_step=3246080, len=37, n/ep=2, n/st=64, player_1/loss=568.258, player_2/loss=649.311, rew=1442.00]


Epoch #3170: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3171: 1025it [00:02, 367.11it/s, env_step=3247104, len=33, n/ep=2, n/st=64, player_1/loss=260.623, player_2/loss=503.173, rew=1216.00]


Epoch #3171: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3172: 1025it [00:02, 359.64it/s, env_step=3248128, len=33, n/ep=2, n/st=64, player_1/loss=92.660, player_2/loss=368.734, rew=1136.00]


Epoch #3172: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3173: 1025it [00:02, 367.77it/s, env_step=3249152, len=22, n/ep=2, n/st=64, player_1/loss=432.368, player_2/loss=252.214, rew=527.00]


Epoch #3173: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3174: 1025it [00:02, 360.78it/s, env_step=3250176, len=34, n/ep=2, n/st=64, player_1/loss=595.636, player_2/loss=120.395, rew=1235.00]


Epoch #3174: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3175: 1025it [00:02, 361.41it/s, env_step=3251200, len=38, n/ep=2, n/st=64, player_1/loss=461.027, player_2/loss=322.637, rew=1519.00]


Epoch #3175: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3176: 1025it [00:02, 365.02it/s, env_step=3252224, len=36, n/ep=2, n/st=64, player_1/loss=318.463, player_2/loss=319.295, rew=1339.00]


Epoch #3176: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3177: 1025it [00:02, 363.72it/s, env_step=3253248, len=8, n/ep=7, n/st=64, player_1/loss=926.249, player_2/loss=208.092, rew=80.57]


Epoch #3177: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3178: 1025it [00:02, 367.50it/s, env_step=3254272, len=31, n/ep=2, n/st=64, player_1/loss=1225.814, player_2/loss=344.923, rew=999.00]


Epoch #3178: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3179: 1025it [00:02, 362.31it/s, env_step=3255296, len=37, n/ep=2, n/st=64, player_1/loss=395.269, player_2/loss=279.904, rew=1413.00]


Epoch #3179: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3180: 1025it [00:02, 368.03it/s, env_step=3256320, len=30, n/ep=2, n/st=64, player_1/loss=48.891, player_2/loss=83.676, rew=959.00]


Epoch #3180: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3181: 1025it [00:02, 362.18it/s, env_step=3257344, len=25, n/ep=2, n/st=64, player_1/loss=51.837, player_2/loss=273.408, rew=674.00]


Epoch #3181: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3182: 1025it [00:02, 365.54it/s, env_step=3258368, len=30, n/ep=2, n/st=64, player_1/loss=42.470, player_2/loss=268.760, rew=961.00]


Epoch #3182: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3183: 1025it [00:02, 362.69it/s, env_step=3259392, len=22, n/ep=3, n/st=64, player_1/loss=39.779, player_2/loss=71.209, rew=548.67]


Epoch #3183: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3184: 1025it [00:02, 367.63it/s, env_step=3260416, len=24, n/ep=3, n/st=64, player_1/loss=350.784, player_2/loss=84.992, rew=622.67]


Epoch #3184: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3185: 1025it [00:02, 363.72it/s, env_step=3261440, len=29, n/ep=3, n/st=64, player_1/loss=645.564, player_2/loss=87.570, rew=902.67]


Epoch #3185: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3186: 1025it [00:02, 364.24it/s, env_step=3262464, len=33, n/ep=2, n/st=64, player_1/loss=886.516, player_2/loss=451.427, rew=1156.00]


Epoch #3186: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3187: 1025it [00:02, 366.19it/s, env_step=3263488, len=35, n/ep=2, n/st=64, player_1/loss=749.178, player_2/loss=475.097, rew=1322.00]


Epoch #3187: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3188: 1025it [00:02, 362.44it/s, env_step=3264512, len=26, n/ep=2, n/st=64, player_1/loss=240.710, player_2/loss=215.245, rew=869.00]


Epoch #3188: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3189: 1025it [00:02, 365.28it/s, env_step=3265536, len=25, n/ep=2, n/st=64, player_1/loss=86.480, player_2/loss=221.366, rew=680.00]


Epoch #3189: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3190: 1025it [00:02, 362.69it/s, env_step=3266560, len=32, n/ep=1, n/st=64, player_1/loss=100.502, player_2/loss=610.750, rew=1054.00]


Epoch #3190: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3191: 1025it [00:02, 365.15it/s, env_step=3267584, len=21, n/ep=3, n/st=64, player_1/loss=267.075, player_2/loss=1321.078, rew=475.33]


Epoch #3191: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3192: 1025it [00:02, 364.24it/s, env_step=3268608, len=27, n/ep=2, n/st=64, player_1/loss=332.288, player_2/loss=1010.110, rew=755.00]


Epoch #3192: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3193: 1025it [00:02, 364.37it/s, env_step=3269632, len=39, n/ep=2, n/st=64, player_1/loss=381.142, player_2/loss=563.714, rew=1619.00]


Epoch #3193: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3194: 1025it [00:02, 368.69it/s, env_step=3270656, len=39, n/ep=2, n/st=64, player_1/loss=227.176, player_2/loss=555.667, rew=1558.00]


Epoch #3194: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3195: 1025it [00:02, 357.51it/s, env_step=3271680, len=34, n/ep=2, n/st=64, player_1/loss=52.779, player_2/loss=637.545, rew=1243.00]


Epoch #3195: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3196: 1025it [00:02, 365.67it/s, env_step=3272704, len=35, n/ep=2, n/st=64, player_1/loss=61.609, player_2/loss=512.829, rew=1259.00]


Epoch #3196: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3197: 1025it [00:02, 367.11it/s, env_step=3273728, len=35, n/ep=2, n/st=64, player_1/loss=66.880, player_2/loss=548.164, rew=1267.00]


Epoch #3197: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3198: 1025it [00:02, 359.51it/s, env_step=3274752, len=30, n/ep=2, n/st=64, player_1/loss=402.977, player_2/loss=1026.179, rew=965.00]


Epoch #3198: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3199: 1025it [00:02, 369.22it/s, env_step=3275776, len=26, n/ep=2, n/st=64, player_1/loss=497.410, player_2/loss=1331.804, rew=749.00]


Epoch #3199: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3200: 1025it [00:02, 362.56it/s, env_step=3276800, len=33, n/ep=2, n/st=64, player_1/loss=210.251, player_2/loss=1007.153, rew=1184.00]


Epoch #3200: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3201: 1025it [00:02, 369.89it/s, env_step=3277824, len=32, n/ep=2, n/st=64, player_1/loss=186.470, player_2/loss=532.165, rew=1093.00]


Epoch #3201: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3202: 1025it [00:02, 362.05it/s, env_step=3278848, len=33, n/ep=1, n/st=64, player_2/loss=326.111, rew=1120.00]


Epoch #3202: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3203: 1025it [00:02, 366.32it/s, env_step=3279872, len=29, n/ep=3, n/st=64, player_1/loss=435.485, player_2/loss=489.272, rew=1046.67]


Epoch #3203: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3204: 1025it [00:02, 363.59it/s, env_step=3280896, len=38, n/ep=2, n/st=64, player_1/loss=153.424, player_2/loss=311.390, rew=1481.00]


Epoch #3204: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3205: 1025it [00:02, 366.45it/s, env_step=3281920, len=32, n/ep=2, n/st=64, player_1/loss=328.336, player_2/loss=797.604, rew=1058.00]


Epoch #3205: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3206: 1025it [00:02, 366.19it/s, env_step=3282944, len=36, n/ep=2, n/st=64, player_1/loss=220.152, player_2/loss=929.756, rew=1334.00]


Epoch #3206: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3207: 1025it [00:02, 362.05it/s, env_step=3283968, len=27, n/ep=2, n/st=64, player_1/loss=133.251, player_2/loss=985.735, rew=812.00]


Epoch #3207: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3208: 1025it [00:02, 366.45it/s, env_step=3284992, len=28, n/ep=2, n/st=64, player_1/loss=689.465, player_2/loss=1166.812, rew=841.00]


Epoch #3208: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3209: 1025it [00:02, 365.15it/s, env_step=3286016, len=37, n/ep=1, n/st=64, player_1/loss=646.413, player_2/loss=720.997, rew=1404.00]


Epoch #3209: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3210: 1025it [00:02, 365.80it/s, env_step=3287040, len=31, n/ep=2, n/st=64, player_1/loss=396.080, player_2/loss=807.656, rew=1022.00]


Epoch #3210: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3211: 1025it [00:02, 363.34it/s, env_step=3288064, len=35, n/ep=2, n/st=64, player_1/loss=457.252, player_2/loss=288.585, rew=1258.00]


Epoch #3211: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3212: 1025it [00:02, 366.19it/s, env_step=3289088, len=30, n/ep=2, n/st=64, player_1/loss=459.040, player_2/loss=244.536, rew=961.00]


Epoch #3212: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3213: 1025it [00:02, 366.71it/s, env_step=3290112, len=24, n/ep=2, n/st=64, player_1/loss=521.822, player_2/loss=224.710, rew=647.00]


Epoch #3213: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3214: 1025it [00:02, 362.95it/s, env_step=3291136, len=34, n/ep=2, n/st=64, player_1/loss=383.990, player_2/loss=125.669, rew=1197.00]


Epoch #3214: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3215: 1025it [00:02, 366.19it/s, env_step=3292160, len=26, n/ep=2, n/st=64, player_1/loss=383.971, player_2/loss=304.487, rew=869.00]


Epoch #3215: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3216: 1025it [00:02, 361.92it/s, env_step=3293184, len=36, n/ep=2, n/st=64, player_1/loss=362.652, player_2/loss=585.412, rew=1367.00]


Epoch #3216: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3217: 1025it [00:02, 364.76it/s, env_step=3294208, len=26, n/ep=2, n/st=64, player_1/loss=54.903, player_2/loss=422.289, rew=736.00]


Epoch #3217: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3218: 1025it [00:02, 366.45it/s, env_step=3295232, len=8, n/ep=8, n/st=64, player_1/loss=210.656, player_2/loss=1108.926, rew=71.25]


Epoch #3218: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3219: 1025it [00:02, 365.41it/s, env_step=3296256, len=17, n/ep=2, n/st=64, player_1/loss=466.024, player_2/loss=1636.464, rew=308.00]


Epoch #3219: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3220: 1025it [00:02, 364.50it/s, env_step=3297280, len=21, n/ep=3, n/st=64, player_1/loss=358.524, player_2/loss=1008.943, rew=472.67]


Epoch #3220: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3221: 1025it [00:02, 364.76it/s, env_step=3298304, len=35, n/ep=2, n/st=64, player_1/loss=335.215, player_2/loss=634.668, rew=1300.00]


Epoch #3221: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3222: 1025it [00:02, 365.67it/s, env_step=3299328, len=21, n/ep=2, n/st=64, player_1/loss=1063.740, player_2/loss=1018.582, rew=502.00]


Epoch #3222: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3223: 1025it [00:02, 362.56it/s, env_step=3300352, len=38, n/ep=2, n/st=64, player_1/loss=923.467, player_2/loss=1034.371, rew=1480.00]


Epoch #3223: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3224: 1025it [00:02, 365.93it/s, env_step=3301376, len=31, n/ep=2, n/st=64, player_1/loss=109.078, player_2/loss=338.164, rew=1052.00]


Epoch #3224: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3225: 1025it [00:02, 365.02it/s, env_step=3302400, len=25, n/ep=3, n/st=64, player_1/loss=153.376, player_2/loss=659.795, rew=652.67]


Epoch #3225: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3226: 1025it [00:02, 367.24it/s, env_step=3303424, len=21, n/ep=3, n/st=64, player_1/loss=240.832, player_2/loss=960.715, rew=489.33]


Epoch #3226: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3227: 1025it [00:02, 368.29it/s, env_step=3304448, len=32, n/ep=2, n/st=64, player_1/loss=278.246, player_2/loss=917.638, rew=1089.00]


Epoch #3227: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3228: 1025it [00:02, 361.67it/s, env_step=3305472, len=22, n/ep=2, n/st=64, player_1/loss=517.841, player_2/loss=920.392, rew=529.00]


Epoch #3228: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3229: 1025it [00:02, 366.19it/s, env_step=3306496, len=19, n/ep=3, n/st=64, player_1/loss=498.004, player_2/loss=731.058, rew=430.00]


Epoch #3229: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3230: 1025it [00:02, 363.72it/s, env_step=3307520, len=27, n/ep=2, n/st=64, player_1/loss=149.804, player_2/loss=896.810, rew=755.00]


Epoch #3230: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3231: 1025it [00:02, 364.76it/s, env_step=3308544, len=29, n/ep=2, n/st=64, player_1/loss=785.018, player_2/loss=931.033, rew=918.00]


Epoch #3231: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3232: 1025it [00:02, 363.21it/s, env_step=3309568, len=31, n/ep=2, n/st=64, player_1/loss=1013.411, player_2/loss=429.647, rew=1042.00]


Epoch #3232: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3233: 1025it [00:02, 365.54it/s, env_step=3310592, len=40, n/ep=1, n/st=64, player_2/loss=291.201, rew=1638.00]


Epoch #3233: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3234: 1025it [00:02, 364.76it/s, env_step=3311616, len=30, n/ep=2, n/st=64, player_1/loss=544.477, player_2/loss=143.666, rew=1009.00]


Epoch #3234: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3235: 1025it [00:02, 365.93it/s, env_step=3312640, len=40, n/ep=2, n/st=64, player_1/loss=300.429, player_2/loss=102.283, rew=1696.00]


Epoch #3235: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3236: 1025it [00:02, 361.54it/s, env_step=3313664, len=25, n/ep=2, n/st=64, player_1/loss=691.438, rew=686.00]


Epoch #3236: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3237: 1025it [00:02, 366.71it/s, env_step=3314688, len=29, n/ep=2, n/st=64, player_1/loss=421.741, player_2/loss=336.760, rew=928.00]


Epoch #3237: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3238: 1025it [00:02, 367.24it/s, env_step=3315712, len=37, n/ep=2, n/st=64, player_1/loss=58.239, player_2/loss=498.553, rew=1444.00]


Epoch #3238: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3239: 1025it [00:02, 363.08it/s, env_step=3316736, len=24, n/ep=2, n/st=64, player_1/loss=56.472, player_2/loss=495.335, rew=634.00]


Epoch #3239: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3240: 1025it [00:02, 368.03it/s, env_step=3317760, len=42, n/ep=1, n/st=64, player_1/loss=96.105, player_2/loss=440.687, rew=1834.00]


Epoch #3240: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3241: 1025it [00:02, 366.98it/s, env_step=3318784, len=42, n/ep=1, n/st=64, player_1/loss=262.957, player_2/loss=408.738, rew=1834.00]


Epoch #3241: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3242: 1025it [00:02, 363.34it/s, env_step=3319808, len=35, n/ep=2, n/st=64, player_1/loss=522.042, player_2/loss=577.225, rew=1306.00]


Epoch #3242: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3243: 1025it [00:02, 363.85it/s, env_step=3320832, len=37, n/ep=1, n/st=64, player_1/loss=706.476, player_2/loss=396.886, rew=1404.00]


Epoch #3243: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3244: 1025it [00:02, 366.32it/s, env_step=3321856, len=32, n/ep=2, n/st=64, player_1/loss=425.924, player_2/loss=140.065, rew=1117.00]


Epoch #3244: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3245: 1025it [00:02, 361.29it/s, env_step=3322880, len=38, n/ep=2, n/st=64, player_1/loss=74.687, player_2/loss=162.298, rew=1546.00]


Epoch #3245: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3246: 1025it [00:02, 364.11it/s, env_step=3323904, len=34, n/ep=2, n/st=64, player_1/loss=255.989, player_2/loss=171.938, rew=1229.00]


Epoch #3246: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3247: 1025it [00:02, 368.29it/s, env_step=3324928, len=28, n/ep=3, n/st=64, player_1/loss=262.495, player_2/loss=136.267, rew=884.00]


Epoch #3247: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3248: 1025it [00:02, 359.89it/s, env_step=3325952, len=34, n/ep=2, n/st=64, player_1/loss=231.166, player_2/loss=112.798, rew=1197.00]


Epoch #3248: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3249: 1025it [00:02, 365.80it/s, env_step=3326976, len=33, n/ep=2, n/st=64, player_1/loss=190.168, player_2/loss=351.134, rew=1145.00]


Epoch #3249: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3250: 1025it [00:02, 363.08it/s, env_step=3328000, len=30, n/ep=2, n/st=64, player_1/loss=115.294, player_2/loss=386.589, rew=964.00]


Epoch #3250: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3251: 1025it [00:02, 365.41it/s, env_step=3329024, len=14, n/ep=4, n/st=64, player_1/loss=129.467, player_2/loss=714.567, rew=232.50]


Epoch #3251: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3252: 1025it [00:02, 360.02it/s, env_step=3330048, len=38, n/ep=2, n/st=64, player_1/loss=268.033, player_2/loss=473.814, rew=1481.00]


Epoch #3252: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3253: 1025it [00:02, 365.28it/s, env_step=3331072, len=8, n/ep=8, n/st=64, player_1/loss=259.603, player_2/loss=495.990, rew=82.00]


Epoch #3253: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3254: 1025it [00:02, 366.84it/s, env_step=3332096, len=21, n/ep=2, n/st=64, player_1/loss=266.249, player_2/loss=667.095, rew=554.00]


Epoch #3254: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3255: 1025it [00:02, 364.11it/s, env_step=3333120, len=30, n/ep=2, n/st=64, player_1/loss=324.488, player_2/loss=1012.022, rew=953.00]


Epoch #3255: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3256: 1025it [00:02, 367.63it/s, env_step=3334144, len=35, n/ep=2, n/st=64, player_1/loss=116.279, player_2/loss=623.222, rew=1322.00]


Epoch #3256: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3257: 1025it [00:02, 362.56it/s, env_step=3335168, len=30, n/ep=2, n/st=64, player_1/loss=420.777, player_2/loss=354.739, rew=989.00]


Epoch #3257: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3258: 1025it [00:02, 365.41it/s, env_step=3336192, len=29, n/ep=2, n/st=64, player_1/loss=691.959, player_2/loss=401.589, rew=884.00]


Epoch #3258: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3259: 1025it [00:02, 359.77it/s, env_step=3337216, len=37, n/ep=2, n/st=64, player_1/loss=352.922, player_2/loss=1554.708, rew=1413.00]


Epoch #3259: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3260: 1025it [00:02, 364.11it/s, env_step=3338240, len=34, n/ep=2, n/st=64, player_1/loss=214.724, player_2/loss=1624.161, rew=1197.00]


Epoch #3260: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3261: 1025it [00:02, 363.21it/s, env_step=3339264, len=30, n/ep=2, n/st=64, player_1/loss=290.097, player_2/loss=503.890, rew=971.00]


Epoch #3261: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3262: 1025it [00:02, 366.19it/s, env_step=3340288, len=15, n/ep=4, n/st=64, player_1/loss=662.566, player_2/loss=512.196, rew=240.00]


Epoch #3262: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3263: 1025it [00:02, 362.31it/s, env_step=3341312, len=19, n/ep=4, n/st=64, player_1/loss=740.885, player_2/loss=1262.773, rew=471.00]


Epoch #3263: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3264: 1025it [00:02, 367.77it/s, env_step=3342336, len=14, n/ep=5, n/st=64, player_1/loss=310.518, player_2/loss=1536.023, rew=230.80]


Epoch #3264: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3265: 1025it [00:02, 362.95it/s, env_step=3343360, len=29, n/ep=2, n/st=64, player_1/loss=446.356, player_2/loss=1043.763, rew=898.00]


Epoch #3265: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3266: 1025it [00:02, 365.93it/s, env_step=3344384, len=29, n/ep=3, n/st=64, player_1/loss=490.504, player_2/loss=1037.006, rew=946.00]


Epoch #3266: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3267: 1025it [00:02, 361.67it/s, env_step=3345408, len=31, n/ep=2, n/st=64, player_2/loss=1153.556, rew=1024.00]


Epoch #3267: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3268: 1025it [00:02, 366.98it/s, env_step=3346432, len=31, n/ep=2, n/st=64, player_1/loss=187.593, player_2/loss=620.292, rew=1022.00]


Epoch #3268: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3269: 1025it [00:02, 361.03it/s, env_step=3347456, len=27, n/ep=2, n/st=64, player_1/loss=235.445, player_2/loss=314.219, rew=758.00]


Epoch #3269: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3270: 1025it [00:02, 369.09it/s, env_step=3348480, len=21, n/ep=2, n/st=64, player_1/loss=308.652, player_2/loss=122.637, rew=496.00]


Epoch #3270: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3271: 1025it [00:02, 366.45it/s, env_step=3349504, len=8, n/ep=7, n/st=64, player_1/loss=520.455, player_2/loss=78.195, rew=88.57]


Epoch #3271: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3272: 1025it [00:02, 362.95it/s, env_step=3350528, len=31, n/ep=2, n/st=64, player_1/loss=439.444, player_2/loss=85.373, rew=1054.00]


Epoch #3272: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3273: 1025it [00:02, 366.98it/s, env_step=3351552, len=36, n/ep=2, n/st=64, player_1/loss=145.279, player_2/loss=72.869, rew=1379.00]


Epoch #3273: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3274: 1025it [00:02, 361.80it/s, env_step=3352576, len=31, n/ep=3, n/st=64, player_1/loss=152.069, player_2/loss=94.538, rew=1087.33]


Epoch #3274: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3275: 1025it [00:02, 361.29it/s, env_step=3353600, len=24, n/ep=3, n/st=64, player_1/loss=395.220, player_2/loss=877.720, rew=602.67]


Epoch #3275: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3276: 1025it [00:02, 365.28it/s, env_step=3354624, len=28, n/ep=2, n/st=64, player_1/loss=497.653, player_2/loss=1106.972, rew=811.00]


Epoch #3276: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3277: 1025it [00:02, 362.05it/s, env_step=3355648, len=32, n/ep=2, n/st=64, player_1/loss=431.036, player_2/loss=975.588, rew=1070.00]


Epoch #3277: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3278: 1025it [00:02, 368.03it/s, env_step=3356672, len=33, n/ep=2, n/st=64, player_1/loss=359.985, player_2/loss=232.608, rew=1166.00]


Epoch #3278: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3279: 1025it [00:02, 362.05it/s, env_step=3357696, len=23, n/ep=3, n/st=64, player_1/loss=99.490, player_2/loss=483.081, rew=632.67]


Epoch #3279: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3280: 1025it [00:02, 365.80it/s, env_step=3358720, len=30, n/ep=2, n/st=64, player_1/loss=49.495, player_2/loss=487.090, rew=1015.00]


Epoch #3280: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3281: 1025it [00:02, 362.82it/s, env_step=3359744, len=31, n/ep=2, n/st=64, player_1/loss=229.413, player_2/loss=408.151, rew=991.00]


Epoch #3281: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3282: 1025it [00:02, 369.22it/s, env_step=3360768, len=26, n/ep=3, n/st=64, player_1/loss=464.225, player_2/loss=76.651, rew=745.33]


Epoch #3282: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3283: 1025it [00:02, 363.72it/s, env_step=3361792, len=25, n/ep=2, n/st=64, player_1/loss=302.348, player_2/loss=165.065, rew=686.00]


Epoch #3283: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3284: 1025it [00:02, 364.76it/s, env_step=3362816, len=28, n/ep=2, n/st=64, player_1/loss=154.763, player_2/loss=885.127, rew=851.00]


Epoch #3284: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3285: 1025it [00:02, 362.31it/s, env_step=3363840, len=21, n/ep=3, n/st=64, player_1/loss=382.611, player_2/loss=1199.974, rew=462.00]


Epoch #3285: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3286: 1025it [00:02, 366.58it/s, env_step=3364864, len=30, n/ep=2, n/st=64, player_1/loss=298.222, player_2/loss=715.202, rew=1015.00]


Epoch #3286: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3287: 1025it [00:02, 363.98it/s, env_step=3365888, len=38, n/ep=2, n/st=64, player_1/loss=154.037, player_2/loss=493.054, rew=1521.00]


Epoch #3287: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3288: 1025it [00:02, 365.41it/s, env_step=3366912, len=22, n/ep=2, n/st=64, player_1/loss=277.832, rew=513.00]


Epoch #3288: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3289: 1025it [00:02, 365.54it/s, env_step=3367936, len=14, n/ep=4, n/st=64, player_1/loss=141.152, player_2/loss=1095.112, rew=217.00]


Epoch #3289: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3290: 1025it [00:02, 361.54it/s, env_step=3368960, len=19, n/ep=4, n/st=64, player_1/loss=168.274, player_2/loss=902.623, rew=471.00]


Epoch #3290: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3291: 1025it [00:02, 367.77it/s, env_step=3369984, len=24, n/ep=2, n/st=64, player_1/loss=407.543, player_2/loss=343.677, rew=598.00]


Epoch #3291: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3292: 1025it [00:02, 361.80it/s, env_step=3371008, len=21, n/ep=3, n/st=64, player_1/loss=512.833, player_2/loss=451.281, rew=462.00]


Epoch #3292: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3293: 1025it [00:02, 367.11it/s, env_step=3372032, len=23, n/ep=3, n/st=64, player_1/loss=366.909, player_2/loss=535.149, rew=552.67]


Epoch #3293: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3294: 1025it [00:02, 367.63it/s, env_step=3373056, len=23, n/ep=3, n/st=64, player_1/loss=546.498, player_2/loss=299.776, rew=603.33]


Epoch #3294: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3295: 1025it [00:02, 362.82it/s, env_step=3374080, len=15, n/ep=4, n/st=64, player_1/loss=558.533, player_2/loss=477.763, rew=238.50]


Epoch #3295: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3296: 1025it [00:02, 365.80it/s, env_step=3375104, len=34, n/ep=2, n/st=64, player_1/loss=367.916, player_2/loss=565.667, rew=1237.00]


Epoch #3296: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3297: 1025it [00:02, 364.24it/s, env_step=3376128, len=21, n/ep=3, n/st=64, player_1/loss=583.251, player_2/loss=172.085, rew=532.00]


Epoch #3297: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3298: 1025it [00:02, 366.58it/s, env_step=3377152, len=25, n/ep=2, n/st=64, player_1/loss=866.929, player_2/loss=203.435, rew=674.00]


Epoch #3298: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3299: 1025it [00:02, 360.78it/s, env_step=3378176, len=26, n/ep=2, n/st=64, player_1/loss=775.254, player_2/loss=281.732, rew=727.00]


Epoch #3299: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3300: 1025it [00:02, 357.63it/s, env_step=3379200, len=28, n/ep=2, n/st=64, player_1/loss=588.186, player_2/loss=206.753, rew=839.00]


Epoch #3300: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3301: 1025it [00:02, 362.43it/s, env_step=3380224, len=22, n/ep=3, n/st=64, player_1/loss=437.932, player_2/loss=274.439, rew=576.00]


Epoch #3301: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3302: 1025it [00:02, 365.80it/s, env_step=3381248, len=27, n/ep=2, n/st=64, player_1/loss=518.001, player_2/loss=356.004, rew=784.00]


Epoch #3302: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3303: 1025it [00:02, 365.28it/s, env_step=3382272, len=33, n/ep=2, n/st=64, player_1/loss=330.487, player_2/loss=311.103, rew=1196.00]


Epoch #3303: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3304: 1025it [00:02, 362.44it/s, env_step=3383296, len=24, n/ep=3, n/st=64, player_1/loss=123.447, player_2/loss=239.989, rew=672.67]


Epoch #3304: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3305: 1025it [00:02, 366.71it/s, env_step=3384320, len=35, n/ep=1, n/st=64, player_1/loss=126.794, player_2/loss=227.628, rew=1258.00]


Epoch #3305: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3306: 1025it [00:02, 361.54it/s, env_step=3385344, len=28, n/ep=2, n/st=64, player_1/loss=301.869, player_2/loss=213.463, rew=869.00]


Epoch #3306: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3307: 1025it [00:02, 366.06it/s, env_step=3386368, len=23, n/ep=2, n/st=64, player_1/loss=660.327, player_2/loss=286.339, rew=575.00]


Epoch #3307: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3308: 1025it [00:02, 362.18it/s, env_step=3387392, len=20, n/ep=3, n/st=64, player_1/loss=926.045, player_2/loss=346.866, rew=473.33]


Epoch #3308: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3309: 1025it [00:02, 365.15it/s, env_step=3388416, len=25, n/ep=2, n/st=64, player_1/loss=535.716, player_2/loss=392.041, rew=676.00]


Epoch #3309: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3310: 1025it [00:02, 365.67it/s, env_step=3389440, len=27, n/ep=2, n/st=64, player_1/loss=222.891, player_2/loss=589.132, rew=782.00]


Epoch #3310: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3311: 1025it [00:02, 362.31it/s, env_step=3390464, len=27, n/ep=3, n/st=64, player_1/loss=168.097, player_2/loss=794.543, rew=799.33]


Epoch #3311: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3312: 1025it [00:02, 366.58it/s, env_step=3391488, len=39, n/ep=1, n/st=64, player_1/loss=376.912, player_2/loss=461.616, rew=1558.00]


Epoch #3312: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3313: 1025it [00:02, 365.80it/s, env_step=3392512, len=33, n/ep=2, n/st=64, player_1/loss=527.740, player_2/loss=284.958, rew=1145.00]


Epoch #3313: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3314: 1025it [00:02, 366.06it/s, env_step=3393536, len=29, n/ep=2, n/st=64, player_1/loss=320.564, player_2/loss=276.460, rew=884.00]


Epoch #3314: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3315: 1025it [00:02, 366.32it/s, env_step=3394560, len=31, n/ep=2, n/st=64, player_1/loss=378.863, player_2/loss=500.874, rew=1042.00]


Epoch #3315: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3316: 1025it [00:02, 361.41it/s, env_step=3395584, len=29, n/ep=3, n/st=64, player_1/loss=385.223, player_2/loss=375.129, rew=892.00]


Epoch #3316: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3317: 1025it [00:02, 365.80it/s, env_step=3396608, len=28, n/ep=3, n/st=64, player_1/loss=368.020, player_2/loss=366.407, rew=949.33]


Epoch #3317: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3318: 1025it [00:02, 363.46it/s, env_step=3397632, len=24, n/ep=3, n/st=64, player_1/loss=241.092, player_2/loss=356.979, rew=686.67]


Epoch #3318: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3319: 1025it [00:02, 364.11it/s, env_step=3398656, len=31, n/ep=2, n/st=64, player_1/loss=121.537, player_2/loss=262.158, rew=1126.00]


Epoch #3319: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3320: 1025it [00:02, 363.85it/s, env_step=3399680, len=29, n/ep=2, n/st=64, player_1/loss=197.233, player_2/loss=390.341, rew=970.00]


Epoch #3320: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3321: 1025it [00:02, 368.82it/s, env_step=3400704, len=38, n/ep=2, n/st=64, player_1/loss=247.612, player_2/loss=445.585, rew=1480.00]


Epoch #3321: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3322: 1025it [00:02, 362.18it/s, env_step=3401728, len=28, n/ep=2, n/st=64, player_1/loss=240.829, player_2/loss=225.318, rew=819.00]


Epoch #3322: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3323: 1025it [00:02, 362.56it/s, env_step=3402752, len=32, n/ep=2, n/st=64, player_1/loss=301.720, player_2/loss=185.636, rew=1169.00]


Epoch #3323: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3324: 1025it [00:02, 363.46it/s, env_step=3403776, len=32, n/ep=2, n/st=64, player_1/loss=209.046, player_2/loss=294.183, rew=1103.00]


Epoch #3324: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3325: 1025it [00:02, 365.15it/s, env_step=3404800, len=26, n/ep=2, n/st=64, player_1/loss=110.962, player_2/loss=366.863, rew=727.00]


Epoch #3325: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3326: 1025it [00:02, 366.85it/s, env_step=3405824, len=13, n/ep=3, n/st=64, player_1/loss=144.437, player_2/loss=259.626, rew=227.33]


Epoch #3326: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3327: 1025it [00:02, 360.52it/s, env_step=3406848, len=30, n/ep=3, n/st=64, player_1/loss=169.811, player_2/loss=502.783, rew=970.67]


Epoch #3327: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3328: 1025it [00:02, 366.85it/s, env_step=3407872, len=25, n/ep=3, n/st=64, player_2/loss=895.054, rew=714.67]


Epoch #3328: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3329: 1025it [00:02, 364.50it/s, env_step=3408896, len=32, n/ep=3, n/st=64, player_1/loss=266.412, player_2/loss=800.026, rew=1127.33]


Epoch #3329: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3330: 1025it [00:02, 364.76it/s, env_step=3409920, len=38, n/ep=1, n/st=64, player_1/loss=436.001, player_2/loss=351.343, rew=1480.00]


Epoch #3330: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3331: 1025it [00:02, 365.02it/s, env_step=3410944, len=21, n/ep=3, n/st=64, player_1/loss=569.471, player_2/loss=337.164, rew=559.33]


Epoch #3331: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3332: 1025it [00:02, 362.18it/s, env_step=3411968, len=25, n/ep=2, n/st=64, player_1/loss=453.467, rew=716.00]


Epoch #3332: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3333: 1025it [00:02, 367.77it/s, env_step=3412992, len=26, n/ep=2, n/st=64, player_1/loss=182.098, player_2/loss=259.178, rew=747.00]


Epoch #3333: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3334: 1025it [00:02, 362.18it/s, env_step=3414016, len=19, n/ep=3, n/st=64, player_1/loss=638.242, player_2/loss=384.437, rew=469.33]


Epoch #3334: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3335: 1025it [00:02, 365.80it/s, env_step=3415040, len=22, n/ep=2, n/st=64, player_1/loss=723.406, player_2/loss=712.812, rew=505.00]


Epoch #3335: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3336: 1025it [00:02, 360.40it/s, env_step=3416064, len=23, n/ep=3, n/st=64, player_1/loss=439.321, player_2/loss=977.869, rew=566.00]


Epoch #3336: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3337: 1025it [00:02, 363.34it/s, env_step=3417088, len=34, n/ep=2, n/st=64, player_1/loss=500.623, player_2/loss=836.547, rew=1235.00]


Epoch #3337: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3338: 1025it [00:02, 366.19it/s, env_step=3418112, len=30, n/ep=2, n/st=64, player_1/loss=561.602, player_2/loss=566.165, rew=929.00]


Epoch #3338: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3339: 1025it [00:02, 361.16it/s, env_step=3419136, len=19, n/ep=2, n/st=64, player_1/loss=480.959, player_2/loss=314.705, rew=382.00]


Epoch #3339: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3340: 1025it [00:02, 368.03it/s, env_step=3420160, len=26, n/ep=2, n/st=64, player_1/loss=263.434, player_2/loss=414.168, rew=739.00]


Epoch #3340: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3341: 1025it [00:02, 357.51it/s, env_step=3421184, len=22, n/ep=3, n/st=64, player_1/loss=364.605, rew=705.33]


Epoch #3341: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3342: 1025it [00:02, 364.89it/s, env_step=3422208, len=27, n/ep=2, n/st=64, player_1/loss=318.691, player_2/loss=232.433, rew=794.00]


Epoch #3342: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3343: 1025it [00:02, 363.46it/s, env_step=3423232, len=34, n/ep=2, n/st=64, player_1/loss=187.545, player_2/loss=101.473, rew=1223.00]


Epoch #3343: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3344: 1025it [00:02, 365.02it/s, env_step=3424256, len=23, n/ep=3, n/st=64, player_1/loss=314.846, player_2/loss=329.407, rew=570.67]


Epoch #3344: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3345: 1025it [00:02, 363.21it/s, env_step=3425280, len=27, n/ep=3, n/st=64, player_1/loss=422.053, player_2/loss=519.699, rew=794.67]


Epoch #3345: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3346: 1025it [00:02, 367.50it/s, env_step=3426304, len=35, n/ep=2, n/st=64, player_1/loss=550.369, player_2/loss=715.210, rew=1267.00]


Epoch #3346: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3347: 1025it [00:02, 362.69it/s, env_step=3427328, len=29, n/ep=3, n/st=64, player_1/loss=194.301, player_2/loss=436.548, rew=882.00]


Epoch #3347: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3348: 1025it [00:02, 365.28it/s, env_step=3428352, len=22, n/ep=2, n/st=64, player_1/loss=106.557, player_2/loss=60.137, rew=527.00]


Epoch #3348: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3349: 1025it [00:02, 360.91it/s, env_step=3429376, len=33, n/ep=2, n/st=64, player_1/loss=393.314, player_2/loss=283.016, rew=1121.00]


Epoch #3349: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3350: 1025it [00:02, 366.32it/s, env_step=3430400, len=28, n/ep=3, n/st=64, player_1/loss=398.149, player_2/loss=420.689, rew=860.00]


Epoch #3350: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3351: 1025it [00:02, 361.03it/s, env_step=3431424, len=35, n/ep=2, n/st=64, player_1/loss=122.065, player_2/loss=422.274, rew=1267.00]


Epoch #3351: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3352: 1025it [00:02, 365.02it/s, env_step=3432448, len=33, n/ep=2, n/st=64, player_1/loss=123.029, player_2/loss=252.996, rew=1121.00]


Epoch #3352: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3353: 1025it [00:02, 366.58it/s, env_step=3433472, len=37, n/ep=2, n/st=64, player_1/loss=164.438, player_2/loss=445.487, rew=1405.00]


Epoch #3353: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3354: 1025it [00:02, 362.44it/s, env_step=3434496, len=32, n/ep=2, n/st=64, player_1/loss=226.383, player_2/loss=381.616, rew=1107.00]


Epoch #3354: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3355: 1025it [00:02, 363.72it/s, env_step=3435520, len=37, n/ep=2, n/st=64, player_1/loss=171.450, player_2/loss=188.564, rew=1413.00]


Epoch #3355: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3356: 1025it [00:02, 367.37it/s, env_step=3436544, len=33, n/ep=2, n/st=64, player_1/loss=267.821, player_2/loss=119.219, rew=1184.00]


Epoch #3356: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3357: 1025it [00:02, 362.56it/s, env_step=3437568, len=32, n/ep=2, n/st=64, player_1/loss=149.699, player_2/loss=327.259, rew=1093.00]


Epoch #3357: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3358: 1025it [00:02, 364.50it/s, env_step=3438592, len=34, n/ep=2, n/st=64, player_1/loss=144.786, player_2/loss=299.839, rew=1192.00]


Epoch #3358: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3359: 1025it [00:02, 366.19it/s, env_step=3439616, len=34, n/ep=2, n/st=64, player_1/loss=192.469, player_2/loss=66.044, rew=1253.00]


Epoch #3359: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3360: 1025it [00:02, 364.37it/s, env_step=3440640, len=33, n/ep=2, n/st=64, player_1/loss=212.042, player_2/loss=208.957, rew=1145.00]


Epoch #3360: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3361: 1025it [00:02, 366.84it/s, env_step=3441664, len=38, n/ep=1, n/st=64, player_1/loss=248.766, player_2/loss=389.789, rew=1480.00]


Epoch #3361: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3362: 1025it [00:02, 363.21it/s, env_step=3442688, len=33, n/ep=2, n/st=64, player_1/loss=87.493, player_2/loss=266.723, rew=1184.00]


Epoch #3362: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3363: 1025it [00:02, 367.50it/s, env_step=3443712, len=36, n/ep=2, n/st=64, player_1/loss=78.018, player_2/loss=572.166, rew=1334.00]


Epoch #3363: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3364: 1025it [00:02, 362.44it/s, env_step=3444736, len=19, n/ep=3, n/st=64, player_1/loss=91.345, player_2/loss=763.299, rew=404.67]


Epoch #3364: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3365: 1025it [00:02, 367.77it/s, env_step=3445760, len=9, n/ep=7, n/st=64, player_1/loss=365.665, rew=107.71]


Epoch #3365: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3366: 1025it [00:02, 363.72it/s, env_step=3446784, len=16, n/ep=3, n/st=64, player_1/loss=386.325, player_2/loss=198.313, rew=376.00]


Epoch #3366: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3367: 1025it [00:02, 366.19it/s, env_step=3447808, len=28, n/ep=3, n/st=64, player_1/loss=305.626, player_2/loss=540.811, rew=928.67]


Epoch #3367: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3368: 1025it [00:02, 367.77it/s, env_step=3448832, len=21, n/ep=3, n/st=64, player_1/loss=374.889, player_2/loss=404.419, rew=460.67]


Epoch #3368: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3369: 1025it [00:02, 358.13it/s, env_step=3449856, len=34, n/ep=2, n/st=64, player_1/loss=389.539, player_2/loss=375.337, rew=1253.00]


Epoch #3369: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3370: 1025it [00:02, 367.11it/s, env_step=3450880, len=34, n/ep=1, n/st=64, player_1/loss=846.177, player_2/loss=484.840, rew=1188.00]


Epoch #3370: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3371: 1025it [00:02, 362.69it/s, env_step=3451904, len=38, n/ep=1, n/st=64, player_1/loss=757.894, player_2/loss=214.243, rew=1480.00]


Epoch #3371: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3372: 1025it [00:02, 365.15it/s, env_step=3452928, len=34, n/ep=1, n/st=64, player_1/loss=566.298, player_2/loss=184.468, rew=1188.00]


Epoch #3372: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3373: 1025it [00:02, 362.95it/s, env_step=3453952, len=27, n/ep=1, n/st=64, player_1/loss=539.272, player_2/loss=492.974, rew=754.00]


Epoch #3373: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3374: 1025it [00:02, 365.80it/s, env_step=3454976, len=15, n/ep=4, n/st=64, player_1/loss=167.852, player_2/loss=413.133, rew=260.50]


Epoch #3374: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3375: 1025it [00:02, 362.44it/s, env_step=3456000, len=33, n/ep=2, n/st=64, player_1/loss=553.984, player_2/loss=395.481, rew=1174.00]


Epoch #3375: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3376: 1025it [00:02, 365.41it/s, env_step=3457024, len=19, n/ep=3, n/st=64, player_1/loss=712.710, player_2/loss=323.328, rew=378.67]


Epoch #3376: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3377: 1025it [00:02, 362.82it/s, env_step=3458048, len=15, n/ep=5, n/st=64, player_1/loss=302.965, player_2/loss=690.068, rew=245.60]


Epoch #3377: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3378: 1025it [00:02, 365.28it/s, env_step=3459072, len=17, n/ep=3, n/st=64, player_1/loss=26.854, player_2/loss=568.371, rew=312.67]


Epoch #3378: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3379: 1025it [00:02, 363.34it/s, env_step=3460096, len=21, n/ep=3, n/st=64, player_2/loss=641.767, rew=476.00]


Epoch #3379: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3380: 1025it [00:02, 367.11it/s, env_step=3461120, len=14, n/ep=4, n/st=64, player_1/loss=212.927, player_2/loss=499.892, rew=216.00]


Epoch #3380: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3381: 1025it [00:02, 365.28it/s, env_step=3462144, len=22, n/ep=3, n/st=64, player_1/loss=245.260, player_2/loss=824.391, rew=540.00]


Epoch #3381: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3382: 1025it [00:02, 361.29it/s, env_step=3463168, len=26, n/ep=3, n/st=64, player_1/loss=163.606, player_2/loss=812.531, rew=731.33]


Epoch #3382: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3383: 1025it [00:02, 364.76it/s, env_step=3464192, len=13, n/ep=4, n/st=64, player_1/loss=25.876, player_2/loss=781.027, rew=187.00]


Epoch #3383: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3384: 1025it [00:02, 362.82it/s, env_step=3465216, len=15, n/ep=4, n/st=64, player_1/loss=227.094, player_2/loss=689.527, rew=262.50]


Epoch #3384: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3385: 1025it [00:02, 364.76it/s, env_step=3466240, len=14, n/ep=4, n/st=64, player_1/loss=353.541, player_2/loss=314.700, rew=236.50]


Epoch #3385: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3386: 1025it [00:02, 359.64it/s, env_step=3467264, len=32, n/ep=2, n/st=64, player_1/loss=674.819, rew=1089.00]


Epoch #3386: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3387: 1025it [00:02, 369.49it/s, env_step=3468288, len=27, n/ep=2, n/st=64, player_1/loss=616.061, player_2/loss=752.134, rew=788.00]


Epoch #3387: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3388: 1025it [00:02, 362.56it/s, env_step=3469312, len=30, n/ep=2, n/st=64, player_1/loss=288.033, player_2/loss=452.066, rew=929.00]


Epoch #3388: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3389: 1025it [00:02, 366.45it/s, env_step=3470336, len=27, n/ep=2, n/st=64, player_1/loss=266.317, player_2/loss=521.689, rew=784.00]


Epoch #3389: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3390: 1025it [00:02, 366.98it/s, env_step=3471360, len=32, n/ep=2, n/st=64, player_1/loss=256.185, player_2/loss=804.182, rew=1117.00]


Epoch #3390: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3391: 1025it [00:02, 364.24it/s, env_step=3472384, len=28, n/ep=2, n/st=64, player_1/loss=255.936, player_2/loss=379.475, rew=839.00]


Epoch #3391: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3392: 1025it [00:02, 364.24it/s, env_step=3473408, len=29, n/ep=2, n/st=64, player_1/loss=185.999, player_2/loss=249.129, rew=898.00]


Epoch #3392: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3393: 1025it [00:02, 363.21it/s, env_step=3474432, len=17, n/ep=4, n/st=64, player_1/loss=264.075, player_2/loss=362.672, rew=380.50]


Epoch #3393: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3394: 1025it [00:02, 367.11it/s, env_step=3475456, len=27, n/ep=2, n/st=64, player_1/loss=218.761, player_2/loss=169.464, rew=784.00]


Epoch #3394: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3395: 1025it [00:02, 362.44it/s, env_step=3476480, len=23, n/ep=3, n/st=64, player_1/loss=250.304, player_2/loss=317.836, rew=609.33]


Epoch #3395: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3396: 1025it [00:02, 365.54it/s, env_step=3477504, len=38, n/ep=1, n/st=64, player_1/loss=133.076, player_2/loss=628.496, rew=1480.00]


Epoch #3396: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3397: 1025it [00:02, 360.40it/s, env_step=3478528, len=39, n/ep=1, n/st=64, player_1/loss=637.688, player_2/loss=390.425, rew=1558.00]


Epoch #3397: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3398: 1025it [00:02, 367.77it/s, env_step=3479552, len=36, n/ep=2, n/st=64, player_1/loss=682.075, player_2/loss=448.630, rew=1373.00]


Epoch #3398: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3399: 1025it [00:02, 364.11it/s, env_step=3480576, len=26, n/ep=1, n/st=64, player_1/loss=415.225, player_2/loss=962.753, rew=700.00]


Epoch #3399: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3400: 1025it [00:02, 358.63it/s, env_step=3481600, len=16, n/ep=3, n/st=64, player_1/loss=344.201, player_2/loss=966.417, rew=414.67]


Epoch #3400: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3401: 1025it [00:02, 364.63it/s, env_step=3482624, len=32, n/ep=2, n/st=64, player_1/loss=359.110, player_2/loss=477.442, rew=1090.00]


Epoch #3401: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3402: 1025it [00:02, 362.95it/s, env_step=3483648, len=24, n/ep=2, n/st=64, player_1/loss=104.049, player_2/loss=236.918, rew=767.00]


Epoch #3402: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3403: 1025it [00:02, 364.50it/s, env_step=3484672, len=16, n/ep=4, n/st=64, player_1/loss=523.431, player_2/loss=159.669, rew=271.00]


Epoch #3403: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3404: 1025it [00:02, 358.01it/s, env_step=3485696, len=17, n/ep=4, n/st=64, player_1/loss=529.235, player_2/loss=275.569, rew=316.00]


Epoch #3404: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3405: 1025it [00:02, 361.16it/s, env_step=3486720, len=21, n/ep=3, n/st=64, player_1/loss=470.285, player_2/loss=587.039, rew=476.00]


Epoch #3405: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3406: 1025it [00:02, 366.71it/s, env_step=3487744, len=22, n/ep=3, n/st=64, player_1/loss=619.669, player_2/loss=525.563, rew=510.00]


Epoch #3406: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3407: 1025it [00:02, 359.89it/s, env_step=3488768, len=42, n/ep=1, n/st=64, player_1/loss=543.549, player_2/loss=648.838, rew=1804.00]


Epoch #3407: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3408: 1025it [00:02, 370.56it/s, env_step=3489792, len=31, n/ep=2, n/st=64, player_1/loss=419.711, player_2/loss=922.975, rew=1034.00]


Epoch #3408: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3409: 1025it [00:02, 365.93it/s, env_step=3490816, len=24, n/ep=2, n/st=64, player_1/loss=62.018, player_2/loss=799.227, rew=679.00]


Epoch #3409: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3410: 1025it [00:02, 360.78it/s, env_step=3491840, len=38, n/ep=2, n/st=64, player_1/loss=155.786, player_2/loss=176.428, rew=1521.00]


Epoch #3410: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3411: 1025it [00:02, 366.06it/s, env_step=3492864, len=32, n/ep=2, n/st=64, player_1/loss=145.150, player_2/loss=199.670, rew=1107.00]


Epoch #3411: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3412: 1025it [00:02, 360.40it/s, env_step=3493888, len=19, n/ep=3, n/st=64, player_1/loss=313.015, player_2/loss=183.320, rew=392.00]


Epoch #3412: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3413: 1025it [00:02, 364.76it/s, env_step=3494912, len=24, n/ep=3, n/st=64, player_1/loss=563.507, player_2/loss=166.893, rew=700.67]


Epoch #3413: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3414: 1025it [00:02, 367.37it/s, env_step=3495936, len=15, n/ep=4, n/st=64, player_1/loss=535.688, player_2/loss=285.274, rew=243.50]


Epoch #3414: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3415: 1025it [00:02, 360.27it/s, env_step=3496960, len=18, n/ep=4, n/st=64, player_1/loss=475.258, player_2/loss=282.292, rew=400.00]


Epoch #3415: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3416: 1025it [00:02, 356.64it/s, env_step=3497984, len=23, n/ep=2, n/st=64, player_1/loss=632.860, player_2/loss=327.295, rew=646.00]


Epoch #3416: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3417: 1025it [00:02, 355.40it/s, env_step=3499008, len=21, n/ep=3, n/st=64, player_1/loss=662.164, player_2/loss=402.508, rew=460.00]


Epoch #3417: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3418: 1025it [00:02, 362.95it/s, env_step=3500032, len=23, n/ep=3, n/st=64, player_1/loss=476.831, player_2/loss=492.064, rew=552.67]


Epoch #3418: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3419: 1025it [00:02, 365.80it/s, env_step=3501056, len=33, n/ep=2, n/st=64, player_1/loss=388.795, player_2/loss=684.832, rew=1121.00]


Epoch #3419: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3420: 1025it [00:02, 362.82it/s, env_step=3502080, len=23, n/ep=2, n/st=64, player_1/loss=277.177, player_2/loss=767.568, rew=614.00]


Epoch #3420: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3421: 1025it [00:02, 365.15it/s, env_step=3503104, len=32, n/ep=2, n/st=64, player_1/loss=308.397, player_2/loss=800.178, rew=1055.00]


Epoch #3421: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3422: 1025it [00:02, 364.37it/s, env_step=3504128, len=14, n/ep=4, n/st=64, player_1/loss=279.562, player_2/loss=944.719, rew=223.50]


Epoch #3422: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3423: 1025it [00:02, 363.59it/s, env_step=3505152, len=27, n/ep=2, n/st=64, player_1/loss=147.618, player_2/loss=768.079, rew=803.00]


Epoch #3423: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3424: 1025it [00:02, 363.59it/s, env_step=3506176, len=20, n/ep=3, n/st=64, player_1/loss=120.432, player_2/loss=666.301, rew=420.00]


Epoch #3424: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3425: 1025it [00:02, 364.24it/s, env_step=3507200, len=32, n/ep=2, n/st=64, player_1/loss=218.735, player_2/loss=675.506, rew=1054.00]


Epoch #3425: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3426: 1025it [00:02, 366.06it/s, env_step=3508224, len=30, n/ep=2, n/st=64, player_1/loss=173.227, player_2/loss=517.275, rew=961.00]


Epoch #3426: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3427: 1025it [00:02, 361.16it/s, env_step=3509248, len=23, n/ep=2, n/st=64, player_1/loss=127.724, player_2/loss=103.612, rew=594.00]


Epoch #3427: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3428: 1025it [00:02, 363.85it/s, env_step=3510272, len=22, n/ep=2, n/st=64, player_1/loss=117.112, player_2/loss=247.820, rew=568.00]


Epoch #3428: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3429: 1025it [00:02, 364.89it/s, env_step=3511296, len=32, n/ep=2, n/st=64, player_1/loss=75.952, player_2/loss=256.880, rew=1107.00]


Epoch #3429: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3430: 1025it [00:02, 362.31it/s, env_step=3512320, len=32, n/ep=2, n/st=64, player_1/loss=144.619, player_2/loss=261.895, rew=1079.00]


Epoch #3430: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3431: 1025it [00:02, 366.58it/s, env_step=3513344, len=29, n/ep=3, n/st=64, player_1/loss=340.124, player_2/loss=177.589, rew=922.00]


Epoch #3431: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3432: 1025it [00:02, 361.92it/s, env_step=3514368, len=31, n/ep=2, n/st=64, player_1/loss=492.291, player_2/loss=514.168, rew=1024.00]


Epoch #3432: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3433: 1025it [00:02, 364.24it/s, env_step=3515392, len=22, n/ep=3, n/st=64, player_1/loss=698.820, player_2/loss=409.821, rew=618.00]


Epoch #3433: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3434: 1025it [00:02, 367.63it/s, env_step=3516416, len=34, n/ep=2, n/st=64, player_1/loss=579.837, player_2/loss=873.813, rew=1294.00]


Epoch #3434: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3435: 1025it [00:02, 361.67it/s, env_step=3517440, len=27, n/ep=3, n/st=64, player_1/loss=396.847, player_2/loss=787.323, rew=782.67]


Epoch #3435: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3436: 1025it [00:02, 364.50it/s, env_step=3518464, len=31, n/ep=2, n/st=64, player_1/loss=262.884, player_2/loss=310.622, rew=1064.00]


Epoch #3436: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3437: 1025it [00:02, 360.40it/s, env_step=3519488, len=24, n/ep=3, n/st=64, player_1/loss=291.889, player_2/loss=151.398, rew=606.67]


Epoch #3437: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3438: 1025it [00:02, 365.28it/s, env_step=3520512, len=22, n/ep=3, n/st=64, player_1/loss=622.826, player_2/loss=217.046, rew=541.33]


Epoch #3438: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3439: 1025it [00:02, 366.45it/s, env_step=3521536, len=37, n/ep=2, n/st=64, player_1/loss=595.624, player_2/loss=156.767, rew=1408.00]


Epoch #3439: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3440: 1025it [00:02, 363.21it/s, env_step=3522560, len=30, n/ep=2, n/st=64, player_1/loss=434.924, player_2/loss=90.225, rew=1015.00]


Epoch #3440: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3441: 1025it [00:02, 366.45it/s, env_step=3523584, len=26, n/ep=2, n/st=64, player_1/loss=114.909, player_2/loss=86.286, rew=729.00]


Epoch #3441: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3442: 1025it [00:02, 366.71it/s, env_step=3524608, len=38, n/ep=2, n/st=64, player_1/loss=134.544, player_2/loss=188.576, rew=1480.00]


Epoch #3442: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3443: 1025it [00:02, 363.98it/s, env_step=3525632, len=26, n/ep=2, n/st=64, player_1/loss=54.371, player_2/loss=191.596, rew=747.00]


Epoch #3443: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3444: 1025it [00:02, 360.91it/s, env_step=3526656, len=18, n/ep=4, n/st=64, player_1/loss=147.366, player_2/loss=297.686, rew=364.00]


Epoch #3444: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3445: 1025it [00:02, 367.11it/s, env_step=3527680, len=22, n/ep=3, n/st=64, player_1/loss=561.789, player_2/loss=349.050, rew=537.33]


Epoch #3445: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3446: 1025it [00:02, 362.69it/s, env_step=3528704, len=38, n/ep=2, n/st=64, player_1/loss=573.772, player_2/loss=769.978, rew=1511.00]


Epoch #3446: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3447: 1025it [00:02, 365.41it/s, env_step=3529728, len=26, n/ep=3, n/st=64, player_1/loss=299.129, player_2/loss=684.408, rew=779.33]


Epoch #3447: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3448: 1025it [00:02, 364.37it/s, env_step=3530752, len=7, n/ep=8, n/st=64, player_1/loss=228.147, player_2/loss=238.334, rew=62.25]


Epoch #3448: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3449: 1025it [00:02, 361.41it/s, env_step=3531776, len=8, n/ep=7, n/st=64, player_1/loss=193.298, player_2/loss=231.271, rew=79.71]


Epoch #3449: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3450: 1025it [00:02, 365.15it/s, env_step=3532800, len=8, n/ep=7, n/st=64, player_1/loss=70.841, player_2/loss=518.248, rew=86.29]


Epoch #3450: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3451: 1025it [00:02, 365.02it/s, env_step=3533824, len=29, n/ep=2, n/st=64, player_1/loss=138.396, player_2/loss=431.884, rew=954.00]


Epoch #3451: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3452: 1025it [00:02, 362.44it/s, env_step=3534848, len=29, n/ep=2, n/st=64, player_1/loss=134.662, player_2/loss=532.832, rew=918.00]


Epoch #3452: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3453: 1025it [00:02, 365.02it/s, env_step=3535872, len=22, n/ep=4, n/st=64, player_1/loss=64.067, player_2/loss=1258.765, rew=626.50]


Epoch #3453: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3454: 1025it [00:02, 363.98it/s, env_step=3536896, len=15, n/ep=4, n/st=64, player_1/loss=108.156, player_2/loss=1330.857, rew=249.50]


Epoch #3454: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3455: 1025it [00:02, 362.31it/s, env_step=3537920, len=13, n/ep=5, n/st=64, player_1/loss=130.067, player_2/loss=674.260, rew=203.60]


Epoch #3455: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3456: 1025it [00:02, 365.41it/s, env_step=3538944, len=14, n/ep=4, n/st=64, player_1/loss=318.353, player_2/loss=433.921, rew=232.00]


Epoch #3456: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3457: 1025it [00:02, 366.45it/s, env_step=3539968, len=33, n/ep=1, n/st=64, player_1/loss=605.554, player_2/loss=429.652, rew=1120.00]


Epoch #3457: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3458: 1025it [00:02, 363.46it/s, env_step=3540992, len=27, n/ep=3, n/st=64, player_1/loss=494.177, player_2/loss=760.293, rew=829.33]


Epoch #3458: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3459: 1025it [00:02, 366.06it/s, env_step=3542016, len=15, n/ep=4, n/st=64, player_1/loss=262.975, player_2/loss=662.317, rew=342.50]


Epoch #3459: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3460: 1025it [00:02, 360.14it/s, env_step=3543040, len=30, n/ep=2, n/st=64, player_1/loss=140.873, player_2/loss=728.581, rew=932.00]


Epoch #3460: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3461: 1025it [00:02, 363.08it/s, env_step=3544064, len=20, n/ep=3, n/st=64, player_1/loss=123.349, rew=422.67]


Epoch #3461: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3462: 1025it [00:02, 360.65it/s, env_step=3545088, len=17, n/ep=5, n/st=64, player_1/loss=209.079, player_2/loss=940.418, rew=378.40]


Epoch #3462: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3463: 1025it [00:02, 364.24it/s, env_step=3546112, len=30, n/ep=2, n/st=64, player_1/loss=217.364, player_2/loss=654.949, rew=932.00]


Epoch #3463: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3464: 1025it [00:02, 364.11it/s, env_step=3547136, len=25, n/ep=3, n/st=64, player_1/loss=128.905, player_2/loss=613.031, rew=790.67]


Epoch #3464: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3465: 1025it [00:02, 362.82it/s, env_step=3548160, len=31, n/ep=3, n/st=64, player_1/loss=563.289, player_2/loss=345.298, rew=1052.00]


Epoch #3465: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3466: 1025it [00:02, 364.63it/s, env_step=3549184, len=26, n/ep=3, n/st=64, player_1/loss=550.542, player_2/loss=318.069, rew=917.33]


Epoch #3466: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3467: 1025it [00:02, 360.40it/s, env_step=3550208, len=24, n/ep=3, n/st=64, player_1/loss=170.212, rew=705.33]


Epoch #3467: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3468: 1025it [00:02, 365.02it/s, env_step=3551232, len=27, n/ep=2, n/st=64, player_1/loss=168.969, player_2/loss=220.063, rew=824.00]


Epoch #3468: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3469: 1025it [00:02, 364.50it/s, env_step=3552256, len=19, n/ep=3, n/st=64, player_1/loss=179.718, player_2/loss=159.355, rew=380.67]


Epoch #3469: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3470: 1025it [00:02, 363.34it/s, env_step=3553280, len=28, n/ep=1, n/st=64, player_1/loss=157.839, player_2/loss=447.935, rew=810.00]


Epoch #3470: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3471: 1025it [00:02, 363.21it/s, env_step=3554304, len=21, n/ep=4, n/st=64, player_1/loss=272.432, player_2/loss=442.101, rew=506.50]


Epoch #3471: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3472: 1025it [00:02, 364.63it/s, env_step=3555328, len=8, n/ep=9, n/st=64, player_1/loss=258.655, player_2/loss=269.268, rew=73.56]


Epoch #3472: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3473: 1025it [00:02, 363.85it/s, env_step=3556352, len=10, n/ep=6, n/st=64, player_1/loss=174.134, player_2/loss=294.817, rew=129.33]


Epoch #3473: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3474: 1025it [00:02, 363.08it/s, env_step=3557376, len=14, n/ep=5, n/st=64, player_1/loss=292.075, player_2/loss=361.436, rew=216.40]


Epoch #3474: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3475: 1025it [00:02, 362.56it/s, env_step=3558400, len=23, n/ep=2, n/st=64, player_1/loss=307.583, player_2/loss=498.782, rew=551.00]


Epoch #3475: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3476: 1025it [00:02, 363.85it/s, env_step=3559424, len=31, n/ep=2, n/st=64, player_1/loss=459.247, player_2/loss=518.793, rew=994.00]


Epoch #3476: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3477: 1025it [00:02, 365.02it/s, env_step=3560448, len=23, n/ep=3, n/st=64, player_1/loss=334.542, player_2/loss=328.171, rew=564.00]


Epoch #3477: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3478: 1025it [00:02, 362.18it/s, env_step=3561472, len=24, n/ep=3, n/st=64, player_1/loss=189.186, player_2/loss=167.162, rew=616.67]


Epoch #3478: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3479: 1025it [00:02, 364.76it/s, env_step=3562496, len=14, n/ep=4, n/st=64, player_2/loss=369.599, rew=209.00]


Epoch #3479: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3480: 1025it [00:02, 366.06it/s, env_step=3563520, len=18, n/ep=4, n/st=64, player_1/loss=171.318, player_2/loss=512.839, rew=469.00]


Epoch #3480: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3481: 1025it [00:02, 360.27it/s, env_step=3564544, len=14, n/ep=4, n/st=64, player_1/loss=343.766, player_2/loss=299.601, rew=218.00]


Epoch #3481: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3482: 1025it [00:02, 363.85it/s, env_step=3565568, len=28, n/ep=2, n/st=64, player_1/loss=359.465, player_2/loss=200.672, rew=841.00]


Epoch #3482: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3483: 1025it [00:02, 364.89it/s, env_step=3566592, len=24, n/ep=2, n/st=64, player_1/loss=165.648, player_2/loss=452.731, rew=629.00]


Epoch #3483: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3484: 1025it [00:02, 359.26it/s, env_step=3567616, len=22, n/ep=3, n/st=64, player_1/loss=70.803, player_2/loss=354.550, rew=612.67]


Epoch #3484: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3485: 1025it [00:02, 362.95it/s, env_step=3568640, len=22, n/ep=3, n/st=64, player_1/loss=328.568, player_2/loss=299.478, rew=694.67]


Epoch #3485: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3486: 1025it [00:02, 364.50it/s, env_step=3569664, len=31, n/ep=2, n/st=64, player_1/loss=516.210, player_2/loss=546.761, rew=1147.00]


Epoch #3486: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3487: 1025it [00:02, 361.29it/s, env_step=3570688, len=16, n/ep=4, n/st=64, player_1/loss=594.141, player_2/loss=771.245, rew=374.00]


Epoch #3487: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3488: 1025it [00:02, 363.72it/s, env_step=3571712, len=17, n/ep=4, n/st=64, player_1/loss=475.519, player_2/loss=532.649, rew=437.50]


Epoch #3488: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3489: 1025it [00:02, 360.91it/s, env_step=3572736, len=15, n/ep=4, n/st=64, player_1/loss=430.414, player_2/loss=291.917, rew=359.00]


Epoch #3489: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3490: 1025it [00:02, 364.11it/s, env_step=3573760, len=30, n/ep=2, n/st=64, player_1/loss=213.900, player_2/loss=63.875, rew=989.00]


Epoch #3490: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3491: 1025it [00:02, 366.71it/s, env_step=3574784, len=28, n/ep=2, n/st=64, player_1/loss=121.093, player_2/loss=148.766, rew=826.00]


Epoch #3491: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3492: 1025it [00:02, 363.72it/s, env_step=3575808, len=39, n/ep=2, n/st=64, player_1/loss=200.104, player_2/loss=246.491, rew=1562.00]


Epoch #3492: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3493: 1025it [00:02, 365.28it/s, env_step=3576832, len=7, n/ep=6, n/st=64, player_1/loss=202.026, player_2/loss=170.546, rew=62.33]


Epoch #3493: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3494: 1025it [00:02, 366.06it/s, env_step=3577856, len=30, n/ep=2, n/st=64, player_1/loss=239.187, player_2/loss=78.226, rew=929.00]


Epoch #3494: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3495: 1025it [00:02, 364.50it/s, env_step=3578880, len=38, n/ep=2, n/st=64, player_1/loss=509.704, player_2/loss=66.327, rew=1481.00]


Epoch #3495: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3496: 1025it [00:02, 363.21it/s, env_step=3579904, len=14, n/ep=4, n/st=64, player_1/loss=451.182, player_2/loss=65.230, rew=354.50]


Epoch #3496: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3497: 1025it [00:02, 363.34it/s, env_step=3580928, len=31, n/ep=2, n/st=64, player_1/loss=174.412, player_2/loss=181.775, rew=1006.00]


Epoch #3497: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3498: 1025it [00:02, 366.85it/s, env_step=3581952, len=38, n/ep=2, n/st=64, player_1/loss=41.075, player_2/loss=712.793, rew=1480.00]


Epoch #3498: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3499: 1025it [00:02, 361.16it/s, env_step=3582976, len=32, n/ep=2, n/st=64, player_1/loss=183.634, player_2/loss=627.386, rew=1107.00]


Epoch #3499: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3500: 1025it [00:02, 365.41it/s, env_step=3584000, len=13, n/ep=4, n/st=64, player_1/loss=236.876, player_2/loss=658.120, rew=259.50]


Epoch #3500: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3501: 1025it [00:02, 365.02it/s, env_step=3585024, len=24, n/ep=3, n/st=64, player_1/loss=137.028, player_2/loss=322.896, rew=757.33]


Epoch #3501: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3502: 1025it [00:02, 361.54it/s, env_step=3586048, len=34, n/ep=2, n/st=64, player_1/loss=152.090, player_2/loss=580.824, rew=1294.00]


Epoch #3502: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3503: 1025it [00:02, 364.37it/s, env_step=3587072, len=20, n/ep=3, n/st=64, player_1/loss=240.816, player_2/loss=400.639, rew=518.67]


Epoch #3503: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3504: 1025it [00:02, 365.15it/s, env_step=3588096, len=20, n/ep=3, n/st=64, player_1/loss=416.222, player_2/loss=225.022, rew=454.00]


Epoch #3504: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3505: 1025it [00:02, 360.78it/s, env_step=3589120, len=22, n/ep=3, n/st=64, player_1/loss=444.923, player_2/loss=171.178, rew=536.00]


Epoch #3505: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3506: 1025it [00:02, 366.06it/s, env_step=3590144, len=26, n/ep=2, n/st=64, player_1/loss=324.782, player_2/loss=709.654, rew=799.00]


Epoch #3506: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3507: 1025it [00:02, 362.05it/s, env_step=3591168, len=14, n/ep=4, n/st=64, player_1/loss=224.841, player_2/loss=1133.256, rew=218.00]


Epoch #3507: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3508: 1025it [00:02, 356.14it/s, env_step=3592192, len=34, n/ep=2, n/st=64, player_1/loss=237.597, player_2/loss=934.097, rew=1223.00]


Epoch #3508: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3509: 1025it [00:02, 357.51it/s, env_step=3593216, len=19, n/ep=3, n/st=64, player_1/loss=604.759, player_2/loss=691.208, rew=406.00]


Epoch #3509: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3510: 1025it [00:02, 363.34it/s, env_step=3594240, len=12, n/ep=6, n/st=64, player_1/loss=812.445, player_2/loss=661.577, rew=163.33]


Epoch #3510: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3511: 1025it [00:02, 365.80it/s, env_step=3595264, len=38, n/ep=2, n/st=64, player_1/loss=394.502, player_2/loss=517.632, rew=1519.00]


Epoch #3511: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3512: 1025it [00:02, 360.40it/s, env_step=3596288, len=30, n/ep=2, n/st=64, player_1/loss=508.396, player_2/loss=423.325, rew=977.00]


Epoch #3512: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3513: 1025it [00:02, 363.98it/s, env_step=3597312, len=25, n/ep=3, n/st=64, player_1/loss=473.392, player_2/loss=224.580, rew=748.00]


Epoch #3513: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3514: 1025it [00:02, 361.80it/s, env_step=3598336, len=31, n/ep=2, n/st=64, player_1/loss=483.703, player_2/loss=342.568, rew=1022.00]


Epoch #3514: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3515: 1025it [00:02, 364.50it/s, env_step=3599360, len=23, n/ep=3, n/st=64, player_1/loss=767.810, player_2/loss=327.908, rew=635.33]


Epoch #3515: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3516: 1025it [00:02, 363.59it/s, env_step=3600384, len=38, n/ep=2, n/st=64, player_1/loss=497.756, player_2/loss=407.011, rew=1481.00]


Epoch #3516: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3517: 1025it [00:02, 360.14it/s, env_step=3601408, len=22, n/ep=3, n/st=64, player_1/loss=224.370, player_2/loss=473.181, rew=522.67]


Epoch #3517: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3518: 1025it [00:02, 365.15it/s, env_step=3602432, len=23, n/ep=3, n/st=64, player_1/loss=162.636, player_2/loss=164.539, rew=634.67]


Epoch #3518: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3519: 1025it [00:02, 363.46it/s, env_step=3603456, len=32, n/ep=2, n/st=64, player_1/loss=222.039, player_2/loss=324.165, rew=1107.00]


Epoch #3519: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3520: 1025it [00:02, 362.69it/s, env_step=3604480, len=16, n/ep=4, n/st=64, player_1/loss=253.974, player_2/loss=512.800, rew=287.00]


Epoch #3520: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3521: 1025it [00:02, 365.93it/s, env_step=3605504, len=28, n/ep=2, n/st=64, player_1/loss=294.071, player_2/loss=448.433, rew=895.00]


Epoch #3521: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3522: 1025it [00:02, 365.28it/s, env_step=3606528, len=12, n/ep=5, n/st=64, player_1/loss=321.250, player_2/loss=535.052, rew=179.60]


Epoch #3522: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3523: 1025it [00:02, 361.29it/s, env_step=3607552, len=34, n/ep=2, n/st=64, player_1/loss=277.998, player_2/loss=517.136, rew=1229.00]


Epoch #3523: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3524: 1025it [00:02, 363.85it/s, env_step=3608576, len=13, n/ep=5, n/st=64, player_1/loss=363.650, player_2/loss=483.334, rew=210.00]


Epoch #3524: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3525: 1025it [00:02, 364.76it/s, env_step=3609600, len=22, n/ep=3, n/st=64, player_1/loss=346.748, player_2/loss=846.721, rew=522.67]


Epoch #3525: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3526: 1025it [00:02, 362.05it/s, env_step=3610624, len=24, n/ep=2, n/st=64, player_1/loss=306.762, player_2/loss=736.043, rew=623.00]


Epoch #3526: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3527: 1025it [00:02, 367.37it/s, env_step=3611648, len=22, n/ep=3, n/st=64, player_2/loss=99.653, rew=525.33]


Epoch #3527: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3528: 1025it [00:02, 366.58it/s, env_step=3612672, len=22, n/ep=3, n/st=64, player_1/loss=352.622, player_2/loss=75.754, rew=506.67]


Epoch #3528: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3529: 1025it [00:02, 362.82it/s, env_step=3613696, len=22, n/ep=3, n/st=64, player_1/loss=250.296, player_2/loss=678.162, rew=668.67]


Epoch #3529: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3530: 1025it [00:02, 366.45it/s, env_step=3614720, len=21, n/ep=2, n/st=64, player_1/loss=324.884, player_2/loss=817.535, rew=482.00]


Epoch #3530: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3531: 1025it [00:02, 360.27it/s, env_step=3615744, len=26, n/ep=3, n/st=64, player_1/loss=531.520, player_2/loss=358.237, rew=804.67]


Epoch #3531: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3532: 1025it [00:02, 362.44it/s, env_step=3616768, len=30, n/ep=2, n/st=64, player_1/loss=399.065, player_2/loss=473.490, rew=929.00]


Epoch #3532: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3533: 1025it [00:02, 364.50it/s, env_step=3617792, len=34, n/ep=2, n/st=64, player_1/loss=245.980, player_2/loss=591.400, rew=1204.00]


Epoch #3533: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3534: 1025it [00:02, 359.51it/s, env_step=3618816, len=19, n/ep=3, n/st=64, player_1/loss=403.903, player_2/loss=586.776, rew=554.67]


Epoch #3534: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3535: 1025it [00:02, 365.93it/s, env_step=3619840, len=8, n/ep=7, n/st=64, player_1/loss=310.835, player_2/loss=1095.103, rew=78.29]


Epoch #3535: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3536: 1025it [00:02, 364.37it/s, env_step=3620864, len=15, n/ep=4, n/st=64, player_1/loss=150.070, player_2/loss=1194.262, rew=244.00]


Epoch #3536: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3537: 1025it [00:02, 361.54it/s, env_step=3621888, len=29, n/ep=3, n/st=64, player_1/loss=264.162, player_2/loss=1362.150, rew=876.67]


Epoch #3537: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3538: 1025it [00:02, 365.80it/s, env_step=3622912, len=33, n/ep=2, n/st=64, player_1/loss=339.907, player_2/loss=1084.398, rew=1166.00]


Epoch #3538: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3539: 1025it [00:02, 362.82it/s, env_step=3623936, len=17, n/ep=4, n/st=64, player_1/loss=195.699, player_2/loss=924.836, rew=310.50]


Epoch #3539: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3540: 1025it [00:02, 361.16it/s, env_step=3624960, len=12, n/ep=6, n/st=64, player_1/loss=163.529, player_2/loss=564.548, rew=181.67]


Epoch #3540: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3541: 1025it [00:02, 363.34it/s, env_step=3625984, len=15, n/ep=5, n/st=64, player_1/loss=118.885, player_2/loss=471.785, rew=238.40]


Epoch #3541: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3542: 1025it [00:02, 366.19it/s, env_step=3627008, len=37, n/ep=2, n/st=64, player_1/loss=142.298, player_2/loss=214.153, rew=1444.00]


Epoch #3542: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3543: 1025it [00:02, 363.08it/s, env_step=3628032, len=15, n/ep=3, n/st=64, player_1/loss=153.423, player_2/loss=94.189, rew=240.67]


Epoch #3543: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3544: 1025it [00:02, 366.06it/s, env_step=3629056, len=26, n/ep=3, n/st=64, player_1/loss=306.417, player_2/loss=202.225, rew=805.33]


Epoch #3544: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3545: 1025it [00:02, 363.08it/s, env_step=3630080, len=29, n/ep=2, n/st=64, player_1/loss=736.325, player_2/loss=415.111, rew=970.00]


Epoch #3545: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3546: 1025it [00:02, 361.80it/s, env_step=3631104, len=20, n/ep=3, n/st=64, player_1/loss=955.625, player_2/loss=787.902, rew=432.67]


Epoch #3546: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3547: 1025it [00:02, 364.76it/s, env_step=3632128, len=22, n/ep=3, n/st=64, player_1/loss=666.290, player_2/loss=661.947, rew=504.00]


Epoch #3547: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3548: 1025it [00:02, 364.50it/s, env_step=3633152, len=28, n/ep=3, n/st=64, player_1/loss=289.027, player_2/loss=421.388, rew=814.67]


Epoch #3548: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3549: 1025it [00:02, 362.18it/s, env_step=3634176, len=26, n/ep=2, n/st=64, player_1/loss=187.520, player_2/loss=560.246, rew=701.00]


Epoch #3549: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3550: 1025it [00:02, 365.02it/s, env_step=3635200, len=36, n/ep=2, n/st=64, player_1/loss=133.350, player_2/loss=646.379, rew=1334.00]


Epoch #3550: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3551: 1025it [00:02, 366.06it/s, env_step=3636224, len=30, n/ep=2, n/st=64, player_1/loss=55.344, player_2/loss=471.856, rew=1015.00]


Epoch #3551: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3552: 1025it [00:02, 360.14it/s, env_step=3637248, len=34, n/ep=2, n/st=64, player_1/loss=163.793, player_2/loss=130.929, rew=1223.00]


Epoch #3552: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3553: 1025it [00:02, 364.24it/s, env_step=3638272, len=26, n/ep=2, n/st=64, player_1/loss=191.656, player_2/loss=166.180, rew=757.00]


Epoch #3553: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3554: 1025it [00:02, 364.11it/s, env_step=3639296, len=14, n/ep=6, n/st=64, player_1/loss=249.198, player_2/loss=225.270, rew=284.67]


Epoch #3554: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3555: 1025it [00:02, 361.16it/s, env_step=3640320, len=15, n/ep=5, n/st=64, player_1/loss=521.226, player_2/loss=200.599, rew=239.60]


Epoch #3555: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3556: 1025it [00:02, 364.76it/s, env_step=3641344, len=37, n/ep=2, n/st=64, player_1/loss=337.042, player_2/loss=176.670, rew=1404.00]


Epoch #3556: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3557: 1025it [00:02, 365.54it/s, env_step=3642368, len=36, n/ep=2, n/st=64, player_1/loss=108.479, player_2/loss=158.069, rew=1367.00]


Epoch #3557: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3558: 1025it [00:02, 361.16it/s, env_step=3643392, len=28, n/ep=3, n/st=64, player_1/loss=498.356, player_2/loss=534.149, rew=849.33]


Epoch #3558: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3559: 1025it [00:02, 366.32it/s, env_step=3644416, len=39, n/ep=2, n/st=64, player_1/loss=540.273, player_2/loss=622.206, rew=1559.00]


Epoch #3559: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3560: 1025it [00:02, 361.54it/s, env_step=3645440, len=8, n/ep=7, n/st=64, player_1/loss=242.558, player_2/loss=165.962, rew=79.14]


Epoch #3560: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3561: 1025it [00:02, 361.16it/s, env_step=3646464, len=33, n/ep=2, n/st=64, player_1/loss=411.072, player_2/loss=284.295, rew=1241.00]


Epoch #3561: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3562: 1025it [00:02, 364.11it/s, env_step=3647488, len=30, n/ep=3, n/st=64, player_1/loss=579.131, player_2/loss=662.886, rew=1072.67]


Epoch #3562: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3563: 1025it [00:02, 365.54it/s, env_step=3648512, len=19, n/ep=4, n/st=64, player_1/loss=363.507, player_2/loss=860.747, rew=407.50]


Epoch #3563: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3564: 1025it [00:02, 363.72it/s, env_step=3649536, len=31, n/ep=2, n/st=64, player_1/loss=474.078, player_2/loss=1090.798, rew=1052.00]


Epoch #3564: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3565: 1025it [00:02, 368.03it/s, env_step=3650560, len=25, n/ep=3, n/st=64, player_1/loss=550.727, player_2/loss=1028.183, rew=686.00]


Epoch #3565: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3566: 1025it [00:02, 361.54it/s, env_step=3651584, len=11, n/ep=5, n/st=64, player_1/loss=628.491, player_2/loss=681.992, rew=149.60]


Epoch #3566: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3567: 1025it [00:02, 365.28it/s, env_step=3652608, len=36, n/ep=2, n/st=64, player_1/loss=489.544, rew=1373.00]


Epoch #3567: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3568: 1025it [00:02, 366.32it/s, env_step=3653632, len=22, n/ep=3, n/st=64, player_1/loss=222.202, player_2/loss=109.677, rew=592.67]


Epoch #3568: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3569: 1025it [00:02, 362.18it/s, env_step=3654656, len=34, n/ep=2, n/st=64, player_1/loss=243.172, player_2/loss=361.136, rew=1243.00]


Epoch #3569: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3570: 1025it [00:02, 365.67it/s, env_step=3655680, len=29, n/ep=2, n/st=64, player_1/loss=299.215, player_2/loss=743.755, rew=1054.00]


Epoch #3570: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3571: 1025it [00:02, 365.54it/s, env_step=3656704, len=22, n/ep=2, n/st=64, player_1/loss=375.239, player_2/loss=970.371, rew=557.00]


Epoch #3571: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3572: 1025it [00:02, 361.80it/s, env_step=3657728, len=13, n/ep=3, n/st=64, player_1/loss=567.985, player_2/loss=810.622, rew=198.67]


Epoch #3572: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3573: 1025it [00:02, 364.24it/s, env_step=3658752, len=35, n/ep=2, n/st=64, player_1/loss=456.903, player_2/loss=596.804, rew=1294.00]


Epoch #3573: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3574: 1025it [00:02, 358.88it/s, env_step=3659776, len=30, n/ep=2, n/st=64, player_1/loss=316.819, player_2/loss=603.458, rew=1015.00]


Epoch #3574: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3575: 1025it [00:02, 365.28it/s, env_step=3660800, len=33, n/ep=2, n/st=64, player_1/loss=287.447, player_2/loss=618.948, rew=1136.00]


Epoch #3575: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3576: 1025it [00:02, 364.37it/s, env_step=3661824, len=15, n/ep=6, n/st=64, player_1/loss=193.787, player_2/loss=963.327, rew=370.67]


Epoch #3576: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3577: 1025it [00:02, 359.39it/s, env_step=3662848, len=21, n/ep=3, n/st=64, player_1/loss=436.583, player_2/loss=1039.616, rew=585.33]


Epoch #3577: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3578: 1025it [00:02, 365.28it/s, env_step=3663872, len=25, n/ep=3, n/st=64, player_1/loss=698.513, player_2/loss=1043.412, rew=730.67]


Epoch #3578: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3579: 1025it [00:02, 362.05it/s, env_step=3664896, len=33, n/ep=2, n/st=64, player_1/loss=481.330, player_2/loss=976.931, rew=1156.00]


Epoch #3579: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3580: 1025it [00:02, 363.21it/s, env_step=3665920, len=25, n/ep=3, n/st=64, player_2/loss=234.506, rew=698.67]


Epoch #3580: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3581: 1025it [00:02, 364.24it/s, env_step=3666944, len=33, n/ep=2, n/st=64, player_1/loss=391.450, player_2/loss=412.878, rew=1124.00]


Epoch #3581: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3582: 1025it [00:02, 364.37it/s, env_step=3667968, len=28, n/ep=2, n/st=64, player_1/loss=766.127, player_2/loss=603.624, rew=841.00]


Epoch #3582: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3583: 1025it [00:02, 362.82it/s, env_step=3668992, len=26, n/ep=2, n/st=64, player_1/loss=719.276, player_2/loss=666.390, rew=869.00]


Epoch #3583: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3584: 1025it [00:02, 363.08it/s, env_step=3670016, len=21, n/ep=3, n/st=64, player_1/loss=278.204, player_2/loss=452.392, rew=553.33]


Epoch #3584: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3585: 1025it [00:02, 360.91it/s, env_step=3671040, len=23, n/ep=2, n/st=64, player_1/loss=408.658, rew=566.00]


Epoch #3585: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3586: 1025it [00:02, 365.54it/s, env_step=3672064, len=31, n/ep=2, n/st=64, player_1/loss=547.965, player_2/loss=304.197, rew=1071.00]


Epoch #3586: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3587: 1025it [00:02, 365.02it/s, env_step=3673088, len=32, n/ep=2, n/st=64, player_1/loss=303.019, player_2/loss=575.512, rew=1063.00]


Epoch #3587: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3588: 1025it [00:02, 359.64it/s, env_step=3674112, len=30, n/ep=3, n/st=64, player_1/loss=197.907, player_2/loss=612.058, rew=946.67]


Epoch #3588: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3589: 1025it [00:02, 363.21it/s, env_step=3675136, len=34, n/ep=2, n/st=64, player_1/loss=318.449, player_2/loss=555.637, rew=1192.00]


Epoch #3589: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3590: 1025it [00:02, 360.52it/s, env_step=3676160, len=13, n/ep=4, n/st=64, player_1/loss=249.979, player_2/loss=514.107, rew=202.50]


Epoch #3590: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3591: 1025it [00:02, 363.21it/s, env_step=3677184, len=31, n/ep=3, n/st=64, player_1/loss=309.272, player_2/loss=151.362, rew=1038.67]


Epoch #3591: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3592: 1025it [00:02, 361.03it/s, env_step=3678208, len=25, n/ep=2, n/st=64, player_1/loss=255.909, player_2/loss=165.761, rew=748.00]


Epoch #3592: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3593: 1025it [00:02, 364.24it/s, env_step=3679232, len=40, n/ep=2, n/st=64, player_1/loss=215.148, player_2/loss=152.332, rew=1696.00]


Epoch #3593: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3594: 1025it [00:02, 361.29it/s, env_step=3680256, len=39, n/ep=2, n/st=64, player_1/loss=360.729, player_2/loss=166.636, rew=1600.00]


Epoch #3594: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3595: 1025it [00:02, 363.21it/s, env_step=3681280, len=33, n/ep=2, n/st=64, player_1/loss=244.731, player_2/loss=141.918, rew=1136.00]


Epoch #3595: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3596: 1025it [00:02, 364.11it/s, env_step=3682304, len=19, n/ep=3, n/st=64, player_1/loss=403.266, player_2/loss=115.772, rew=452.67]


Epoch #3596: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3597: 1025it [00:02, 360.65it/s, env_step=3683328, len=21, n/ep=3, n/st=64, player_1/loss=767.321, player_2/loss=102.583, rew=503.33]


Epoch #3597: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3598: 1025it [00:02, 364.11it/s, env_step=3684352, len=21, n/ep=2, n/st=64, player_1/loss=570.719, player_2/loss=305.306, rew=482.00]


Epoch #3598: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3599: 1025it [00:02, 365.67it/s, env_step=3685376, len=33, n/ep=2, n/st=64, player_1/loss=356.634, player_2/loss=594.661, rew=1174.00]


Epoch #3599: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3600: 1025it [00:02, 362.05it/s, env_step=3686400, len=33, n/ep=2, n/st=64, player_1/loss=493.860, player_2/loss=392.339, rew=1145.00]


Epoch #3600: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3601: 1025it [00:02, 363.85it/s, env_step=3687424, len=25, n/ep=3, n/st=64, player_1/loss=651.616, player_2/loss=138.019, rew=741.33]


Epoch #3601: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3602: 1025it [00:02, 363.85it/s, env_step=3688448, len=36, n/ep=2, n/st=64, player_1/loss=795.237, player_2/loss=893.246, rew=1381.00]


Epoch #3602: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3603: 1025it [00:02, 359.64it/s, env_step=3689472, len=15, n/ep=4, n/st=64, player_1/loss=514.342, player_2/loss=1158.445, rew=241.00]


Epoch #3603: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3604: 1025it [00:02, 365.41it/s, env_step=3690496, len=26, n/ep=2, n/st=64, player_1/loss=554.419, player_2/loss=657.112, rew=837.00]


Epoch #3604: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3605: 1025it [00:02, 364.63it/s, env_step=3691520, len=23, n/ep=3, n/st=64, player_1/loss=454.320, player_2/loss=708.038, rew=650.00]


Epoch #3605: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3606: 1025it [00:02, 364.63it/s, env_step=3692544, len=37, n/ep=1, n/st=64, player_1/loss=409.155, player_2/loss=318.818, rew=1404.00]


Epoch #3606: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3607: 1025it [00:02, 364.24it/s, env_step=3693568, len=26, n/ep=3, n/st=64, player_1/loss=479.008, player_2/loss=301.103, rew=777.33]


Epoch #3607: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3608: 1025it [00:02, 362.69it/s, env_step=3694592, len=16, n/ep=3, n/st=64, player_1/loss=292.331, player_2/loss=708.895, rew=294.00]


Epoch #3608: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3609: 1025it [00:02, 364.89it/s, env_step=3695616, len=24, n/ep=3, n/st=64, player_1/loss=389.646, player_2/loss=649.176, rew=618.67]


Epoch #3609: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3610: 1025it [00:02, 361.54it/s, env_step=3696640, len=37, n/ep=1, n/st=64, player_1/loss=878.134, player_2/loss=459.609, rew=1404.00]


Epoch #3610: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3611: 1025it [00:02, 365.15it/s, env_step=3697664, len=27, n/ep=3, n/st=64, player_1/loss=946.480, player_2/loss=429.940, rew=780.00]


Epoch #3611: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3612: 1025it [00:02, 356.14it/s, env_step=3698688, len=16, n/ep=4, n/st=64, player_1/loss=801.341, player_2/loss=772.141, rew=286.00]


Epoch #3612: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3613: 1025it [00:02, 364.63it/s, env_step=3699712, len=8, n/ep=7, n/st=64, player_1/loss=397.928, player_2/loss=830.428, rew=90.57]


Epoch #3613: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3614: 1025it [00:02, 363.98it/s, env_step=3700736, len=34, n/ep=1, n/st=64, player_1/loss=227.917, player_2/loss=1133.116, rew=1188.00]


Epoch #3614: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3615: 1025it [00:02, 360.78it/s, env_step=3701760, len=30, n/ep=3, n/st=64, player_1/loss=274.498, player_2/loss=778.847, rew=1099.33]


Epoch #3615: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3616: 1025it [00:02, 365.54it/s, env_step=3702784, len=34, n/ep=2, n/st=64, player_1/loss=254.528, player_2/loss=579.229, rew=1229.00]


Epoch #3616: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3617: 1025it [00:02, 367.24it/s, env_step=3703808, len=20, n/ep=3, n/st=64, player_1/loss=350.892, player_2/loss=364.362, rew=434.67]


Epoch #3617: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3618: 1025it [00:02, 362.05it/s, env_step=3704832, len=37, n/ep=2, n/st=64, player_1/loss=553.313, player_2/loss=603.484, rew=1442.00]


Epoch #3618: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3619: 1025it [00:02, 364.50it/s, env_step=3705856, len=15, n/ep=4, n/st=64, player_1/loss=347.746, player_2/loss=507.509, rew=246.00]


Epoch #3619: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3620: 1025it [00:02, 364.63it/s, env_step=3706880, len=36, n/ep=2, n/st=64, player_1/loss=229.431, player_2/loss=385.545, rew=1339.00]


Epoch #3620: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3621: 1025it [00:02, 363.85it/s, env_step=3707904, len=38, n/ep=1, n/st=64, player_1/loss=430.093, player_2/loss=253.401, rew=1480.00]


Epoch #3621: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3622: 1025it [00:02, 363.08it/s, env_step=3708928, len=17, n/ep=4, n/st=64, player_1/loss=691.185, player_2/loss=98.521, rew=351.50]


Epoch #3622: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3623: 1025it [00:02, 359.39it/s, env_step=3709952, len=38, n/ep=2, n/st=64, player_1/loss=912.446, player_2/loss=258.795, rew=1481.00]


Epoch #3623: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3624: 1025it [00:02, 366.32it/s, env_step=3710976, len=23, n/ep=3, n/st=64, player_1/loss=603.533, player_2/loss=547.979, rew=591.33]


Epoch #3624: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3625: 1025it [00:02, 363.72it/s, env_step=3712000, len=38, n/ep=2, n/st=64, player_1/loss=442.779, player_2/loss=413.663, rew=1481.00]


Epoch #3625: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3626: 1025it [00:02, 356.76it/s, env_step=3713024, len=30, n/ep=2, n/st=64, player_1/loss=816.795, player_2/loss=131.647, rew=977.00]


Epoch #3626: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3627: 1025it [00:02, 365.15it/s, env_step=3714048, len=30, n/ep=2, n/st=64, player_1/loss=1276.815, player_2/loss=187.791, rew=961.00]


Epoch #3627: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3628: 1025it [00:02, 362.82it/s, env_step=3715072, len=31, n/ep=2, n/st=64, player_1/loss=637.920, player_2/loss=240.427, rew=994.00]


Epoch #3628: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3629: 1025it [00:02, 364.11it/s, env_step=3716096, len=26, n/ep=3, n/st=64, player_1/loss=47.768, player_2/loss=200.742, rew=806.67]


Epoch #3629: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3630: 1025it [00:02, 362.82it/s, env_step=3717120, len=32, n/ep=2, n/st=64, player_1/loss=118.273, player_2/loss=371.900, rew=1089.00]


Epoch #3630: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3631: 1025it [00:02, 363.21it/s, env_step=3718144, len=25, n/ep=2, n/st=64, player_1/loss=157.797, player_2/loss=540.485, rew=657.00]


Epoch #3631: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3632: 1025it [00:02, 363.85it/s, env_step=3719168, len=35, n/ep=2, n/st=64, player_1/loss=88.104, player_2/loss=1500.700, rew=1283.00]


Epoch #3632: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3633: 1025it [00:02, 359.26it/s, env_step=3720192, len=27, n/ep=3, n/st=64, player_1/loss=362.272, player_2/loss=1436.555, rew=829.33]


Epoch #3633: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3634: 1025it [00:02, 362.69it/s, env_step=3721216, len=20, n/ep=3, n/st=64, player_1/loss=412.761, player_2/loss=566.695, rew=448.67]


Epoch #3634: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3635: 1025it [00:02, 364.63it/s, env_step=3722240, len=31, n/ep=3, n/st=64, player_1/loss=227.073, player_2/loss=598.300, rew=1039.33]


Epoch #3635: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3636: 1025it [00:02, 360.27it/s, env_step=3723264, len=15, n/ep=4, n/st=64, player_1/loss=240.660, player_2/loss=511.397, rew=248.50]


Epoch #3636: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3637: 1025it [00:02, 362.95it/s, env_step=3724288, len=15, n/ep=4, n/st=64, player_1/loss=337.446, player_2/loss=853.816, rew=259.00]


Epoch #3637: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3638: 1025it [00:02, 366.06it/s, env_step=3725312, len=28, n/ep=2, n/st=64, player_1/loss=691.830, player_2/loss=1010.636, rew=971.00]


Epoch #3638: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3639: 1025it [00:02, 362.18it/s, env_step=3726336, len=28, n/ep=2, n/st=64, player_1/loss=632.779, player_2/loss=568.660, rew=881.00]


Epoch #3639: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3640: 1025it [00:02, 363.59it/s, env_step=3727360, len=38, n/ep=2, n/st=64, player_1/loss=267.909, player_2/loss=380.032, rew=1519.00]


Epoch #3640: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3641: 1025it [00:02, 360.02it/s, env_step=3728384, len=25, n/ep=3, n/st=64, player_1/loss=434.768, player_2/loss=247.936, rew=748.00]


Epoch #3641: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3642: 1025it [00:02, 365.67it/s, env_step=3729408, len=16, n/ep=3, n/st=64, player_1/loss=692.505, player_2/loss=848.615, rew=300.67]


Epoch #3642: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3643: 1025it [00:02, 365.41it/s, env_step=3730432, len=20, n/ep=3, n/st=64, player_1/loss=386.911, player_2/loss=1228.576, rew=480.00]


Epoch #3643: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3644: 1025it [00:02, 361.03it/s, env_step=3731456, len=29, n/ep=2, n/st=64, player_1/loss=316.753, player_2/loss=1686.529, rew=988.00]


Epoch #3644: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3645: 1025it [00:02, 364.37it/s, env_step=3732480, len=30, n/ep=3, n/st=64, player_1/loss=334.328, player_2/loss=1497.154, rew=954.00]


Epoch #3645: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3646: 1025it [00:02, 358.51it/s, env_step=3733504, len=14, n/ep=5, n/st=64, player_1/loss=166.462, player_2/loss=438.781, rew=208.40]


Epoch #3646: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3647: 1025it [00:02, 361.41it/s, env_step=3734528, len=14, n/ep=4, n/st=64, player_1/loss=427.949, player_2/loss=161.438, rew=234.00]


Epoch #3647: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3648: 1025it [00:02, 366.45it/s, env_step=3735552, len=26, n/ep=3, n/st=64, player_1/loss=504.469, player_2/loss=471.314, rew=756.00]


Epoch #3648: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3649: 1025it [00:02, 360.40it/s, env_step=3736576, len=29, n/ep=2, n/st=64, player_1/loss=231.708, player_2/loss=724.212, rew=932.00]


Epoch #3649: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3650: 1025it [00:02, 362.05it/s, env_step=3737600, len=20, n/ep=3, n/st=64, player_1/loss=43.237, player_2/loss=441.029, rew=446.00]


Epoch #3650: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3651: 1025it [00:02, 365.93it/s, env_step=3738624, len=13, n/ep=6, n/st=64, player_1/loss=123.177, player_2/loss=433.655, rew=296.33]


Epoch #3651: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3652: 1025it [00:02, 360.14it/s, env_step=3739648, len=37, n/ep=2, n/st=64, player_1/loss=267.819, player_2/loss=244.122, rew=1404.00]


Epoch #3652: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3653: 1025it [00:02, 363.34it/s, env_step=3740672, len=31, n/ep=2, n/st=64, player_1/loss=369.060, player_2/loss=250.680, rew=999.00]


Epoch #3653: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3654: 1025it [00:02, 361.80it/s, env_step=3741696, len=31, n/ep=2, n/st=64, player_1/loss=553.679, player_2/loss=84.532, rew=1039.00]


Epoch #3654: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3655: 1025it [00:02, 361.92it/s, env_step=3742720, len=22, n/ep=3, n/st=64, player_1/loss=1165.270, player_2/loss=86.292, rew=508.67]


Epoch #3655: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3656: 1025it [00:02, 365.80it/s, env_step=3743744, len=38, n/ep=2, n/st=64, player_1/loss=853.203, player_2/loss=94.040, rew=1519.00]


Epoch #3656: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3657: 1025it [00:02, 363.46it/s, env_step=3744768, len=33, n/ep=2, n/st=64, player_1/loss=253.178, player_2/loss=582.808, rew=1154.00]


Epoch #3657: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3658: 1025it [00:02, 362.69it/s, env_step=3745792, len=30, n/ep=2, n/st=64, player_1/loss=319.774, player_2/loss=504.307, rew=937.00]


Epoch #3658: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3659: 1025it [00:02, 364.50it/s, env_step=3746816, len=26, n/ep=3, n/st=64, player_1/loss=386.312, player_2/loss=220.588, rew=812.00]


Epoch #3659: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3660: 1025it [00:02, 365.15it/s, env_step=3747840, len=19, n/ep=3, n/st=64, player_1/loss=340.844, player_2/loss=480.500, rew=448.00]


Epoch #3660: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3661: 1025it [00:02, 359.51it/s, env_step=3748864, len=30, n/ep=2, n/st=64, player_1/loss=168.692, player_2/loss=433.633, rew=977.00]


Epoch #3661: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3662: 1025it [00:02, 363.46it/s, env_step=3749888, len=27, n/ep=3, n/st=64, player_1/loss=599.233, player_2/loss=219.129, rew=984.67]


Epoch #3662: test_reward: 1804.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3663: 1025it [00:02, 365.41it/s, env_step=3750912, len=25, n/ep=2, n/st=64, player_1/loss=572.069, rew=676.00]


Epoch #3663: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3664: 1025it [00:02, 355.40it/s, env_step=3751936, len=22, n/ep=3, n/st=64, player_1/loss=241.672, player_2/loss=411.757, rew=622.67]


Epoch #3664: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3665: 1025it [00:02, 363.85it/s, env_step=3752960, len=21, n/ep=3, n/st=64, player_1/loss=360.322, player_2/loss=655.032, rew=475.33]


Epoch #3665: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3666: 1025it [00:02, 365.80it/s, env_step=3753984, len=33, n/ep=2, n/st=64, player_1/loss=527.612, player_2/loss=739.559, rew=1145.00]


Epoch #3666: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3667: 1025it [00:02, 360.52it/s, env_step=3755008, len=8, n/ep=8, n/st=64, player_1/loss=275.893, player_2/loss=1278.579, rew=82.75]


Epoch #3667: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3668: 1025it [00:02, 363.72it/s, env_step=3756032, len=8, n/ep=8, n/st=64, player_1/loss=196.439, player_2/loss=896.116, rew=78.75]


Epoch #3668: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3669: 1025it [00:02, 360.78it/s, env_step=3757056, len=23, n/ep=2, n/st=64, player_1/loss=286.997, player_2/loss=767.330, rew=574.00]


Epoch #3669: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3670: 1025it [00:02, 363.46it/s, env_step=3758080, len=8, n/ep=6, n/st=64, player_1/loss=547.063, player_2/loss=260.375, rew=71.00]


Epoch #3670: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3671: 1025it [00:02, 361.67it/s, env_step=3759104, len=41, n/ep=1, n/st=64, player_1/loss=318.634, player_2/loss=287.971, rew=1720.00]


Epoch #3671: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3672: 1025it [00:02, 362.05it/s, env_step=3760128, len=38, n/ep=2, n/st=64, player_1/loss=275.678, player_2/loss=156.332, rew=1480.00]


Epoch #3672: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3673: 1025it [00:02, 364.50it/s, env_step=3761152, len=21, n/ep=3, n/st=64, player_1/loss=557.006, player_2/loss=167.394, rew=476.00]


Epoch #3673: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3674: 1025it [00:02, 360.52it/s, env_step=3762176, len=31, n/ep=2, n/st=64, player_1/loss=597.673, player_2/loss=506.361, rew=1039.00]


Epoch #3674: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3675: 1025it [00:02, 360.91it/s, env_step=3763200, len=38, n/ep=2, n/st=64, player_1/loss=353.083, player_2/loss=432.168, rew=1481.00]


Epoch #3675: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3676: 1025it [00:02, 365.54it/s, env_step=3764224, len=39, n/ep=2, n/st=64, player_1/loss=611.994, player_2/loss=435.906, rew=1582.00]


Epoch #3676: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3677: 1025it [00:02, 362.18it/s, env_step=3765248, len=33, n/ep=2, n/st=64, player_2/loss=529.473, rew=1166.00]


Epoch #3677: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3678: 1025it [00:02, 361.92it/s, env_step=3766272, len=30, n/ep=3, n/st=64, player_1/loss=203.185, player_2/loss=979.017, rew=1003.33]


Epoch #3678: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3679: 1025it [00:02, 364.24it/s, env_step=3767296, len=39, n/ep=1, n/st=64, player_1/loss=143.103, player_2/loss=625.929, rew=1558.00]


Epoch #3679: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3680: 1025it [00:02, 362.18it/s, env_step=3768320, len=29, n/ep=2, n/st=64, player_1/loss=159.359, player_2/loss=337.356, rew=900.00]


Epoch #3680: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3681: 1025it [00:02, 364.50it/s, env_step=3769344, len=19, n/ep=4, n/st=64, player_1/loss=129.832, player_2/loss=102.481, rew=419.50]


Epoch #3681: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3682: 1025it [00:02, 361.92it/s, env_step=3770368, len=35, n/ep=2, n/st=64, player_1/loss=455.643, player_2/loss=64.420, rew=1267.00]


Epoch #3682: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3683: 1025it [00:02, 359.39it/s, env_step=3771392, len=17, n/ep=4, n/st=64, player_1/loss=658.371, player_2/loss=332.878, rew=315.00]


Epoch #3683: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3684: 1025it [00:02, 364.11it/s, env_step=3772416, len=22, n/ep=3, n/st=64, player_1/loss=452.995, player_2/loss=588.189, rew=526.00]


Epoch #3684: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3685: 1025it [00:02, 363.21it/s, env_step=3773440, len=40, n/ep=1, n/st=64, player_1/loss=515.804, player_2/loss=557.163, rew=1638.00]


Epoch #3685: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3686: 1025it [00:02, 360.78it/s, env_step=3774464, len=34, n/ep=2, n/st=64, player_1/loss=528.106, player_2/loss=270.104, rew=1235.00]


Epoch #3686: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3687: 1025it [00:02, 365.02it/s, env_step=3775488, len=39, n/ep=2, n/st=64, player_1/loss=324.103, player_2/loss=241.588, rew=1598.00]


Epoch #3687: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3688: 1025it [00:02, 365.93it/s, env_step=3776512, len=34, n/ep=2, n/st=64, player_1/loss=228.366, player_2/loss=538.114, rew=1213.00]


Epoch #3688: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3689: 1025it [00:02, 359.77it/s, env_step=3777536, len=30, n/ep=3, n/st=64, player_1/loss=357.739, player_2/loss=523.507, rew=1017.33]


Epoch #3689: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3690: 1025it [00:02, 363.98it/s, env_step=3778560, len=28, n/ep=2, n/st=64, player_1/loss=529.421, player_2/loss=312.795, rew=839.00]


Epoch #3690: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3691: 1025it [00:02, 365.67it/s, env_step=3779584, len=28, n/ep=2, n/st=64, player_1/loss=667.764, player_2/loss=293.112, rew=810.00]


Epoch #3691: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3692: 1025it [00:02, 360.65it/s, env_step=3780608, len=15, n/ep=4, n/st=64, player_1/loss=600.734, player_2/loss=433.130, rew=262.50]


Epoch #3692: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3693: 1025it [00:02, 365.28it/s, env_step=3781632, len=21, n/ep=2, n/st=64, player_1/loss=319.131, player_2/loss=366.836, rew=476.00]


Epoch #3693: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3694: 1025it [00:02, 360.40it/s, env_step=3782656, len=28, n/ep=1, n/st=64, player_1/loss=127.860, player_2/loss=222.498, rew=810.00]


Epoch #3694: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3695: 1025it [00:02, 363.59it/s, env_step=3783680, len=39, n/ep=1, n/st=64, player_1/loss=205.101, player_2/loss=67.197, rew=1558.00]


Epoch #3695: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3696: 1025it [00:02, 363.85it/s, env_step=3784704, len=31, n/ep=2, n/st=64, player_1/loss=403.933, player_2/loss=205.796, rew=1064.00]


Epoch #3696: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3697: 1025it [00:02, 361.80it/s, env_step=3785728, len=10, n/ep=6, n/st=64, player_1/loss=255.651, player_2/loss=186.424, rew=155.67]


Epoch #3697: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3698: 1025it [00:02, 361.54it/s, env_step=3786752, len=27, n/ep=2, n/st=64, player_1/loss=135.706, player_2/loss=392.505, rew=758.00]


Epoch #3698: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3699: 1025it [00:02, 364.24it/s, env_step=3787776, len=14, n/ep=4, n/st=64, player_1/loss=203.799, player_2/loss=1120.732, rew=209.50]


Epoch #3699: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3700: 1025it [00:02, 360.27it/s, env_step=3788800, len=30, n/ep=2, n/st=64, player_1/loss=216.166, player_2/loss=850.969, rew=989.00]


Epoch #3700: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3701: 1025it [00:02, 366.98it/s, env_step=3789824, len=38, n/ep=1, n/st=64, player_2/loss=688.300, rew=1480.00]


Epoch #3701: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3702: 1025it [00:02, 364.50it/s, env_step=3790848, len=25, n/ep=2, n/st=64, player_1/loss=415.533, player_2/loss=806.059, rew=730.00]


Epoch #3702: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3703: 1025it [00:02, 358.51it/s, env_step=3791872, len=19, n/ep=2, n/st=64, player_1/loss=442.304, player_2/loss=302.676, rew=554.00]


Epoch #3703: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3704: 1025it [00:02, 364.76it/s, env_step=3792896, len=36, n/ep=2, n/st=64, player_1/loss=215.820, player_2/loss=267.001, rew=1346.00]


Epoch #3704: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3705: 1025it [00:02, 361.92it/s, env_step=3793920, len=15, n/ep=4, n/st=64, player_1/loss=224.851, player_2/loss=458.247, rew=306.00]


Epoch #3705: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3706: 1025it [00:02, 361.16it/s, env_step=3794944, len=33, n/ep=2, n/st=64, player_1/loss=532.736, player_2/loss=506.024, rew=1121.00]


Epoch #3706: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3707: 1025it [00:02, 364.63it/s, env_step=3795968, len=14, n/ep=4, n/st=64, player_1/loss=921.922, player_2/loss=815.802, rew=232.50]


Epoch #3707: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3708: 1025it [00:02, 363.72it/s, env_step=3796992, len=9, n/ep=7, n/st=64, player_1/loss=516.510, player_2/loss=995.287, rew=110.00]


Epoch #3708: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3709: 1025it [00:02, 364.89it/s, env_step=3798016, len=33, n/ep=2, n/st=64, player_1/loss=205.143, player_2/loss=1003.128, rew=1156.00]


Epoch #3709: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3710: 1025it [00:02, 364.89it/s, env_step=3799040, len=32, n/ep=2, n/st=64, player_1/loss=375.755, player_2/loss=547.947, rew=1079.00]


Epoch #3710: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3711: 1025it [00:02, 359.89it/s, env_step=3800064, len=32, n/ep=2, n/st=64, player_1/loss=517.067, player_2/loss=367.173, rew=1089.00]


Epoch #3711: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3712: 1025it [00:02, 361.67it/s, env_step=3801088, len=32, n/ep=2, n/st=64, player_1/loss=399.785, player_2/loss=744.502, rew=1079.00]


Epoch #3712: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3713: 1025it [00:02, 362.18it/s, env_step=3802112, len=11, n/ep=6, n/st=64, player_1/loss=209.896, player_2/loss=584.234, rew=188.33]


Epoch #3713: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3714: 1025it [00:02, 359.64it/s, env_step=3803136, len=34, n/ep=1, n/st=64, player_1/loss=242.673, player_2/loss=192.069, rew=1188.00]


Epoch #3714: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3715: 1025it [00:02, 365.54it/s, env_step=3804160, len=20, n/ep=3, n/st=64, player_1/loss=456.460, player_2/loss=239.838, rew=452.67]


Epoch #3715: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3716: 1025it [00:02, 356.14it/s, env_step=3805184, len=21, n/ep=2, n/st=64, player_1/loss=353.491, player_2/loss=914.389, rew=604.00]


Epoch #3716: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3717: 1025it [00:02, 357.88it/s, env_step=3806208, len=13, n/ep=5, n/st=64, player_1/loss=331.518, player_2/loss=1088.567, rew=271.60]


Epoch #3717: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3718: 1025it [00:02, 362.69it/s, env_step=3807232, len=24, n/ep=2, n/st=64, player_1/loss=301.045, player_2/loss=524.875, rew=653.00]


Epoch #3718: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3719: 1025it [00:02, 358.13it/s, env_step=3808256, len=19, n/ep=4, n/st=64, player_1/loss=330.176, player_2/loss=557.374, rew=466.50]


Epoch #3719: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3720: 1025it [00:02, 362.31it/s, env_step=3809280, len=10, n/ep=5, n/st=64, player_1/loss=479.494, player_2/loss=412.656, rew=122.00]


Epoch #3720: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3721: 1025it [00:02, 364.37it/s, env_step=3810304, len=15, n/ep=3, n/st=64, player_1/loss=225.993, player_2/loss=338.528, rew=238.67]


Epoch #3721: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3722: 1025it [00:02, 358.88it/s, env_step=3811328, len=24, n/ep=2, n/st=64, player_1/loss=507.222, player_2/loss=467.280, rew=767.00]


Epoch #3722: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3723: 1025it [00:02, 366.85it/s, env_step=3812352, len=39, n/ep=1, n/st=64, player_1/loss=655.586, player_2/loss=524.508, rew=1558.00]


Epoch #3723: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3724: 1025it [00:02, 358.63it/s, env_step=3813376, len=38, n/ep=2, n/st=64, player_1/loss=533.673, player_2/loss=450.353, rew=1546.00]


Epoch #3724: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3725: 1025it [00:02, 358.38it/s, env_step=3814400, len=26, n/ep=2, n/st=64, player_1/loss=360.695, player_2/loss=152.646, rew=799.00]


Epoch #3725: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3726: 1025it [00:02, 364.37it/s, env_step=3815424, len=28, n/ep=2, n/st=64, player_1/loss=98.364, player_2/loss=324.177, rew=819.00]


Epoch #3726: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3727: 1025it [00:02, 358.26it/s, env_step=3816448, len=31, n/ep=2, n/st=64, player_1/loss=620.913, player_2/loss=938.971, rew=990.00]


Epoch #3727: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3728: 1025it [00:02, 365.93it/s, env_step=3817472, len=20, n/ep=3, n/st=64, player_1/loss=811.125, player_2/loss=690.074, rew=432.00]


Epoch #3728: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3729: 1025it [00:02, 365.28it/s, env_step=3818496, len=15, n/ep=4, n/st=64, player_1/loss=402.226, player_2/loss=486.808, rew=298.50]


Epoch #3729: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3730: 1025it [00:02, 344.42it/s, env_step=3819520, len=22, n/ep=3, n/st=64, player_1/loss=425.492, player_2/loss=454.249, rew=534.67]


Epoch #3730: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3731: 1025it [00:02, 363.72it/s, env_step=3820544, len=29, n/ep=2, n/st=64, player_1/loss=257.946, player_2/loss=451.342, rew=904.00]


Epoch #3731: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3732: 1025it [00:02, 366.06it/s, env_step=3821568, len=37, n/ep=2, n/st=64, player_1/loss=139.344, player_2/loss=711.265, rew=1442.00]


Epoch #3732: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3733: 1025it [00:02, 358.38it/s, env_step=3822592, len=23, n/ep=3, n/st=64, player_1/loss=200.643, player_2/loss=473.692, rew=602.00]


Epoch #3733: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3734: 1025it [00:02, 361.92it/s, env_step=3823616, len=26, n/ep=3, n/st=64, player_1/loss=163.770, player_2/loss=161.729, rew=854.67]


Epoch #3734: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3735: 1025it [00:02, 365.02it/s, env_step=3824640, len=27, n/ep=2, n/st=64, player_1/loss=234.580, player_2/loss=232.757, rew=803.00]


Epoch #3735: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3736: 1025it [00:02, 362.56it/s, env_step=3825664, len=30, n/ep=2, n/st=64, player_1/loss=343.608, player_2/loss=441.475, rew=929.00]


Epoch #3736: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3737: 1025it [00:02, 362.56it/s, env_step=3826688, len=19, n/ep=4, n/st=64, player_1/loss=266.209, player_2/loss=520.534, rew=379.50]


Epoch #3737: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3738: 1025it [00:02, 365.15it/s, env_step=3827712, len=18, n/ep=4, n/st=64, player_1/loss=99.701, player_2/loss=563.756, rew=433.00]


Epoch #3738: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3739: 1025it [00:02, 360.65it/s, env_step=3828736, len=20, n/ep=3, n/st=64, player_1/loss=32.218, player_2/loss=520.567, rew=424.00]


Epoch #3739: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3740: 1025it [00:02, 367.37it/s, env_step=3829760, len=23, n/ep=2, n/st=64, player_1/loss=495.204, player_2/loss=627.319, rew=574.00]


Epoch #3740: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3741: 1025it [00:02, 364.37it/s, env_step=3830784, len=20, n/ep=4, n/st=64, player_1/loss=794.161, player_2/loss=902.387, rew=503.00]


Epoch #3741: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3742: 1025it [00:02, 356.89it/s, env_step=3831808, len=34, n/ep=2, n/st=64, player_1/loss=886.741, player_2/loss=612.502, rew=1223.00]


Epoch #3742: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3743: 1025it [00:02, 362.95it/s, env_step=3832832, len=33, n/ep=2, n/st=64, player_1/loss=667.998, player_2/loss=582.754, rew=1121.00]


Epoch #3743: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3744: 1025it [00:02, 359.51it/s, env_step=3833856, len=26, n/ep=3, n/st=64, player_1/loss=270.642, player_2/loss=869.701, rew=710.67]


Epoch #3744: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3745: 1025it [00:02, 366.06it/s, env_step=3834880, len=33, n/ep=2, n/st=64, player_1/loss=183.081, player_2/loss=477.302, rew=1154.00]


Epoch #3745: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3746: 1025it [00:02, 365.67it/s, env_step=3835904, len=32, n/ep=2, n/st=64, player_1/loss=355.546, player_2/loss=338.246, rew=1055.00]


Epoch #3746: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3747: 1025it [00:02, 362.31it/s, env_step=3836928, len=25, n/ep=2, n/st=64, player_1/loss=396.333, player_2/loss=326.213, rew=730.00]


Epoch #3747: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3748: 1025it [00:02, 363.08it/s, env_step=3837952, len=36, n/ep=2, n/st=64, player_1/loss=179.629, player_2/loss=99.183, rew=1412.00]


Epoch #3748: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3749: 1025it [00:02, 359.14it/s, env_step=3838976, len=32, n/ep=2, n/st=64, player_1/loss=156.467, player_2/loss=122.578, rew=1087.00]


Epoch #3749: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3750: 1025it [00:02, 363.08it/s, env_step=3840000, len=37, n/ep=2, n/st=64, player_1/loss=122.134, player_2/loss=241.237, rew=1448.00]


Epoch #3750: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3751: 1025it [00:02, 364.11it/s, env_step=3841024, len=14, n/ep=4, n/st=64, player_1/loss=214.913, player_2/loss=220.912, rew=224.00]


Epoch #3751: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3752: 1025it [00:02, 357.88it/s, env_step=3842048, len=39, n/ep=2, n/st=64, player_1/loss=147.085, player_2/loss=362.125, rew=1567.00]


Epoch #3752: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3753: 1025it [00:02, 363.21it/s, env_step=3843072, len=10, n/ep=6, n/st=64, player_1/loss=58.812, player_2/loss=698.206, rew=117.33]


Epoch #3753: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3754: 1025it [00:02, 363.72it/s, env_step=3844096, len=32, n/ep=2, n/st=64, player_1/loss=580.504, player_2/loss=868.814, rew=1087.00]


Epoch #3754: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3755: 1025it [00:02, 362.44it/s, env_step=3845120, len=19, n/ep=4, n/st=64, player_1/loss=689.526, player_2/loss=711.729, rew=451.00]


Epoch #3755: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3756: 1025it [00:02, 363.34it/s, env_step=3846144, len=20, n/ep=3, n/st=64, player_1/loss=497.125, player_2/loss=697.777, rew=510.67]


Epoch #3756: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3757: 1025it [00:02, 359.89it/s, env_step=3847168, len=24, n/ep=3, n/st=64, player_1/loss=331.681, player_2/loss=465.083, rew=624.00]


Epoch #3757: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3758: 1025it [00:02, 362.82it/s, env_step=3848192, len=30, n/ep=2, n/st=64, player_1/loss=482.909, player_2/loss=229.089, rew=959.00]


Epoch #3758: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3759: 1025it [00:02, 361.92it/s, env_step=3849216, len=18, n/ep=3, n/st=64, player_1/loss=388.020, player_2/loss=415.524, rew=366.00]


Epoch #3759: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3760: 1025it [00:02, 364.24it/s, env_step=3850240, len=25, n/ep=3, n/st=64, player_1/loss=311.682, player_2/loss=698.337, rew=745.33]


Epoch #3760: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3761: 1025it [00:02, 363.21it/s, env_step=3851264, len=11, n/ep=8, n/st=64, player_1/loss=219.969, player_2/loss=1243.277, rew=237.50]


Epoch #3761: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3762: 1025it [00:02, 363.72it/s, env_step=3852288, len=38, n/ep=2, n/st=64, player_1/loss=196.431, player_2/loss=943.023, rew=1480.00]


Epoch #3762: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3763: 1025it [00:02, 359.89it/s, env_step=3853312, len=7, n/ep=7, n/st=64, player_1/loss=265.320, player_2/loss=783.457, rew=68.29]


Epoch #3763: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3764: 1025it [00:02, 365.28it/s, env_step=3854336, len=26, n/ep=3, n/st=64, player_1/loss=598.311, player_2/loss=875.670, rew=893.33]


Epoch #3764: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3765: 1025it [00:02, 363.34it/s, env_step=3855360, len=24, n/ep=2, n/st=64, player_1/loss=559.085, player_2/loss=740.815, rew=598.00]


Epoch #3765: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3766: 1025it [00:02, 363.98it/s, env_step=3856384, len=9, n/ep=8, n/st=64, player_1/loss=498.222, player_2/loss=797.463, rew=91.25]


Epoch #3766: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3767: 1025it [00:02, 357.76it/s, env_step=3857408, len=9, n/ep=7, n/st=64, player_1/loss=579.842, player_2/loss=720.444, rew=102.29]


Epoch #3767: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3768: 1025it [00:02, 360.78it/s, env_step=3858432, len=8, n/ep=7, n/st=64, player_1/loss=445.143, rew=90.00]


Epoch #3768: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3769: 1025it [00:02, 360.78it/s, env_step=3859456, len=12, n/ep=5, n/st=64, player_1/loss=243.624, player_2/loss=396.777, rew=168.00]


Epoch #3769: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3770: 1025it [00:02, 360.14it/s, env_step=3860480, len=18, n/ep=3, n/st=64, player_1/loss=504.250, player_2/loss=642.697, rew=366.67]


Epoch #3770: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3771: 1025it [00:02, 363.46it/s, env_step=3861504, len=18, n/ep=3, n/st=64, player_1/loss=695.904, player_2/loss=413.577, rew=340.67]


Epoch #3771: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3772: 1025it [00:02, 364.89it/s, env_step=3862528, len=21, n/ep=4, n/st=64, player_1/loss=732.248, player_2/loss=145.595, rew=518.50]


Epoch #3772: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3773: 1025it [00:02, 359.01it/s, env_step=3863552, len=24, n/ep=2, n/st=64, player_1/loss=1072.519, player_2/loss=173.804, rew=599.00]


Epoch #3773: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3774: 1025it [00:02, 363.85it/s, env_step=3864576, len=21, n/ep=3, n/st=64, player_1/loss=1031.080, player_2/loss=298.643, rew=514.00]


Epoch #3774: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3775: 1025it [00:02, 360.78it/s, env_step=3865600, len=26, n/ep=3, n/st=64, player_1/loss=511.515, player_2/loss=309.151, rew=700.67]


Epoch #3775: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3776: 1025it [00:02, 358.01it/s, env_step=3866624, len=30, n/ep=2, n/st=64, player_1/loss=403.355, player_2/loss=558.077, rew=961.00]


Epoch #3776: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3777: 1025it [00:02, 362.44it/s, env_step=3867648, len=28, n/ep=2, n/st=64, player_1/loss=473.137, player_2/loss=593.839, rew=835.00]


Epoch #3777: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3778: 1025it [00:02, 361.16it/s, env_step=3868672, len=38, n/ep=2, n/st=64, player_1/loss=435.135, player_2/loss=400.014, rew=1519.00]


Epoch #3778: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3779: 1025it [00:02, 363.85it/s, env_step=3869696, len=34, n/ep=2, n/st=64, player_1/loss=344.676, player_2/loss=252.364, rew=1224.00]


Epoch #3779: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3780: 1025it [00:02, 361.03it/s, env_step=3870720, len=30, n/ep=2, n/st=64, player_1/loss=319.202, player_2/loss=440.932, rew=1009.00]


Epoch #3780: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3781: 1025it [00:02, 359.14it/s, env_step=3871744, len=28, n/ep=2, n/st=64, player_1/loss=315.877, player_2/loss=369.900, rew=851.00]


Epoch #3781: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3782: 1025it [00:02, 362.82it/s, env_step=3872768, len=26, n/ep=3, n/st=64, player_2/loss=490.576, rew=700.67]


Epoch #3782: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3783: 1025it [00:02, 359.89it/s, env_step=3873792, len=20, n/ep=3, n/st=64, player_1/loss=241.844, player_2/loss=794.193, rew=448.67]


Epoch #3783: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3784: 1025it [00:02, 361.03it/s, env_step=3874816, len=24, n/ep=3, n/st=64, player_1/loss=277.261, player_2/loss=507.670, rew=630.67]


Epoch #3784: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3785: 1025it [00:02, 365.28it/s, env_step=3875840, len=20, n/ep=3, n/st=64, player_1/loss=451.536, player_2/loss=369.811, rew=437.33]


Epoch #3785: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3786: 1025it [00:02, 359.76it/s, env_step=3876864, len=25, n/ep=3, n/st=64, player_1/loss=466.514, player_2/loss=142.582, rew=686.67]


Epoch #3786: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3787: 1025it [00:02, 362.05it/s, env_step=3877888, len=27, n/ep=2, n/st=64, player_1/loss=434.080, player_2/loss=128.772, rew=788.00]


Epoch #3787: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3788: 1025it [00:02, 360.52it/s, env_step=3878912, len=24, n/ep=2, n/st=64, player_1/loss=508.078, player_2/loss=99.162, rew=625.00]


Epoch #3788: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3789: 1025it [00:02, 364.76it/s, env_step=3879936, len=34, n/ep=2, n/st=64, player_1/loss=702.143, player_2/loss=306.142, rew=1189.00]


Epoch #3789: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3790: 1025it [00:02, 361.29it/s, env_step=3880960, len=17, n/ep=4, n/st=64, player_2/loss=307.964, rew=321.00]


Epoch #3790: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3791: 1025it [00:02, 358.38it/s, env_step=3881984, len=21, n/ep=3, n/st=64, player_1/loss=182.637, player_2/loss=351.207, rew=532.00]


Epoch #3791: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3792: 1025it [00:02, 363.08it/s, env_step=3883008, len=28, n/ep=2, n/st=64, player_1/loss=678.089, player_2/loss=591.487, rew=869.00]


Epoch #3792: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3793: 1025it [00:02, 363.85it/s, env_step=3884032, len=28, n/ep=2, n/st=64, player_1/loss=661.362, player_2/loss=443.184, rew=851.00]


Epoch #3793: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3794: 1025it [00:02, 358.01it/s, env_step=3885056, len=24, n/ep=2, n/st=64, player_1/loss=388.929, player_2/loss=923.839, rew=643.00]


Epoch #3794: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3795: 1025it [00:02, 363.46it/s, env_step=3886080, len=27, n/ep=3, n/st=64, player_1/loss=428.058, player_2/loss=932.064, rew=892.67]


Epoch #3795: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3796: 1025it [00:02, 362.56it/s, env_step=3887104, len=16, n/ep=4, n/st=64, player_1/loss=522.702, player_2/loss=518.531, rew=273.00]


Epoch #3796: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3797: 1025it [00:02, 358.51it/s, env_step=3888128, len=15, n/ep=4, n/st=64, player_1/loss=389.792, player_2/loss=523.645, rew=248.00]


Epoch #3797: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3798: 1025it [00:02, 363.72it/s, env_step=3889152, len=15, n/ep=4, n/st=64, player_1/loss=239.068, player_2/loss=393.748, rew=258.50]


Epoch #3798: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3799: 1025it [00:02, 365.67it/s, env_step=3890176, len=15, n/ep=4, n/st=64, player_1/loss=350.020, player_2/loss=426.211, rew=264.00]


Epoch #3799: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3800: 1025it [00:02, 359.51it/s, env_step=3891200, len=22, n/ep=4, n/st=64, player_1/loss=372.150, player_2/loss=457.498, rew=534.50]


Epoch #3800: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3801: 1025it [00:02, 361.16it/s, env_step=3892224, len=27, n/ep=3, n/st=64, player_1/loss=343.369, player_2/loss=274.018, rew=788.67]


Epoch #3801: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3802: 1025it [00:02, 359.26it/s, env_step=3893248, len=26, n/ep=2, n/st=64, player_1/loss=244.759, player_2/loss=183.926, rew=749.00]


Epoch #3802: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3803: 1025it [00:02, 363.34it/s, env_step=3894272, len=21, n/ep=2, n/st=64, player_1/loss=293.564, player_2/loss=81.947, rew=476.00]


Epoch #3803: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3804: 1025it [00:02, 363.98it/s, env_step=3895296, len=23, n/ep=2, n/st=64, player_2/loss=321.026, rew=566.00]


Epoch #3804: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3805: 1025it [00:02, 358.63it/s, env_step=3896320, len=26, n/ep=2, n/st=64, player_1/loss=369.488, player_2/loss=550.253, rew=727.00]


Epoch #3805: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3806: 1025it [00:02, 362.44it/s, env_step=3897344, len=29, n/ep=2, n/st=64, player_1/loss=488.120, player_2/loss=327.011, rew=869.00]


Epoch #3806: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3807: 1025it [00:02, 363.98it/s, env_step=3898368, len=34, n/ep=2, n/st=64, player_1/loss=484.509, player_2/loss=101.871, rew=1223.00]


Epoch #3807: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3808: 1025it [00:02, 358.63it/s, env_step=3899392, len=34, n/ep=2, n/st=64, player_1/loss=513.331, player_2/loss=72.412, rew=1197.00]


Epoch #3808: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3809: 1025it [00:02, 362.05it/s, env_step=3900416, len=25, n/ep=2, n/st=64, player_1/loss=477.098, player_2/loss=143.530, rew=652.00]


Epoch #3809: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3810: 1025it [00:02, 358.51it/s, env_step=3901440, len=15, n/ep=4, n/st=64, player_1/loss=592.375, player_2/loss=146.433, rew=238.00]


Epoch #3810: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3811: 1025it [00:02, 363.34it/s, env_step=3902464, len=26, n/ep=3, n/st=64, player_1/loss=421.594, player_2/loss=158.540, rew=844.67]


Epoch #3811: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3812: 1025it [00:02, 361.67it/s, env_step=3903488, len=24, n/ep=2, n/st=64, player_1/loss=238.375, player_2/loss=79.004, rew=623.00]


Epoch #3812: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3813: 1025it [00:02, 359.39it/s, env_step=3904512, len=26, n/ep=3, n/st=64, player_1/loss=245.608, player_2/loss=61.158, rew=738.00]


Epoch #3813: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3814: 1025it [00:02, 363.21it/s, env_step=3905536, len=16, n/ep=4, n/st=64, player_1/loss=327.641, player_2/loss=424.151, rew=303.50]


Epoch #3814: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3815: 1025it [00:02, 362.82it/s, env_step=3906560, len=36, n/ep=2, n/st=64, player_1/loss=298.142, player_2/loss=655.864, rew=1339.00]


Epoch #3815: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3816: 1025it [00:02, 360.78it/s, env_step=3907584, len=19, n/ep=3, n/st=64, player_1/loss=205.036, player_2/loss=484.244, rew=528.00]


Epoch #3816: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3817: 1025it [00:02, 365.02it/s, env_step=3908608, len=14, n/ep=4, n/st=64, player_1/loss=233.863, player_2/loss=150.163, rew=242.00]


Epoch #3817: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3818: 1025it [00:02, 360.52it/s, env_step=3909632, len=21, n/ep=3, n/st=64, player_1/loss=363.018, player_2/loss=63.622, rew=565.33]


Epoch #3818: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3819: 1025it [00:02, 359.26it/s, env_step=3910656, len=19, n/ep=3, n/st=64, player_1/loss=347.266, player_2/loss=244.003, rew=462.00]


Epoch #3819: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3820: 1025it [00:02, 359.14it/s, env_step=3911680, len=25, n/ep=2, n/st=64, player_1/loss=216.941, player_2/loss=422.325, rew=648.00]


Epoch #3820: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3821: 1025it [00:02, 365.28it/s, env_step=3912704, len=23, n/ep=3, n/st=64, player_1/loss=229.580, player_2/loss=472.871, rew=560.67]


Epoch #3821: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3822: 1025it [00:02, 355.40it/s, env_step=3913728, len=23, n/ep=3, n/st=64, player_1/loss=357.078, player_2/loss=292.464, rew=552.67]


Epoch #3822: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3823: 1025it [00:02, 361.54it/s, env_step=3914752, len=21, n/ep=3, n/st=64, player_1/loss=238.702, player_2/loss=376.632, rew=555.33]


Epoch #3823: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3824: 1025it [00:02, 361.41it/s, env_step=3915776, len=20, n/ep=3, n/st=64, player_1/loss=248.569, player_2/loss=520.144, rew=450.67]


Epoch #3824: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3825: 1025it [00:02, 359.26it/s, env_step=3916800, len=20, n/ep=4, n/st=64, player_1/loss=380.707, player_2/loss=438.380, rew=552.50]


Epoch #3825: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3826: 1025it [00:02, 361.41it/s, env_step=3917824, len=9, n/ep=7, n/st=64, player_1/loss=342.444, player_2/loss=193.676, rew=90.00]


Epoch #3826: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3827: 1025it [00:02, 363.72it/s, env_step=3918848, len=17, n/ep=4, n/st=64, player_1/loss=198.481, player_2/loss=97.787, rew=326.00]


Epoch #3827: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3828: 1025it [00:02, 359.14it/s, env_step=3919872, len=20, n/ep=3, n/st=64, player_1/loss=274.568, player_2/loss=86.952, rew=443.33]


Epoch #3828: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3829: 1025it [00:02, 363.85it/s, env_step=3920896, len=25, n/ep=3, n/st=64, player_1/loss=218.068, player_2/loss=65.498, rew=786.67]


Epoch #3829: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3830: 1025it [00:02, 364.37it/s, env_step=3921920, len=29, n/ep=3, n/st=64, player_1/loss=66.371, player_2/loss=409.696, rew=979.33]


Epoch #3830: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3831: 1025it [00:02, 361.80it/s, env_step=3922944, len=21, n/ep=2, n/st=64, player_1/loss=79.607, player_2/loss=510.468, rew=464.00]


Epoch #3831: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3832: 1025it [00:02, 362.56it/s, env_step=3923968, len=26, n/ep=2, n/st=64, player_1/loss=270.862, player_2/loss=581.576, rew=781.00]


Epoch #3832: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3833: 1025it [00:02, 357.76it/s, env_step=3924992, len=32, n/ep=2, n/st=64, player_1/loss=457.850, player_2/loss=436.918, rew=1089.00]


Epoch #3833: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3834: 1025it [00:02, 362.31it/s, env_step=3926016, len=30, n/ep=2, n/st=64, player_1/loss=313.982, player_2/loss=231.513, rew=965.00]


Epoch #3834: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3835: 1025it [00:02, 361.92it/s, env_step=3927040, len=35, n/ep=2, n/st=64, player_1/loss=394.359, player_2/loss=215.040, rew=1258.00]


Epoch #3835: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3836: 1025it [00:02, 359.39it/s, env_step=3928064, len=42, n/ep=1, n/st=64, player_1/loss=317.061, player_2/loss=201.605, rew=1834.00]


Epoch #3836: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3837: 1025it [00:02, 361.54it/s, env_step=3929088, len=38, n/ep=2, n/st=64, player_1/loss=356.036, player_2/loss=597.428, rew=1521.00]


Epoch #3837: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3838: 1025it [00:02, 362.56it/s, env_step=3930112, len=21, n/ep=3, n/st=64, player_1/loss=703.160, player_2/loss=636.056, rew=552.00]


Epoch #3838: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3839: 1025it [00:02, 362.82it/s, env_step=3931136, len=32, n/ep=2, n/st=64, player_1/loss=662.176, player_2/loss=135.606, rew=1058.00]


Epoch #3839: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3840: 1025it [00:02, 359.26it/s, env_step=3932160, len=30, n/ep=2, n/st=64, player_1/loss=283.615, player_2/loss=727.861, rew=944.00]


Epoch #3840: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3841: 1025it [00:02, 363.59it/s, env_step=3933184, len=32, n/ep=2, n/st=64, player_1/loss=82.017, player_2/loss=858.805, rew=1054.00]


Epoch #3841: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3842: 1025it [00:02, 366.06it/s, env_step=3934208, len=27, n/ep=2, n/st=64, player_1/loss=128.440, player_2/loss=1015.420, rew=835.00]


Epoch #3842: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3843: 1025it [00:02, 357.01it/s, env_step=3935232, len=32, n/ep=2, n/st=64, player_1/loss=324.786, player_2/loss=883.149, rew=1087.00]


Epoch #3843: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3844: 1025it [00:02, 362.18it/s, env_step=3936256, len=36, n/ep=2, n/st=64, player_1/loss=262.586, player_2/loss=623.245, rew=1334.00]


Epoch #3844: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3845: 1025it [00:02, 363.21it/s, env_step=3937280, len=29, n/ep=3, n/st=64, player_1/loss=252.835, player_2/loss=660.107, rew=868.00]


Epoch #3845: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3846: 1025it [00:02, 359.89it/s, env_step=3938304, len=24, n/ep=3, n/st=64, player_1/loss=393.398, player_2/loss=226.412, rew=626.00]


Epoch #3846: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3847: 1025it [00:02, 357.01it/s, env_step=3939328, len=23, n/ep=3, n/st=64, player_1/loss=400.837, player_2/loss=110.263, rew=554.67]


Epoch #3847: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3848: 1025it [00:02, 364.37it/s, env_step=3940352, len=32, n/ep=2, n/st=64, player_1/loss=398.078, player_2/loss=535.486, rew=1058.00]


Epoch #3848: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3849: 1025it [00:02, 360.65it/s, env_step=3941376, len=33, n/ep=2, n/st=64, player_1/loss=384.085, player_2/loss=719.366, rew=1156.00]


Epoch #3849: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3850: 1025it [00:02, 365.02it/s, env_step=3942400, len=32, n/ep=2, n/st=64, player_1/loss=279.496, player_2/loss=607.183, rew=1087.00]


Epoch #3850: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3851: 1025it [00:02, 362.18it/s, env_step=3943424, len=34, n/ep=2, n/st=64, player_1/loss=99.929, player_2/loss=491.655, rew=1223.00]


Epoch #3851: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3852: 1025it [00:02, 359.51it/s, env_step=3944448, len=25, n/ep=2, n/st=64, player_1/loss=160.595, player_2/loss=165.883, rew=856.00]


Epoch #3852: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3853: 1025it [00:02, 361.03it/s, env_step=3945472, len=37, n/ep=2, n/st=64, player_1/loss=150.826, player_2/loss=136.621, rew=1404.00]


Epoch #3853: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3854: 1025it [00:02, 363.85it/s, env_step=3946496, len=35, n/ep=2, n/st=64, player_1/loss=98.728, player_2/loss=412.200, rew=1351.00]


Epoch #3854: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3855: 1025it [00:02, 359.14it/s, env_step=3947520, len=33, n/ep=2, n/st=64, player_1/loss=104.336, player_2/loss=570.955, rew=1184.00]


Epoch #3855: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3856: 1025it [00:02, 362.18it/s, env_step=3948544, len=24, n/ep=3, n/st=64, player_1/loss=79.612, player_2/loss=412.313, rew=684.00]


Epoch #3856: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3857: 1025it [00:02, 362.82it/s, env_step=3949568, len=22, n/ep=3, n/st=64, player_1/loss=140.822, player_2/loss=1159.451, rew=552.00]


Epoch #3857: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3858: 1025it [00:02, 360.52it/s, env_step=3950592, len=36, n/ep=2, n/st=64, player_1/loss=159.715, player_2/loss=876.771, rew=1346.00]


Epoch #3858: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3859: 1025it [00:02, 363.98it/s, env_step=3951616, len=28, n/ep=2, n/st=64, player_1/loss=172.090, player_2/loss=140.495, rew=811.00]


Epoch #3859: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3860: 1025it [00:02, 359.51it/s, env_step=3952640, len=16, n/ep=4, n/st=64, player_1/loss=238.647, player_2/loss=674.108, rew=270.50]


Epoch #3860: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3861: 1025it [00:02, 363.59it/s, env_step=3953664, len=28, n/ep=2, n/st=64, player_1/loss=237.364, player_2/loss=1402.730, rew=841.00]


Epoch #3861: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3862: 1025it [00:02, 361.80it/s, env_step=3954688, len=25, n/ep=2, n/st=64, player_1/loss=287.325, player_2/loss=870.461, rew=676.00]


Epoch #3862: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3863: 1025it [00:02, 360.65it/s, env_step=3955712, len=32, n/ep=2, n/st=64, player_1/loss=173.758, player_2/loss=396.752, rew=1192.00]


Epoch #3863: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3864: 1025it [00:02, 363.46it/s, env_step=3956736, len=17, n/ep=4, n/st=64, player_1/loss=121.163, player_2/loss=780.703, rew=340.00]


Epoch #3864: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3865: 1025it [00:02, 358.38it/s, env_step=3957760, len=19, n/ep=3, n/st=64, player_1/loss=104.534, player_2/loss=596.972, rew=418.67]


Epoch #3865: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3866: 1025it [00:02, 363.21it/s, env_step=3958784, len=23, n/ep=2, n/st=64, player_1/loss=352.688, player_2/loss=324.259, rew=631.00]


Epoch #3866: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3867: 1025it [00:02, 363.85it/s, env_step=3959808, len=13, n/ep=5, n/st=64, player_1/loss=479.854, player_2/loss=379.478, rew=217.20]


Epoch #3867: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3868: 1025it [00:02, 356.51it/s, env_step=3960832, len=37, n/ep=1, n/st=64, player_1/loss=518.791, player_2/loss=368.696, rew=1404.00]


Epoch #3868: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3869: 1025it [00:02, 364.24it/s, env_step=3961856, len=30, n/ep=2, n/st=64, player_1/loss=440.133, player_2/loss=513.884, rew=928.00]


Epoch #3869: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3870: 1025it [00:02, 363.72it/s, env_step=3962880, len=19, n/ep=2, n/st=64, player_1/loss=275.458, player_2/loss=570.990, rew=379.00]


Epoch #3870: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3871: 1025it [00:02, 359.77it/s, env_step=3963904, len=31, n/ep=2, n/st=64, player_1/loss=38.616, player_2/loss=358.534, rew=999.00]


Epoch #3871: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3872: 1025it [00:02, 361.54it/s, env_step=3964928, len=24, n/ep=2, n/st=64, player_1/loss=270.049, player_2/loss=288.757, rew=625.00]


Epoch #3872: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3873: 1025it [00:02, 361.92it/s, env_step=3965952, len=36, n/ep=2, n/st=64, player_1/loss=296.050, player_2/loss=1079.543, rew=1381.00]


Epoch #3873: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3874: 1025it [00:02, 359.89it/s, env_step=3966976, len=29, n/ep=2, n/st=64, player_1/loss=180.910, player_2/loss=931.587, rew=898.00]


Epoch #3874: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3875: 1025it [00:02, 363.08it/s, env_step=3968000, len=25, n/ep=3, n/st=64, player_1/loss=246.238, player_2/loss=142.222, rew=690.67]


Epoch #3875: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3876: 1025it [00:02, 359.51it/s, env_step=3969024, len=32, n/ep=2, n/st=64, player_1/loss=215.213, player_2/loss=131.003, rew=1093.00]


Epoch #3876: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3877: 1025it [00:02, 364.37it/s, env_step=3970048, len=28, n/ep=2, n/st=64, player_1/loss=427.547, player_2/loss=452.855, rew=835.00]


Epoch #3877: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3878: 1025it [00:02, 361.92it/s, env_step=3971072, len=27, n/ep=2, n/st=64, player_1/loss=499.496, player_2/loss=452.421, rew=914.00]


Epoch #3878: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3879: 1025it [00:02, 360.65it/s, env_step=3972096, len=33, n/ep=2, n/st=64, player_1/loss=249.372, player_2/loss=121.081, rew=1174.00]


Epoch #3879: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3880: 1025it [00:02, 362.44it/s, env_step=3973120, len=34, n/ep=2, n/st=64, player_1/loss=306.364, player_2/loss=103.056, rew=1235.00]


Epoch #3880: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3881: 1025it [00:02, 363.72it/s, env_step=3974144, len=18, n/ep=4, n/st=64, player_1/loss=195.324, player_2/loss=395.056, rew=374.00]


Epoch #3881: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3882: 1025it [00:02, 358.38it/s, env_step=3975168, len=37, n/ep=1, n/st=64, player_1/loss=175.945, player_2/loss=502.600, rew=1404.00]


Epoch #3882: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3883: 1025it [00:02, 362.44it/s, env_step=3976192, len=19, n/ep=4, n/st=64, player_1/loss=288.854, player_2/loss=548.271, rew=416.00]


Epoch #3883: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3884: 1025it [00:02, 358.88it/s, env_step=3977216, len=29, n/ep=2, n/st=64, player_1/loss=367.570, player_2/loss=847.000, rew=868.00]


Epoch #3884: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3885: 1025it [00:02, 362.44it/s, env_step=3978240, len=17, n/ep=4, n/st=64, player_1/loss=380.026, player_2/loss=1142.626, rew=420.50]


Epoch #3885: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3886: 1025it [00:02, 364.89it/s, env_step=3979264, len=32, n/ep=2, n/st=64, player_1/loss=573.665, rew=1055.00]


Epoch #3886: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3887: 1025it [00:02, 359.77it/s, env_step=3980288, len=21, n/ep=3, n/st=64, player_1/loss=431.203, player_2/loss=86.744, rew=462.67]


Epoch #3887: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3888: 1025it [00:02, 362.18it/s, env_step=3981312, len=25, n/ep=3, n/st=64, player_1/loss=350.163, player_2/loss=114.916, rew=859.33]


Epoch #3888: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3889: 1025it [00:02, 361.67it/s, env_step=3982336, len=18, n/ep=4, n/st=64, player_1/loss=332.343, player_2/loss=377.754, rew=392.00]


Epoch #3889: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3890: 1025it [00:02, 360.52it/s, env_step=3983360, len=21, n/ep=3, n/st=64, player_1/loss=354.843, player_2/loss=819.318, rew=495.33]


Epoch #3890: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3891: 1025it [00:02, 362.44it/s, env_step=3984384, len=22, n/ep=3, n/st=64, player_1/loss=511.939, player_2/loss=712.156, rew=522.00]


Epoch #3891: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3892: 1025it [00:02, 365.80it/s, env_step=3985408, len=28, n/ep=3, n/st=64, player_1/loss=485.614, player_2/loss=319.629, rew=883.33]


Epoch #3892: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3893: 1025it [00:02, 360.02it/s, env_step=3986432, len=22, n/ep=2, n/st=64, player_1/loss=271.124, player_2/loss=174.076, rew=529.00]


Epoch #3893: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3894: 1025it [00:02, 364.89it/s, env_step=3987456, len=19, n/ep=4, n/st=64, player_1/loss=254.633, player_2/loss=171.819, rew=406.50]


Epoch #3894: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3895: 1025it [00:02, 361.03it/s, env_step=3988480, len=24, n/ep=2, n/st=64, player_1/loss=335.038, player_2/loss=155.573, rew=719.00]


Epoch #3895: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3896: 1025it [00:02, 361.92it/s, env_step=3989504, len=29, n/ep=2, n/st=64, player_1/loss=374.006, player_2/loss=163.381, rew=869.00]


Epoch #3896: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3897: 1025it [00:02, 363.85it/s, env_step=3990528, len=26, n/ep=2, n/st=64, player_1/loss=273.583, player_2/loss=153.280, rew=729.00]


Epoch #3897: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3898: 1025it [00:02, 359.26it/s, env_step=3991552, len=20, n/ep=4, n/st=64, player_1/loss=266.970, player_2/loss=296.730, rew=524.00]


Epoch #3898: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3899: 1025it [00:02, 362.31it/s, env_step=3992576, len=20, n/ep=3, n/st=64, player_1/loss=305.489, player_2/loss=606.459, rew=567.33]


Epoch #3899: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3900: 1025it [00:02, 360.91it/s, env_step=3993600, len=24, n/ep=2, n/st=64, player_1/loss=474.491, player_2/loss=679.704, rew=713.00]


Epoch #3900: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3901: 1025it [00:02, 356.02it/s, env_step=3994624, len=28, n/ep=2, n/st=64, player_1/loss=416.698, player_2/loss=535.988, rew=869.00]


Epoch #3901: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3902: 1025it [00:02, 361.92it/s, env_step=3995648, len=42, n/ep=1, n/st=64, player_1/loss=215.758, player_2/loss=610.704, rew=1834.00]


Epoch #3902: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3903: 1025it [00:02, 358.13it/s, env_step=3996672, len=18, n/ep=3, n/st=64, player_1/loss=442.917, player_2/loss=529.622, rew=354.00]


Epoch #3903: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3904: 1025it [00:02, 364.11it/s, env_step=3997696, len=15, n/ep=4, n/st=64, player_1/loss=420.661, player_2/loss=566.226, rew=258.00]


Epoch #3904: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3905: 1025it [00:02, 362.05it/s, env_step=3998720, len=34, n/ep=2, n/st=64, player_1/loss=510.353, player_2/loss=339.194, rew=1197.00]


Epoch #3905: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3906: 1025it [00:02, 357.38it/s, env_step=3999744, len=29, n/ep=3, n/st=64, player_1/loss=609.034, player_2/loss=437.879, rew=962.00]


Epoch #3906: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3907: 1025it [00:02, 364.63it/s, env_step=4000768, len=14, n/ep=3, n/st=64, player_1/loss=351.375, player_2/loss=479.224, rew=242.67]


Epoch #3907: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3908: 1025it [00:02, 364.24it/s, env_step=4001792, len=29, n/ep=2, n/st=64, player_1/loss=142.438, player_2/loss=368.791, rew=904.00]


Epoch #3908: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3909: 1025it [00:02, 359.26it/s, env_step=4002816, len=20, n/ep=4, n/st=64, player_1/loss=369.235, player_2/loss=569.467, rew=579.50]


Epoch #3909: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3910: 1025it [00:02, 362.18it/s, env_step=4003840, len=19, n/ep=3, n/st=64, player_1/loss=412.376, player_2/loss=822.617, rew=508.67]


Epoch #3910: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3911: 1025it [00:02, 363.34it/s, env_step=4004864, len=21, n/ep=3, n/st=64, player_1/loss=225.081, player_2/loss=498.556, rew=460.67]


Epoch #3911: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3912: 1025it [00:02, 360.27it/s, env_step=4005888, len=27, n/ep=2, n/st=64, player_1/loss=436.642, player_2/loss=286.816, rew=818.00]


Epoch #3912: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3913: 1025it [00:02, 364.50it/s, env_step=4006912, len=25, n/ep=3, n/st=64, player_1/loss=601.910, player_2/loss=394.192, rew=718.67]


Epoch #3913: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3914: 1025it [00:02, 359.51it/s, env_step=4007936, len=21, n/ep=4, n/st=64, player_1/loss=271.566, player_2/loss=619.259, rew=521.00]


Epoch #3914: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3915: 1025it [00:02, 363.21it/s, env_step=4008960, len=30, n/ep=2, n/st=64, player_1/loss=358.843, player_2/loss=798.821, rew=944.00]


Epoch #3915: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3916: 1025it [00:02, 364.63it/s, env_step=4009984, len=26, n/ep=3, n/st=64, player_1/loss=480.702, player_2/loss=379.765, rew=874.00]


Epoch #3916: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3917: 1025it [00:02, 357.76it/s, env_step=4011008, len=15, n/ep=5, n/st=64, player_1/loss=374.293, player_2/loss=400.006, rew=294.00]


Epoch #3917: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3918: 1025it [00:02, 363.34it/s, env_step=4012032, len=20, n/ep=3, n/st=64, player_1/loss=280.934, player_2/loss=383.900, rew=472.00]


Epoch #3918: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3919: 1025it [00:02, 364.76it/s, env_step=4013056, len=31, n/ep=3, n/st=64, player_1/loss=306.154, player_2/loss=264.156, rew=1002.67]


Epoch #3919: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3920: 1025it [00:02, 359.89it/s, env_step=4014080, len=35, n/ep=2, n/st=64, player_1/loss=243.678, player_2/loss=215.816, rew=1258.00]


Epoch #3920: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3921: 1025it [00:02, 363.46it/s, env_step=4015104, len=26, n/ep=2, n/st=64, player_1/loss=287.785, player_2/loss=231.914, rew=725.00]


Epoch #3921: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3922: 1025it [00:02, 363.46it/s, env_step=4016128, len=19, n/ep=4, n/st=64, player_1/loss=322.251, player_2/loss=396.737, rew=420.00]


Epoch #3922: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3923: 1025it [00:02, 362.56it/s, env_step=4017152, len=15, n/ep=4, n/st=64, player_1/loss=293.470, player_2/loss=473.722, rew=252.00]


Epoch #3923: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3924: 1025it [00:02, 347.10it/s, env_step=4018176, len=29, n/ep=3, n/st=64, player_1/loss=268.565, player_2/loss=742.217, rew=917.33]


Epoch #3924: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3925: 1025it [00:02, 355.28it/s, env_step=4019200, len=18, n/ep=4, n/st=64, player_1/loss=268.180, player_2/loss=810.077, rew=368.00]


Epoch #3925: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3926: 1025it [00:02, 360.78it/s, env_step=4020224, len=14, n/ep=4, n/st=64, player_1/loss=467.865, player_2/loss=810.002, rew=231.50]


Epoch #3926: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3927: 1025it [00:02, 360.65it/s, env_step=4021248, len=29, n/ep=3, n/st=64, player_1/loss=666.354, player_2/loss=671.972, rew=964.00]


Epoch #3927: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3928: 1025it [00:02, 362.56it/s, env_step=4022272, len=17, n/ep=3, n/st=64, player_1/loss=482.362, player_2/loss=429.873, rew=336.00]


Epoch #3928: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3929: 1025it [00:02, 361.03it/s, env_step=4023296, len=28, n/ep=3, n/st=64, player_1/loss=349.038, player_2/loss=439.424, rew=971.33]


Epoch #3929: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3930: 1025it [00:02, 358.63it/s, env_step=4024320, len=18, n/ep=3, n/st=64, player_1/loss=419.801, player_2/loss=633.391, rew=396.00]


Epoch #3930: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3931: 1025it [00:02, 359.51it/s, env_step=4025344, len=33, n/ep=2, n/st=64, player_1/loss=216.057, player_2/loss=697.403, rew=1160.00]


Epoch #3931: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3932: 1025it [00:02, 357.26it/s, env_step=4026368, len=20, n/ep=3, n/st=64, player_1/loss=412.754, player_2/loss=717.667, rew=418.67]


Epoch #3932: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3933: 1025it [00:02, 363.21it/s, env_step=4027392, len=28, n/ep=2, n/st=64, player_1/loss=499.818, player_2/loss=320.633, rew=810.00]


Epoch #3933: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3934: 1025it [00:02, 362.82it/s, env_step=4028416, len=21, n/ep=3, n/st=64, player_1/loss=236.655, player_2/loss=569.577, rew=492.00]


Epoch #3934: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3935: 1025it [00:02, 356.27it/s, env_step=4029440, len=26, n/ep=2, n/st=64, player_1/loss=145.501, player_2/loss=443.738, rew=727.00]


Epoch #3935: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3936: 1025it [00:02, 362.18it/s, env_step=4030464, len=21, n/ep=3, n/st=64, player_1/loss=169.606, player_2/loss=309.321, rew=506.00]


Epoch #3936: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3937: 1025it [00:02, 361.80it/s, env_step=4031488, len=21, n/ep=3, n/st=64, player_1/loss=355.993, player_2/loss=305.114, rew=492.00]


Epoch #3937: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3938: 1025it [00:02, 357.51it/s, env_step=4032512, len=20, n/ep=2, n/st=64, player_1/loss=367.230, player_2/loss=376.595, rew=539.00]


Epoch #3938: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3939: 1025it [00:02, 363.98it/s, env_step=4033536, len=15, n/ep=4, n/st=64, player_1/loss=230.985, player_2/loss=286.154, rew=252.00]


Epoch #3939: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3940: 1025it [00:02, 364.11it/s, env_step=4034560, len=18, n/ep=4, n/st=64, player_1/loss=378.303, player_2/loss=539.623, rew=376.50]


Epoch #3940: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3941: 1025it [00:02, 357.76it/s, env_step=4035584, len=26, n/ep=2, n/st=64, player_1/loss=521.433, player_2/loss=337.299, rew=729.00]


Epoch #3941: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3942: 1025it [00:02, 361.80it/s, env_step=4036608, len=19, n/ep=3, n/st=64, player_1/loss=676.948, player_2/loss=344.592, rew=405.33]


Epoch #3942: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3943: 1025it [00:02, 362.18it/s, env_step=4037632, len=24, n/ep=2, n/st=64, player_1/loss=768.263, player_2/loss=472.223, rew=623.00]


Epoch #3943: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3944: 1025it [00:02, 356.76it/s, env_step=4038656, len=36, n/ep=2, n/st=64, player_1/loss=435.214, player_2/loss=480.384, rew=1346.00]


Epoch #3944: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3945: 1025it [00:02, 361.29it/s, env_step=4039680, len=35, n/ep=2, n/st=64, player_1/loss=227.545, player_2/loss=363.985, rew=1294.00]


Epoch #3945: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3946: 1025it [00:02, 359.26it/s, env_step=4040704, len=15, n/ep=4, n/st=64, player_1/loss=150.804, player_2/loss=718.680, rew=296.50]


Epoch #3946: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3947: 1025it [00:02, 360.91it/s, env_step=4041728, len=20, n/ep=3, n/st=64, player_1/loss=338.551, player_2/loss=767.281, rew=496.67]


Epoch #3947: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3948: 1025it [00:02, 360.65it/s, env_step=4042752, len=8, n/ep=8, n/st=64, player_1/loss=352.705, player_2/loss=451.432, rew=76.50]


Epoch #3948: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3949: 1025it [00:02, 358.63it/s, env_step=4043776, len=12, n/ep=5, n/st=64, player_1/loss=132.688, player_2/loss=418.885, rew=162.00]


Epoch #3949: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3950: 1025it [00:02, 364.24it/s, env_step=4044800, len=28, n/ep=2, n/st=64, player_1/loss=330.925, player_2/loss=365.524, rew=811.00]


Epoch #3950: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3951: 1025it [00:02, 360.02it/s, env_step=4045824, len=30, n/ep=2, n/st=64, player_1/loss=393.280, player_2/loss=213.617, rew=965.00]


Epoch #3951: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3952: 1025it [00:02, 362.69it/s, env_step=4046848, len=21, n/ep=3, n/st=64, player_1/loss=370.750, player_2/loss=529.428, rew=575.33]


Epoch #3952: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3953: 1025it [00:02, 361.29it/s, env_step=4047872, len=16, n/ep=3, n/st=64, player_1/loss=240.812, player_2/loss=601.925, rew=296.67]


Epoch #3953: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3954: 1025it [00:02, 356.02it/s, env_step=4048896, len=13, n/ep=5, n/st=64, player_1/loss=426.212, player_2/loss=280.883, rew=202.80]


Epoch #3954: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3955: 1025it [00:02, 363.46it/s, env_step=4049920, len=26, n/ep=3, n/st=64, player_1/loss=434.817, player_2/loss=82.217, rew=754.00]


Epoch #3955: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3956: 1025it [00:02, 357.13it/s, env_step=4050944, len=24, n/ep=3, n/st=64, player_1/loss=395.697, player_2/loss=399.658, rew=632.67]


Epoch #3956: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3957: 1025it [00:02, 362.18it/s, env_step=4051968, len=22, n/ep=3, n/st=64, player_1/loss=387.540, player_2/loss=709.106, rew=638.00]


Epoch #3957: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3958: 1025it [00:02, 362.31it/s, env_step=4052992, len=20, n/ep=3, n/st=64, player_1/loss=273.472, player_2/loss=694.388, rew=432.00]


Epoch #3958: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3959: 1025it [00:02, 357.63it/s, env_step=4054016, len=12, n/ep=6, n/st=64, player_1/loss=166.967, player_2/loss=546.732, rew=182.67]


Epoch #3959: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3960: 1025it [00:02, 362.18it/s, env_step=4055040, len=11, n/ep=5, n/st=64, player_1/loss=235.979, player_2/loss=122.525, rew=157.20]


Epoch #3960: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3961: 1025it [00:02, 362.31it/s, env_step=4056064, len=24, n/ep=3, n/st=64, player_1/loss=348.926, player_2/loss=275.566, rew=621.33]


Epoch #3961: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3962: 1025it [00:02, 360.02it/s, env_step=4057088, len=29, n/ep=2, n/st=64, player_1/loss=155.204, player_2/loss=395.007, rew=904.00]


Epoch #3962: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3963: 1025it [00:02, 363.46it/s, env_step=4058112, len=25, n/ep=3, n/st=64, player_1/loss=247.288, player_2/loss=364.685, rew=682.67]


Epoch #3963: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3964: 1025it [00:02, 361.16it/s, env_step=4059136, len=30, n/ep=3, n/st=64, player_1/loss=562.704, player_2/loss=109.006, rew=968.67]


Epoch #3964: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3965: 1025it [00:02, 359.14it/s, env_step=4060160, len=25, n/ep=3, n/st=64, player_1/loss=1175.818, player_2/loss=354.816, rew=782.00]


Epoch #3965: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3966: 1025it [00:02, 359.51it/s, env_step=4061184, len=33, n/ep=2, n/st=64, player_1/loss=1053.262, player_2/loss=713.238, rew=1136.00]


Epoch #3966: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3967: 1025it [00:02, 359.26it/s, env_step=4062208, len=34, n/ep=2, n/st=64, player_1/loss=293.268, player_2/loss=455.545, rew=1197.00]


Epoch #3967: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3968: 1025it [00:02, 363.21it/s, env_step=4063232, len=23, n/ep=3, n/st=64, player_1/loss=243.128, player_2/loss=219.101, rew=554.67]


Epoch #3968: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3969: 1025it [00:02, 362.56it/s, env_step=4064256, len=21, n/ep=3, n/st=64, player_1/loss=282.865, player_2/loss=672.107, rew=492.00]


Epoch #3969: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3970: 1025it [00:02, 358.26it/s, env_step=4065280, len=22, n/ep=3, n/st=64, player_1/loss=151.322, player_2/loss=743.622, rew=506.67]


Epoch #3970: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3971: 1025it [00:02, 364.50it/s, env_step=4066304, len=19, n/ep=3, n/st=64, player_1/loss=140.946, player_2/loss=421.902, rew=405.33]


Epoch #3971: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3972: 1025it [00:02, 358.76it/s, env_step=4067328, len=31, n/ep=2, n/st=64, player_1/loss=197.762, player_2/loss=503.244, rew=1022.00]


Epoch #3972: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3973: 1025it [00:02, 361.16it/s, env_step=4068352, len=32, n/ep=2, n/st=64, player_1/loss=321.752, player_2/loss=205.907, rew=1099.00]


Epoch #3973: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3974: 1025it [00:02, 363.34it/s, env_step=4069376, len=29, n/ep=2, n/st=64, player_1/loss=389.775, player_2/loss=595.527, rew=893.00]


Epoch #3974: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3975: 1025it [00:02, 356.51it/s, env_step=4070400, len=28, n/ep=2, n/st=64, player_1/loss=500.641, player_2/loss=609.608, rew=835.00]


Epoch #3975: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3976: 1025it [00:02, 361.80it/s, env_step=4071424, len=30, n/ep=2, n/st=64, player_1/loss=295.555, rew=937.00]


Epoch #3976: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3977: 1025it [00:02, 362.82it/s, env_step=4072448, len=23, n/ep=3, n/st=64, player_1/loss=376.879, player_2/loss=441.974, rew=610.00]


Epoch #3977: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3978: 1025it [00:02, 356.89it/s, env_step=4073472, len=27, n/ep=3, n/st=64, player_1/loss=387.842, player_2/loss=850.200, rew=758.67]


Epoch #3978: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3979: 1025it [00:02, 362.44it/s, env_step=4074496, len=17, n/ep=3, n/st=64, player_1/loss=230.335, player_2/loss=760.394, rew=352.67]


Epoch #3979: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3980: 1025it [00:02, 360.40it/s, env_step=4075520, len=21, n/ep=3, n/st=64, player_1/loss=268.204, player_2/loss=361.604, rew=477.33]


Epoch #3980: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3981: 1025it [00:02, 359.01it/s, env_step=4076544, len=32, n/ep=2, n/st=64, player_1/loss=240.564, player_2/loss=556.042, rew=1055.00]


Epoch #3981: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3982: 1025it [00:02, 360.40it/s, env_step=4077568, len=18, n/ep=3, n/st=64, player_1/loss=240.024, player_2/loss=629.139, rew=369.33]


Epoch #3982: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3983: 1025it [00:02, 357.63it/s, env_step=4078592, len=17, n/ep=4, n/st=64, player_1/loss=238.011, player_2/loss=274.470, rew=314.00]


Epoch #3983: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3984: 1025it [00:02, 364.24it/s, env_step=4079616, len=19, n/ep=3, n/st=64, player_1/loss=257.598, player_2/loss=346.619, rew=408.00]


Epoch #3984: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3985: 1025it [00:02, 366.58it/s, env_step=4080640, len=21, n/ep=3, n/st=64, player_1/loss=280.379, player_2/loss=443.045, rew=460.67]


Epoch #3985: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3986: 1025it [00:02, 355.52it/s, env_step=4081664, len=21, n/ep=3, n/st=64, player_1/loss=385.757, player_2/loss=477.667, rew=490.00]


Epoch #3986: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3987: 1025it [00:02, 359.77it/s, env_step=4082688, len=21, n/ep=3, n/st=64, player_1/loss=417.185, player_2/loss=262.518, rew=478.67]


Epoch #3987: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3988: 1025it [00:02, 358.13it/s, env_step=4083712, len=16, n/ep=4, n/st=64, player_1/loss=206.966, player_2/loss=335.171, rew=317.50]


Epoch #3988: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3989: 1025it [00:02, 361.03it/s, env_step=4084736, len=18, n/ep=4, n/st=64, player_1/loss=140.722, player_2/loss=470.753, rew=456.50]


Epoch #3989: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3990: 1025it [00:02, 364.24it/s, env_step=4085760, len=29, n/ep=2, n/st=64, player_1/loss=442.969, player_2/loss=204.572, rew=917.00]


Epoch #3990: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3991: 1025it [00:02, 360.27it/s, env_step=4086784, len=28, n/ep=2, n/st=64, player_1/loss=397.078, player_2/loss=158.578, rew=811.00]


Epoch #3991: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3992: 1025it [00:02, 363.72it/s, env_step=4087808, len=34, n/ep=2, n/st=64, player_1/loss=390.710, player_2/loss=461.254, rew=1189.00]


Epoch #3992: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3993: 1025it [00:02, 359.01it/s, env_step=4088832, len=19, n/ep=3, n/st=64, player_1/loss=308.949, player_2/loss=441.213, rew=530.00]


Epoch #3993: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3994: 1025it [00:02, 360.65it/s, env_step=4089856, len=16, n/ep=4, n/st=64, player_1/loss=219.199, player_2/loss=153.034, rew=294.00]


Epoch #3994: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3995: 1025it [00:02, 362.95it/s, env_step=4090880, len=26, n/ep=3, n/st=64, player_1/loss=233.842, player_2/loss=527.994, rew=780.00]


Epoch #3995: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3996: 1025it [00:02, 361.54it/s, env_step=4091904, len=20, n/ep=3, n/st=64, player_1/loss=196.857, player_2/loss=497.550, rew=447.33]


Epoch #3996: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3997: 1025it [00:02, 356.64it/s, env_step=4092928, len=23, n/ep=2, n/st=64, player_1/loss=298.951, player_2/loss=42.938, rew=554.00]


Epoch #3997: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3998: 1025it [00:02, 362.56it/s, env_step=4093952, len=23, n/ep=3, n/st=64, player_1/loss=507.277, player_2/loss=38.669, rew=587.33]


Epoch #3998: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #3999: 1025it [00:02, 361.92it/s, env_step=4094976, len=17, n/ep=4, n/st=64, player_1/loss=900.871, player_2/loss=48.449, rew=326.50]


Epoch #3999: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4000: 1025it [00:02, 358.63it/s, env_step=4096000, len=23, n/ep=2, n/st=64, player_1/loss=489.903, player_2/loss=50.566, rew=814.00]


Epoch #4000: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4001: 1025it [00:02, 361.67it/s, env_step=4097024, len=36, n/ep=2, n/st=64, player_1/loss=526.784, player_2/loss=95.682, rew=1331.00]


Epoch #4001: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4002: 1025it [00:02, 363.59it/s, env_step=4098048, len=27, n/ep=2, n/st=64, player_1/loss=457.061, player_2/loss=425.197, rew=854.00]


Epoch #4002: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4003: 1025it [00:02, 358.38it/s, env_step=4099072, len=16, n/ep=4, n/st=64, player_1/loss=359.784, player_2/loss=398.502, rew=294.00]


Epoch #4003: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4004: 1025it [00:02, 362.56it/s, env_step=4100096, len=16, n/ep=4, n/st=64, player_1/loss=286.273, player_2/loss=51.222, rew=319.00]


Epoch #4004: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4005: 1025it [00:02, 354.91it/s, env_step=4101120, len=21, n/ep=3, n/st=64, player_1/loss=418.819, player_2/loss=433.809, rew=477.33]


Epoch #4005: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4006: 1025it [00:02, 363.34it/s, env_step=4102144, len=32, n/ep=2, n/st=64, player_1/loss=334.855, player_2/loss=666.932, rew=1058.00]


Epoch #4006: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4007: 1025it [00:02, 356.76it/s, env_step=4103168, len=18, n/ep=4, n/st=64, player_1/loss=225.106, rew=440.00]


Epoch #4007: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4008: 1025it [00:02, 362.18it/s, env_step=4104192, len=35, n/ep=2, n/st=64, player_1/loss=180.391, player_2/loss=618.415, rew=1322.00]


Epoch #4008: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4009: 1025it [00:02, 358.88it/s, env_step=4105216, len=27, n/ep=3, n/st=64, player_1/loss=240.565, player_2/loss=404.722, rew=786.00]


Epoch #4009: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4010: 1025it [00:02, 360.91it/s, env_step=4106240, len=22, n/ep=3, n/st=64, player_1/loss=412.165, player_2/loss=132.874, rew=554.67]


Epoch #4010: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4011: 1025it [00:02, 362.69it/s, env_step=4107264, len=19, n/ep=3, n/st=64, player_1/loss=475.041, player_2/loss=66.257, rew=380.67]


Epoch #4011: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4012: 1025it [00:02, 359.51it/s, env_step=4108288, len=18, n/ep=3, n/st=64, player_1/loss=318.628, player_2/loss=352.939, rew=462.67]


Epoch #4012: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4013: 1025it [00:02, 363.08it/s, env_step=4109312, len=15, n/ep=4, n/st=64, player_1/loss=494.237, player_2/loss=375.819, rew=255.00]


Epoch #4013: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4014: 1025it [00:02, 361.41it/s, env_step=4110336, len=24, n/ep=3, n/st=64, player_1/loss=512.165, player_2/loss=86.317, rew=703.33]


Epoch #4014: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4015: 1025it [00:02, 362.56it/s, env_step=4111360, len=21, n/ep=2, n/st=64, player_1/loss=290.221, player_2/loss=477.842, rew=488.00]


Epoch #4015: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4016: 1025it [00:02, 363.21it/s, env_step=4112384, len=18, n/ep=4, n/st=64, player_1/loss=368.746, player_2/loss=1166.562, rew=459.00]


Epoch #4016: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4017: 1025it [00:02, 358.88it/s, env_step=4113408, len=12, n/ep=6, n/st=64, player_1/loss=275.097, player_2/loss=915.967, rew=186.33]


Epoch #4017: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4018: 1025it [00:02, 363.46it/s, env_step=4114432, len=14, n/ep=5, n/st=64, player_1/loss=162.139, player_2/loss=356.707, rew=215.20]


Epoch #4018: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4019: 1025it [00:02, 362.56it/s, env_step=4115456, len=16, n/ep=4, n/st=64, player_1/loss=101.237, player_2/loss=152.599, rew=292.00]


Epoch #4019: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4020: 1025it [00:02, 359.01it/s, env_step=4116480, len=17, n/ep=4, n/st=64, player_1/loss=63.608, player_2/loss=339.197, rew=356.50]


Epoch #4020: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4021: 1025it [00:02, 362.31it/s, env_step=4117504, len=21, n/ep=3, n/st=64, player_1/loss=155.494, player_2/loss=534.530, rew=462.00]


Epoch #4021: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4022: 1025it [00:02, 357.13it/s, env_step=4118528, len=19, n/ep=3, n/st=64, player_1/loss=153.514, player_2/loss=464.832, rew=408.00]


Epoch #4022: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4023: 1025it [00:02, 364.24it/s, env_step=4119552, len=32, n/ep=2, n/st=64, player_1/loss=166.481, player_2/loss=366.654, rew=1087.00]


Epoch #4023: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4024: 1025it [00:02, 362.44it/s, env_step=4120576, len=15, n/ep=4, n/st=64, player_1/loss=177.748, player_2/loss=157.185, rew=381.00]


Epoch #4024: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4025: 1025it [00:02, 354.54it/s, env_step=4121600, len=26, n/ep=2, n/st=64, player_1/loss=240.074, player_2/loss=302.567, rew=709.00]


Epoch #4025: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4026: 1025it [00:02, 361.80it/s, env_step=4122624, len=34, n/ep=2, n/st=64, player_1/loss=302.574, player_2/loss=299.958, rew=1192.00]


Epoch #4026: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4027: 1025it [00:02, 357.76it/s, env_step=4123648, len=30, n/ep=3, n/st=64, player_1/loss=342.577, player_2/loss=266.554, rew=986.67]


Epoch #4027: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4028: 1025it [00:02, 354.42it/s, env_step=4124672, len=27, n/ep=3, n/st=64, player_1/loss=270.047, player_2/loss=274.396, rew=828.67]


Epoch #4028: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4029: 1025it [00:02, 358.26it/s, env_step=4125696, len=35, n/ep=2, n/st=64, player_1/loss=343.045, player_2/loss=407.116, rew=1300.00]


Epoch #4029: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4030: 1025it [00:02, 356.51it/s, env_step=4126720, len=32, n/ep=2, n/st=64, player_1/loss=375.263, player_2/loss=408.599, rew=1089.00]


Epoch #4030: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4031: 1025it [00:02, 360.91it/s, env_step=4127744, len=34, n/ep=2, n/st=64, player_1/loss=332.352, player_2/loss=380.950, rew=1223.00]


Epoch #4031: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4032: 1025it [00:02, 359.77it/s, env_step=4128768, len=26, n/ep=3, n/st=64, player_1/loss=291.598, player_2/loss=330.073, rew=777.33]


Epoch #4032: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4033: 1025it [00:02, 357.76it/s, env_step=4129792, len=21, n/ep=3, n/st=64, player_1/loss=804.685, player_2/loss=238.397, rew=492.00]


Epoch #4033: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4034: 1025it [00:02, 362.95it/s, env_step=4130816, len=22, n/ep=3, n/st=64, player_1/loss=975.360, player_2/loss=263.422, rew=639.33]


Epoch #4034: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4035: 1025it [00:02, 359.26it/s, env_step=4131840, len=14, n/ep=5, n/st=64, player_1/loss=343.860, player_2/loss=296.311, rew=260.00]


Epoch #4035: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4036: 1025it [00:02, 353.20it/s, env_step=4132864, len=33, n/ep=2, n/st=64, player_1/loss=221.059, player_2/loss=413.972, rew=1136.00]


Epoch #4036: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4037: 1025it [00:02, 362.31it/s, env_step=4133888, len=12, n/ep=5, n/st=64, player_1/loss=259.145, player_2/loss=305.445, rew=164.40]


Epoch #4037: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4038: 1025it [00:02, 361.80it/s, env_step=4134912, len=32, n/ep=2, n/st=64, player_1/loss=336.419, player_2/loss=123.039, rew=1093.00]


Epoch #4038: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4039: 1025it [00:02, 360.91it/s, env_step=4135936, len=19, n/ep=2, n/st=64, player_1/loss=217.010, player_2/loss=299.342, rew=470.00]


Epoch #4039: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4040: 1025it [00:02, 359.51it/s, env_step=4136960, len=33, n/ep=2, n/st=64, player_1/loss=117.617, player_2/loss=561.948, rew=1120.00]


Epoch #4040: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4041: 1025it [00:02, 357.38it/s, env_step=4137984, len=35, n/ep=2, n/st=64, player_1/loss=155.394, player_2/loss=417.762, rew=1306.00]


Epoch #4041: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4042: 1025it [00:02, 343.50it/s, env_step=4139008, len=34, n/ep=2, n/st=64, player_1/loss=139.529, player_2/loss=137.899, rew=1188.00]


Epoch #4042: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4043: 1025it [00:02, 360.91it/s, env_step=4140032, len=15, n/ep=4, n/st=64, player_1/loss=310.809, player_2/loss=110.836, rew=268.50]


Epoch #4043: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4044: 1025it [00:02, 358.26it/s, env_step=4141056, len=31, n/ep=2, n/st=64, player_1/loss=428.808, player_2/loss=279.236, rew=1034.00]


Epoch #4044: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4045: 1025it [00:02, 361.54it/s, env_step=4142080, len=19, n/ep=3, n/st=64, player_1/loss=545.572, player_2/loss=479.881, rew=405.33]


Epoch #4045: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4046: 1025it [00:02, 361.03it/s, env_step=4143104, len=19, n/ep=3, n/st=64, player_1/loss=354.003, player_2/loss=314.868, rew=404.67]


Epoch #4046: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4047: 1025it [00:02, 358.76it/s, env_step=4144128, len=26, n/ep=3, n/st=64, player_1/loss=337.976, player_2/loss=341.282, rew=892.00]


Epoch #4047: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4048: 1025it [00:02, 360.78it/s, env_step=4145152, len=17, n/ep=4, n/st=64, player_1/loss=457.509, rew=315.00]


Epoch #4048: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4049: 1025it [00:02, 357.01it/s, env_step=4146176, len=21, n/ep=3, n/st=64, player_1/loss=406.136, player_2/loss=248.816, rew=486.00]


Epoch #4049: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4050: 1025it [00:02, 363.72it/s, env_step=4147200, len=19, n/ep=3, n/st=64, player_1/loss=366.076, player_2/loss=156.019, rew=408.67]


Epoch #4050: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4051: 1025it [00:02, 362.05it/s, env_step=4148224, len=17, n/ep=4, n/st=64, player_1/loss=267.046, player_2/loss=59.898, rew=361.50]


Epoch #4051: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4052: 1025it [00:02, 359.77it/s, env_step=4149248, len=21, n/ep=3, n/st=64, player_1/loss=273.982, player_2/loss=57.851, rew=570.00]


Epoch #4052: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4053: 1025it [00:02, 361.03it/s, env_step=4150272, len=15, n/ep=4, n/st=64, player_1/loss=340.934, player_2/loss=91.260, rew=356.50]


Epoch #4053: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4054: 1025it [00:02, 360.78it/s, env_step=4151296, len=19, n/ep=2, n/st=64, player_1/loss=429.522, player_2/loss=100.811, rew=508.00]


Epoch #4054: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4055: 1025it [00:02, 362.18it/s, env_step=4152320, len=24, n/ep=3, n/st=64, player_1/loss=233.725, player_2/loss=469.996, rew=742.67]


Epoch #4055: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4056: 1025it [00:02, 361.29it/s, env_step=4153344, len=29, n/ep=2, n/st=64, player_1/loss=221.591, player_2/loss=845.274, rew=872.00]


Epoch #4056: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4057: 1025it [00:02, 359.39it/s, env_step=4154368, len=28, n/ep=2, n/st=64, player_1/loss=371.918, player_2/loss=524.322, rew=810.00]


Epoch #4057: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4058: 1025it [00:02, 361.80it/s, env_step=4155392, len=26, n/ep=2, n/st=64, player_1/loss=533.882, player_2/loss=302.586, rew=747.00]


Epoch #4058: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4059: 1025it [00:02, 359.14it/s, env_step=4156416, len=17, n/ep=4, n/st=64, player_1/loss=600.743, player_2/loss=245.649, rew=312.50]


Epoch #4059: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4060: 1025it [00:02, 356.39it/s, env_step=4157440, len=36, n/ep=2, n/st=64, player_1/loss=309.774, player_2/loss=71.428, rew=1339.00]


Epoch #4060: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4061: 1025it [00:02, 360.91it/s, env_step=4158464, len=32, n/ep=2, n/st=64, player_1/loss=108.489, player_2/loss=66.493, rew=1087.00]


Epoch #4061: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4062: 1025it [00:02, 358.13it/s, env_step=4159488, len=32, n/ep=2, n/st=64, player_1/loss=302.407, player_2/loss=83.888, rew=1063.00]


Epoch #4062: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4063: 1025it [00:02, 359.26it/s, env_step=4160512, len=29, n/ep=3, n/st=64, player_1/loss=262.571, player_2/loss=78.283, rew=965.33]


Epoch #4063: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4064: 1025it [00:02, 361.67it/s, env_step=4161536, len=14, n/ep=5, n/st=64, player_1/loss=275.014, player_2/loss=81.757, rew=221.20]


Epoch #4064: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4065: 1025it [00:02, 360.65it/s, env_step=4162560, len=22, n/ep=2, n/st=64, player_1/loss=394.129, player_2/loss=81.871, rew=659.00]


Epoch #4065: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4066: 1025it [00:02, 358.13it/s, env_step=4163584, len=27, n/ep=2, n/st=64, player_1/loss=345.016, player_2/loss=128.794, rew=898.00]


Epoch #4066: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4067: 1025it [00:02, 362.05it/s, env_step=4164608, len=31, n/ep=2, n/st=64, player_1/loss=446.133, player_2/loss=125.512, rew=991.00]


Epoch #4067: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4068: 1025it [00:02, 359.14it/s, env_step=4165632, len=15, n/ep=4, n/st=64, player_1/loss=735.616, player_2/loss=68.283, rew=268.50]


Epoch #4068: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4069: 1025it [00:02, 363.72it/s, env_step=4166656, len=36, n/ep=2, n/st=64, player_1/loss=632.265, player_2/loss=435.348, rew=1367.00]


Epoch #4069: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4070: 1025it [00:02, 360.14it/s, env_step=4167680, len=9, n/ep=7, n/st=64, player_1/loss=533.133, player_2/loss=1055.134, rew=98.00]


Epoch #4070: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4071: 1025it [00:02, 356.14it/s, env_step=4168704, len=29, n/ep=2, n/st=64, player_1/loss=346.569, player_2/loss=1274.932, rew=918.00]


Epoch #4071: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4072: 1025it [00:02, 363.08it/s, env_step=4169728, len=13, n/ep=5, n/st=64, player_1/loss=278.345, player_2/loss=755.586, rew=215.60]


Epoch #4072: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4073: 1025it [00:02, 360.14it/s, env_step=4170752, len=12, n/ep=5, n/st=64, player_1/loss=255.095, player_2/loss=367.058, rew=207.60]


Epoch #4073: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4074: 1025it [00:02, 356.27it/s, env_step=4171776, len=21, n/ep=3, n/st=64, player_1/loss=309.232, player_2/loss=194.780, rew=537.33]


Epoch #4074: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4075: 1025it [00:02, 361.92it/s, env_step=4172800, len=33, n/ep=2, n/st=64, player_1/loss=546.138, player_2/loss=259.457, rew=1124.00]


Epoch #4075: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4076: 1025it [00:02, 362.05it/s, env_step=4173824, len=16, n/ep=4, n/st=64, player_1/loss=544.089, player_2/loss=179.146, rew=297.50]


Epoch #4076: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4077: 1025it [00:02, 356.14it/s, env_step=4174848, len=30, n/ep=2, n/st=64, player_1/loss=457.793, player_2/loss=426.018, rew=1072.00]


Epoch #4077: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4078: 1025it [00:02, 363.85it/s, env_step=4175872, len=21, n/ep=3, n/st=64, player_1/loss=116.439, player_2/loss=405.154, rew=544.00]


Epoch #4078: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4079: 1025it [00:02, 360.91it/s, env_step=4176896, len=18, n/ep=3, n/st=64, player_1/loss=253.312, player_2/loss=198.100, rew=368.00]


Epoch #4079: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4080: 1025it [00:02, 358.38it/s, env_step=4177920, len=21, n/ep=3, n/st=64, player_1/loss=683.367, player_2/loss=364.678, rew=480.67]


Epoch #4080: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4081: 1025it [00:02, 362.56it/s, env_step=4178944, len=15, n/ep=4, n/st=64, player_1/loss=532.903, player_2/loss=311.319, rew=263.50]


Epoch #4081: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4082: 1025it [00:02, 360.40it/s, env_step=4179968, len=19, n/ep=3, n/st=64, player_1/loss=590.610, player_2/loss=86.460, rew=378.67]


Epoch #4082: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4083: 1025it [00:02, 356.89it/s, env_step=4180992, len=14, n/ep=4, n/st=64, player_1/loss=450.734, player_2/loss=131.934, rew=225.50]


Epoch #4083: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4084: 1025it [00:02, 359.64it/s, env_step=4182016, len=29, n/ep=2, n/st=64, player_1/loss=189.034, player_2/loss=359.298, rew=949.00]


Epoch #4084: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4085: 1025it [00:02, 359.14it/s, env_step=4183040, len=16, n/ep=4, n/st=64, player_1/loss=50.419, player_2/loss=413.510, rew=280.50]


Epoch #4085: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4086: 1025it [00:02, 363.34it/s, env_step=4184064, len=21, n/ep=2, n/st=64, player_1/loss=138.853, player_2/loss=277.059, rew=461.00]


Epoch #4086: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4087: 1025it [00:02, 363.08it/s, env_step=4185088, len=19, n/ep=3, n/st=64, player_1/loss=174.137, player_2/loss=333.811, rew=388.67]


Epoch #4087: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4088: 1025it [00:02, 356.76it/s, env_step=4186112, len=18, n/ep=4, n/st=64, player_1/loss=275.875, player_2/loss=360.218, rew=413.50]


Epoch #4088: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4089: 1025it [00:02, 362.18it/s, env_step=4187136, len=27, n/ep=2, n/st=64, player_1/loss=309.632, player_2/loss=395.487, rew=898.00]


Epoch #4089: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4090: 1025it [00:02, 356.76it/s, env_step=4188160, len=22, n/ep=3, n/st=64, player_1/loss=192.061, player_2/loss=187.800, rew=554.00]


Epoch #4090: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4091: 1025it [00:02, 361.16it/s, env_step=4189184, len=27, n/ep=2, n/st=64, player_1/loss=194.570, player_2/loss=138.447, rew=754.00]


Epoch #4091: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4092: 1025it [00:02, 359.26it/s, env_step=4190208, len=28, n/ep=2, n/st=64, player_1/loss=320.350, player_2/loss=152.389, rew=881.00]


Epoch #4092: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4093: 1025it [00:02, 362.18it/s, env_step=4191232, len=33, n/ep=2, n/st=64, player_1/loss=346.350, player_2/loss=88.312, rew=1121.00]


Epoch #4093: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4094: 1025it [00:02, 361.67it/s, env_step=4192256, len=29, n/ep=2, n/st=64, player_1/loss=584.229, player_2/loss=72.314, rew=872.00]


Epoch #4094: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4095: 1025it [00:02, 358.01it/s, env_step=4193280, len=23, n/ep=2, n/st=64, player_1/loss=473.934, player_2/loss=844.434, rew=586.00]


Epoch #4095: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4096: 1025it [00:02, 360.27it/s, env_step=4194304, len=16, n/ep=4, n/st=64, player_1/loss=421.456, player_2/loss=1027.187, rew=393.00]


Epoch #4096: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4097: 1025it [00:02, 359.14it/s, env_step=4195328, len=16, n/ep=4, n/st=64, player_1/loss=193.046, player_2/loss=839.172, rew=293.50]


Epoch #4097: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4098: 1025it [00:02, 355.28it/s, env_step=4196352, len=24, n/ep=3, n/st=64, player_1/loss=190.949, player_2/loss=1240.785, rew=684.00]


Epoch #4098: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4099: 1025it [00:02, 361.03it/s, env_step=4197376, len=37, n/ep=2, n/st=64, player_1/loss=182.868, player_2/loss=1134.025, rew=1442.00]


Epoch #4099: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4100: 1025it [00:02, 361.29it/s, env_step=4198400, len=23, n/ep=2, n/st=64, player_1/loss=177.698, player_2/loss=573.009, rew=616.00]


Epoch #4100: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4101: 1025it [00:02, 358.26it/s, env_step=4199424, len=30, n/ep=2, n/st=64, player_1/loss=404.882, player_2/loss=338.005, rew=961.00]


Epoch #4101: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4102: 1025it [00:02, 360.91it/s, env_step=4200448, len=22, n/ep=3, n/st=64, player_1/loss=506.833, player_2/loss=231.427, rew=642.67]


Epoch #4102: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4103: 1025it [00:02, 364.89it/s, env_step=4201472, len=36, n/ep=2, n/st=64, player_1/loss=308.824, player_2/loss=96.666, rew=1331.00]


Epoch #4103: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4104: 1025it [00:02, 362.44it/s, env_step=4202496, len=32, n/ep=2, n/st=64, player_1/loss=185.778, player_2/loss=251.335, rew=1107.00]


Epoch #4104: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4105: 1025it [00:02, 360.27it/s, env_step=4203520, len=38, n/ep=2, n/st=64, player_1/loss=198.596, player_2/loss=313.751, rew=1511.00]


Epoch #4105: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4106: 1025it [00:02, 357.63it/s, env_step=4204544, len=37, n/ep=2, n/st=64, player_1/loss=269.237, player_2/loss=109.237, rew=1442.00]


Epoch #4106: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4107: 1025it [00:02, 360.14it/s, env_step=4205568, len=28, n/ep=2, n/st=64, player_1/loss=246.074, player_2/loss=96.510, rew=911.00]


Epoch #4107: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4108: 1025it [00:02, 361.29it/s, env_step=4206592, len=27, n/ep=2, n/st=64, player_1/loss=590.406, player_2/loss=251.922, rew=812.00]


Epoch #4108: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4109: 1025it [00:02, 361.54it/s, env_step=4207616, len=34, n/ep=2, n/st=64, player_1/loss=682.272, player_2/loss=273.099, rew=1235.00]


Epoch #4109: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4110: 1025it [00:02, 363.08it/s, env_step=4208640, len=28, n/ep=2, n/st=64, player_1/loss=280.927, player_2/loss=105.139, rew=851.00]


Epoch #4110: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4111: 1025it [00:02, 357.01it/s, env_step=4209664, len=25, n/ep=3, n/st=64, player_1/loss=114.853, player_2/loss=251.916, rew=798.67]


Epoch #4111: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4112: 1025it [00:02, 361.67it/s, env_step=4210688, len=23, n/ep=3, n/st=64, player_1/loss=143.231, player_2/loss=327.037, rew=584.67]


Epoch #4112: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4113: 1025it [00:02, 357.01it/s, env_step=4211712, len=34, n/ep=2, n/st=64, player_1/loss=434.736, player_2/loss=171.313, rew=1223.00]


Epoch #4113: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4114: 1025it [00:02, 363.21it/s, env_step=4212736, len=33, n/ep=2, n/st=64, player_1/loss=382.462, player_2/loss=148.585, rew=1124.00]


Epoch #4114: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4115: 1025it [00:02, 357.38it/s, env_step=4213760, len=26, n/ep=2, n/st=64, player_1/loss=161.835, player_2/loss=239.428, rew=701.00]


Epoch #4115: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4116: 1025it [00:02, 360.02it/s, env_step=4214784, len=25, n/ep=3, n/st=64, player_1/loss=346.308, player_2/loss=493.970, rew=738.67]


Epoch #4116: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4117: 1025it [00:02, 363.85it/s, env_step=4215808, len=26, n/ep=2, n/st=64, player_1/loss=662.097, rew=727.00]


Epoch #4117: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4118: 1025it [00:02, 360.27it/s, env_step=4216832, len=17, n/ep=5, n/st=64, player_1/loss=591.610, player_2/loss=495.035, rew=442.00]


Epoch #4118: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4119: 1025it [00:02, 357.51it/s, env_step=4217856, len=14, n/ep=5, n/st=64, player_1/loss=570.932, player_2/loss=460.906, rew=230.80]


Epoch #4119: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4120: 1025it [00:02, 361.29it/s, env_step=4218880, len=13, n/ep=5, n/st=64, player_1/loss=464.577, player_2/loss=280.546, rew=203.60]


Epoch #4120: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4121: 1025it [00:02, 361.29it/s, env_step=4219904, len=32, n/ep=2, n/st=64, player_1/loss=276.479, player_2/loss=189.605, rew=1079.00]


Epoch #4121: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4122: 1025it [00:02, 362.44it/s, env_step=4220928, len=27, n/ep=3, n/st=64, player_1/loss=274.169, player_2/loss=68.844, rew=810.00]


Epoch #4122: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4123: 1025it [00:02, 361.92it/s, env_step=4221952, len=31, n/ep=2, n/st=64, player_1/loss=494.716, player_2/loss=90.994, rew=1015.00]


Epoch #4123: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4124: 1025it [00:02, 357.51it/s, env_step=4222976, len=34, n/ep=2, n/st=64, player_1/loss=547.017, player_2/loss=74.518, rew=1253.00]


Epoch #4124: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4125: 1025it [00:02, 360.65it/s, env_step=4224000, len=24, n/ep=3, n/st=64, player_1/loss=186.580, player_2/loss=615.865, rew=704.00]


Epoch #4125: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4126: 1025it [00:02, 360.40it/s, env_step=4225024, len=14, n/ep=5, n/st=64, player_1/loss=200.189, player_2/loss=917.563, rew=226.80]


Epoch #4126: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4127: 1025it [00:02, 361.54it/s, env_step=4226048, len=24, n/ep=3, n/st=64, player_1/loss=326.552, player_2/loss=604.891, rew=714.00]


Epoch #4127: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4128: 1025it [00:02, 361.16it/s, env_step=4227072, len=31, n/ep=2, n/st=64, player_1/loss=121.479, player_2/loss=371.645, rew=1028.00]


Epoch #4128: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4129: 1025it [00:02, 358.51it/s, env_step=4228096, len=28, n/ep=3, n/st=64, player_1/loss=494.634, player_2/loss=476.622, rew=838.67]


Epoch #4129: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4130: 1025it [00:02, 362.69it/s, env_step=4229120, len=31, n/ep=2, n/st=64, player_1/loss=751.804, player_2/loss=240.158, rew=1006.00]


Epoch #4130: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4131: 1025it [00:02, 353.56it/s, env_step=4230144, len=14, n/ep=4, n/st=64, player_1/loss=484.329, player_2/loss=260.589, rew=236.50]


Epoch #4131: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4132: 1025it [00:02, 361.41it/s, env_step=4231168, len=12, n/ep=4, n/st=64, player_1/loss=657.160, player_2/loss=326.173, rew=169.50]


Epoch #4132: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4133: 1025it [00:02, 356.64it/s, env_step=4232192, len=38, n/ep=2, n/st=64, player_1/loss=553.803, player_2/loss=372.308, rew=1521.00]


Epoch #4133: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4134: 1025it [00:02, 361.92it/s, env_step=4233216, len=15, n/ep=3, n/st=64, player_1/loss=395.791, player_2/loss=240.323, rew=240.67]


Epoch #4134: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4135: 1025it [00:02, 360.65it/s, env_step=4234240, len=27, n/ep=2, n/st=64, player_1/loss=718.539, player_2/loss=193.678, rew=782.00]


Epoch #4135: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4136: 1025it [00:02, 359.14it/s, env_step=4235264, len=12, n/ep=6, n/st=64, player_1/loss=710.315, player_2/loss=338.282, rew=180.67]


Epoch #4136: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4137: 1025it [00:02, 356.64it/s, env_step=4236288, len=26, n/ep=3, n/st=64, player_1/loss=434.720, player_2/loss=644.957, rew=722.00]


Epoch #4137: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4138: 1025it [00:02, 356.27it/s, env_step=4237312, len=20, n/ep=4, n/st=64, player_1/loss=274.559, player_2/loss=500.427, rew=526.50]


Epoch #4138: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4139: 1025it [00:02, 362.31it/s, env_step=4238336, len=12, n/ep=3, n/st=64, player_1/loss=175.647, player_2/loss=67.664, rew=163.33]


Epoch #4139: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4140: 1025it [00:02, 358.76it/s, env_step=4239360, len=29, n/ep=2, n/st=64, player_1/loss=265.800, player_2/loss=205.255, rew=884.00]


Epoch #4140: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4141: 1025it [00:02, 362.18it/s, env_step=4240384, len=26, n/ep=3, n/st=64, player_1/loss=255.310, player_2/loss=212.839, rew=746.67]


Epoch #4141: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4142: 1025it [00:02, 361.92it/s, env_step=4241408, len=19, n/ep=4, n/st=64, player_1/loss=490.175, player_2/loss=280.930, rew=404.00]


Epoch #4142: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4143: 1025it [00:02, 357.38it/s, env_step=4242432, len=8, n/ep=6, n/st=64, player_1/loss=475.824, player_2/loss=264.217, rew=82.00]


Epoch #4143: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4144: 1025it [00:02, 360.78it/s, env_step=4243456, len=16, n/ep=2, n/st=64, player_1/loss=128.749, player_2/loss=351.636, rew=279.00]


Epoch #4144: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4145: 1025it [00:02, 361.67it/s, env_step=4244480, len=24, n/ep=3, n/st=64, player_1/loss=334.499, player_2/loss=460.357, rew=736.67]


Epoch #4145: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4146: 1025it [00:02, 363.46it/s, env_step=4245504, len=14, n/ep=5, n/st=64, player_1/loss=358.326, player_2/loss=163.964, rew=214.80]


Epoch #4146: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4147: 1025it [00:02, 359.77it/s, env_step=4246528, len=22, n/ep=2, n/st=64, player_1/loss=201.952, player_2/loss=54.318, rew=659.00]


Epoch #4147: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4148: 1025it [00:02, 359.14it/s, env_step=4247552, len=19, n/ep=2, n/st=64, player_1/loss=156.199, player_2/loss=479.370, rew=403.00]


Epoch #4148: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4149: 1025it [00:02, 359.26it/s, env_step=4248576, len=14, n/ep=4, n/st=64, player_1/loss=306.726, player_2/loss=511.219, rew=213.00]


Epoch #4149: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4150: 1025it [00:02, 362.05it/s, env_step=4249600, len=24, n/ep=3, n/st=64, player_1/loss=334.703, player_2/loss=307.207, rew=690.67]


Epoch #4150: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4151: 1025it [00:02, 360.78it/s, env_step=4250624, len=22, n/ep=4, n/st=64, player_1/loss=317.812, player_2/loss=757.282, rew=627.50]


Epoch #4151: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4152: 1025it [00:02, 360.52it/s, env_step=4251648, len=23, n/ep=3, n/st=64, player_1/loss=292.300, player_2/loss=1059.581, rew=568.00]


Epoch #4152: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4153: 1025it [00:02, 359.89it/s, env_step=4252672, len=25, n/ep=3, n/st=64, player_1/loss=343.911, player_2/loss=519.613, rew=728.67]


Epoch #4153: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4154: 1025it [00:02, 360.14it/s, env_step=4253696, len=22, n/ep=3, n/st=64, player_1/loss=273.301, player_2/loss=331.546, rew=578.00]


Epoch #4154: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4155: 1025it [00:02, 360.65it/s, env_step=4254720, len=15, n/ep=5, n/st=64, player_1/loss=340.318, player_2/loss=222.456, rew=273.60]


Epoch #4155: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4156: 1025it [00:02, 362.44it/s, env_step=4255744, len=27, n/ep=2, n/st=64, player_1/loss=383.633, player_2/loss=297.225, rew=854.00]


Epoch #4156: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4157: 1025it [00:02, 361.67it/s, env_step=4256768, len=25, n/ep=3, n/st=64, player_1/loss=131.253, player_2/loss=619.992, rew=792.67]


Epoch #4157: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4158: 1025it [00:02, 355.65it/s, env_step=4257792, len=16, n/ep=4, n/st=64, player_1/loss=135.878, player_2/loss=429.847, rew=332.00]


Epoch #4158: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4159: 1025it [00:02, 359.77it/s, env_step=4258816, len=9, n/ep=4, n/st=64, player_1/loss=170.545, player_2/loss=223.486, rew=93.50]


Epoch #4159: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4160: 1025it [00:02, 361.03it/s, env_step=4259840, len=16, n/ep=3, n/st=64, player_1/loss=323.877, player_2/loss=359.658, rew=368.00]


Epoch #4160: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4161: 1025it [00:02, 360.40it/s, env_step=4260864, len=26, n/ep=2, n/st=64, player_1/loss=301.806, player_2/loss=234.394, rew=701.00]


Epoch #4161: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4162: 1025it [00:02, 364.37it/s, env_step=4261888, len=31, n/ep=2, n/st=64, player_2/loss=52.862, rew=1022.00]


Epoch #4162: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4163: 1025it [00:02, 359.77it/s, env_step=4262912, len=27, n/ep=2, n/st=64, player_1/loss=136.798, player_2/loss=539.290, rew=835.00]


Epoch #4163: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4164: 1025it [00:02, 361.41it/s, env_step=4263936, len=24, n/ep=2, n/st=64, player_1/loss=285.155, player_2/loss=644.344, rew=755.00]


Epoch #4164: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4165: 1025it [00:02, 361.41it/s, env_step=4264960, len=21, n/ep=3, n/st=64, player_1/loss=303.479, player_2/loss=862.826, rew=476.67]


Epoch #4165: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4166: 1025it [00:02, 359.64it/s, env_step=4265984, len=27, n/ep=3, n/st=64, player_1/loss=278.253, player_2/loss=581.511, rew=835.33]


Epoch #4166: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4167: 1025it [00:02, 356.27it/s, env_step=4267008, len=14, n/ep=4, n/st=64, player_1/loss=237.960, player_2/loss=562.731, rew=256.50]


Epoch #4167: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4168: 1025it [00:02, 359.77it/s, env_step=4268032, len=29, n/ep=2, n/st=64, player_1/loss=272.729, player_2/loss=273.733, rew=872.00]


Epoch #4168: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4169: 1025it [00:02, 361.80it/s, env_step=4269056, len=17, n/ep=2, n/st=64, player_1/loss=312.779, player_2/loss=272.353, rew=342.00]


Epoch #4169: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4170: 1025it [00:02, 362.56it/s, env_step=4270080, len=32, n/ep=2, n/st=64, player_1/loss=249.733, player_2/loss=278.057, rew=1070.00]


Epoch #4170: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4171: 1025it [00:02, 360.27it/s, env_step=4271104, len=21, n/ep=3, n/st=64, player_1/loss=450.015, player_2/loss=308.032, rew=478.00]


Epoch #4171: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4172: 1025it [00:02, 358.26it/s, env_step=4272128, len=13, n/ep=5, n/st=64, player_1/loss=424.994, player_2/loss=512.912, rew=228.80]


Epoch #4172: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4173: 1025it [00:02, 360.52it/s, env_step=4273152, len=22, n/ep=3, n/st=64, player_1/loss=244.851, player_2/loss=578.054, rew=637.33]


Epoch #4173: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4174: 1025it [00:02, 361.54it/s, env_step=4274176, len=15, n/ep=3, n/st=64, player_1/loss=405.744, player_2/loss=210.512, rew=260.00]


Epoch #4174: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4175: 1025it [00:02, 360.52it/s, env_step=4275200, len=20, n/ep=3, n/st=64, player_1/loss=414.543, player_2/loss=221.921, rew=514.00]


Epoch #4175: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4176: 1025it [00:02, 361.03it/s, env_step=4276224, len=34, n/ep=1, n/st=64, player_1/loss=373.297, player_2/loss=299.929, rew=1188.00]


Epoch #4176: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4177: 1025it [00:02, 353.81it/s, env_step=4277248, len=22, n/ep=3, n/st=64, player_1/loss=480.035, player_2/loss=417.599, rew=581.33]


Epoch #4177: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4178: 1025it [00:02, 360.14it/s, env_step=4278272, len=19, n/ep=5, n/st=64, player_1/loss=374.622, player_2/loss=720.479, rew=586.40]


Epoch #4178: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4179: 1025it [00:02, 359.01it/s, env_step=4279296, len=11, n/ep=4, n/st=64, player_1/loss=230.183, player_2/loss=784.867, rew=180.00]


Epoch #4179: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4180: 1025it [00:02, 359.77it/s, env_step=4280320, len=20, n/ep=3, n/st=64, player_1/loss=336.323, player_2/loss=490.217, rew=596.67]


Epoch #4180: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4181: 1025it [00:02, 360.65it/s, env_step=4281344, len=20, n/ep=4, n/st=64, player_1/loss=285.649, player_2/loss=527.542, rew=571.50]


Epoch #4181: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4182: 1025it [00:02, 358.38it/s, env_step=4282368, len=19, n/ep=3, n/st=64, player_1/loss=84.912, player_2/loss=734.350, rew=584.67]


Epoch #4182: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4183: 1025it [00:02, 358.51it/s, env_step=4283392, len=31, n/ep=2, n/st=64, player_2/loss=339.795, rew=1006.00]


Epoch #4183: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4184: 1025it [00:02, 361.67it/s, env_step=4284416, len=36, n/ep=2, n/st=64, player_1/loss=415.636, player_2/loss=245.440, rew=1373.00]


Epoch #4184: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4185: 1025it [00:02, 362.18it/s, env_step=4285440, len=26, n/ep=2, n/st=64, player_1/loss=314.319, player_2/loss=918.117, rew=747.00]


Epoch #4185: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4186: 1025it [00:02, 360.27it/s, env_step=4286464, len=19, n/ep=4, n/st=64, player_1/loss=267.047, player_2/loss=956.289, rew=408.50]


Epoch #4186: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4187: 1025it [00:02, 358.76it/s, env_step=4287488, len=16, n/ep=4, n/st=64, player_1/loss=452.691, player_2/loss=899.225, rew=289.50]


Epoch #4187: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4188: 1025it [00:02, 359.89it/s, env_step=4288512, len=24, n/ep=3, n/st=64, player_1/loss=394.347, player_2/loss=370.749, rew=646.67]


Epoch #4188: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4189: 1025it [00:02, 358.63it/s, env_step=4289536, len=21, n/ep=3, n/st=64, player_1/loss=367.105, player_2/loss=434.709, rew=512.00]


Epoch #4189: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4190: 1025it [00:02, 360.27it/s, env_step=4290560, len=29, n/ep=2, n/st=64, player_1/loss=351.948, player_2/loss=834.463, rew=877.00]


Epoch #4190: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4191: 1025it [00:02, 361.92it/s, env_step=4291584, len=11, n/ep=6, n/st=64, player_1/loss=651.634, player_2/loss=1095.342, rew=155.00]


Epoch #4191: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4192: 1025it [00:02, 357.13it/s, env_step=4292608, len=22, n/ep=3, n/st=64, player_1/loss=665.162, player_2/loss=698.811, rew=506.67]


Epoch #4192: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4193: 1025it [00:02, 358.88it/s, env_step=4293632, len=27, n/ep=3, n/st=64, player_1/loss=353.528, player_2/loss=871.158, rew=760.00]


Epoch #4193: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4194: 1025it [00:02, 360.91it/s, env_step=4294656, len=29, n/ep=2, n/st=64, player_1/loss=617.686, player_2/loss=779.421, rew=940.00]


Epoch #4194: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4195: 1025it [00:02, 360.02it/s, env_step=4295680, len=24, n/ep=2, n/st=64, player_1/loss=818.661, player_2/loss=657.147, rew=713.00]


Epoch #4195: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4196: 1025it [00:02, 359.51it/s, env_step=4296704, len=32, n/ep=3, n/st=64, player_1/loss=407.390, player_2/loss=204.215, rew=1108.67]


Epoch #4196: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4197: 1025it [00:02, 360.52it/s, env_step=4297728, len=22, n/ep=3, n/st=64, player_1/loss=549.742, player_2/loss=95.269, rew=520.67]


Epoch #4197: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4198: 1025it [00:02, 362.95it/s, env_step=4298752, len=21, n/ep=2, n/st=64, player_1/loss=566.834, player_2/loss=77.529, rew=482.00]


Epoch #4198: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4199: 1025it [00:02, 360.78it/s, env_step=4299776, len=30, n/ep=2, n/st=64, player_1/loss=292.715, player_2/loss=52.527, rew=1015.00]


Epoch #4199: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4200: 1025it [00:02, 360.40it/s, env_step=4300800, len=17, n/ep=4, n/st=64, player_1/loss=276.745, player_2/loss=166.707, rew=316.00]


Epoch #4200: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4201: 1025it [00:02, 360.27it/s, env_step=4301824, len=30, n/ep=2, n/st=64, player_1/loss=286.123, rew=959.00]


Epoch #4201: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4202: 1025it [00:02, 355.52it/s, env_step=4302848, len=27, n/ep=2, n/st=64, player_1/loss=270.912, player_2/loss=470.823, rew=824.00]


Epoch #4202: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4203: 1025it [00:02, 359.26it/s, env_step=4303872, len=31, n/ep=2, n/st=64, player_1/loss=178.124, player_2/loss=134.902, rew=999.00]


Epoch #4203: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4204: 1025it [00:02, 361.03it/s, env_step=4304896, len=30, n/ep=2, n/st=64, player_1/loss=141.890, player_2/loss=87.807, rew=932.00]


Epoch #4204: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4205: 1025it [00:02, 360.91it/s, env_step=4305920, len=26, n/ep=3, n/st=64, player_1/loss=401.884, player_2/loss=89.986, rew=752.67]


Epoch #4205: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4206: 1025it [00:02, 358.51it/s, env_step=4306944, len=20, n/ep=4, n/st=64, player_1/loss=369.029, player_2/loss=93.071, rew=476.50]


Epoch #4206: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4207: 1025it [00:02, 354.79it/s, env_step=4307968, len=32, n/ep=2, n/st=64, player_1/loss=320.287, player_2/loss=133.011, rew=1055.00]


Epoch #4207: test_reward: 1720.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4208: 1025it [00:02, 361.92it/s, env_step=4308992, len=33, n/ep=2, n/st=64, player_1/loss=383.483, player_2/loss=241.934, rew=1154.00]


Epoch #4208: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4209: 1025it [00:02, 357.26it/s, env_step=4310016, len=22, n/ep=2, n/st=64, player_1/loss=163.301, player_2/loss=256.386, rew=505.00]


Epoch #4209: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4210: 1025it [00:02, 359.26it/s, env_step=4311040, len=33, n/ep=1, n/st=64, player_1/loss=451.656, player_2/loss=242.582, rew=1120.00]


Epoch #4210: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4211: 1025it [00:02, 357.76it/s, env_step=4312064, len=25, n/ep=3, n/st=64, player_1/loss=488.688, rew=792.67]


Epoch #4211: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4212: 1025it [00:02, 363.46it/s, env_step=4313088, len=23, n/ep=3, n/st=64, player_1/loss=324.479, player_2/loss=196.075, rew=588.00]


Epoch #4212: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4213: 1025it [00:02, 360.91it/s, env_step=4314112, len=30, n/ep=2, n/st=64, player_1/loss=357.701, player_2/loss=263.700, rew=992.00]


Epoch #4213: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4214: 1025it [00:02, 359.89it/s, env_step=4315136, len=35, n/ep=1, n/st=64, player_1/loss=276.397, player_2/loss=311.245, rew=1258.00]


Epoch #4214: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4215: 1025it [00:02, 361.29it/s, env_step=4316160, len=29, n/ep=3, n/st=64, player_1/loss=378.513, player_2/loss=284.680, rew=966.67]


Epoch #4215: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4216: 1025it [00:02, 360.02it/s, env_step=4317184, len=32, n/ep=3, n/st=64, player_1/loss=476.649, player_2/loss=166.185, rew=1094.67]


Epoch #4216: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4217: 1025it [00:02, 361.80it/s, env_step=4318208, len=34, n/ep=2, n/st=64, player_1/loss=633.924, player_2/loss=336.538, rew=1225.00]


Epoch #4217: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4218: 1025it [00:02, 358.51it/s, env_step=4319232, len=15, n/ep=4, n/st=64, player_1/loss=575.476, player_2/loss=483.365, rew=375.50]


Epoch #4218: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4219: 1025it [00:02, 360.91it/s, env_step=4320256, len=18, n/ep=3, n/st=64, player_1/loss=403.274, player_2/loss=545.304, rew=394.67]


Epoch #4219: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4220: 1025it [00:02, 361.29it/s, env_step=4321280, len=29, n/ep=3, n/st=64, player_1/loss=479.199, player_2/loss=251.619, rew=988.67]


Epoch #4220: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4221: 1025it [00:02, 358.26it/s, env_step=4322304, len=34, n/ep=1, n/st=64, player_1/loss=360.136, player_2/loss=82.035, rew=1188.00]


Epoch #4221: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4222: 1025it [00:02, 362.95it/s, env_step=4323328, len=22, n/ep=4, n/st=64, player_1/loss=123.992, player_2/loss=103.435, rew=724.00]


Epoch #4222: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4223: 1025it [00:02, 361.92it/s, env_step=4324352, len=21, n/ep=3, n/st=64, player_1/loss=264.756, player_2/loss=80.914, rew=578.67]


Epoch #4223: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4224: 1025it [00:02, 356.02it/s, env_step=4325376, len=33, n/ep=2, n/st=64, player_1/loss=448.049, player_2/loss=228.511, rew=1145.00]


Epoch #4224: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4225: 1025it [00:02, 360.14it/s, env_step=4326400, len=21, n/ep=2, n/st=64, player_1/loss=355.294, player_2/loss=361.790, rew=592.00]


Epoch #4225: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4226: 1025it [00:02, 358.26it/s, env_step=4327424, len=37, n/ep=2, n/st=64, player_1/loss=860.392, player_2/loss=177.328, rew=1442.00]


Epoch #4226: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4227: 1025it [00:02, 360.02it/s, env_step=4328448, len=26, n/ep=3, n/st=64, player_1/loss=1190.388, player_2/loss=130.323, rew=822.00]


Epoch #4227: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4228: 1025it [00:02, 362.69it/s, env_step=4329472, len=13, n/ep=5, n/st=64, player_1/loss=575.104, player_2/loss=117.819, rew=260.80]


Epoch #4228: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4229: 1025it [00:02, 359.64it/s, env_step=4330496, len=23, n/ep=3, n/st=64, player_1/loss=397.565, player_2/loss=423.184, rew=679.33]


Epoch #4229: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4230: 1025it [00:02, 362.18it/s, env_step=4331520, len=30, n/ep=2, n/st=64, player_1/loss=619.823, player_2/loss=386.621, rew=1049.00]


Epoch #4230: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4231: 1025it [00:02, 358.38it/s, env_step=4332544, len=26, n/ep=2, n/st=64, player_1/loss=462.718, player_2/loss=237.974, rew=764.00]


Epoch #4231: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4232: 1025it [00:02, 360.02it/s, env_step=4333568, len=24, n/ep=3, n/st=64, player_1/loss=315.043, player_2/loss=243.775, rew=626.67]


Epoch #4232: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4233: 1025it [00:02, 359.26it/s, env_step=4334592, len=26, n/ep=2, n/st=64, player_1/loss=325.036, player_2/loss=232.871, rew=967.00]


Epoch #4233: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4234: 1025it [00:02, 354.66it/s, env_step=4335616, len=30, n/ep=3, n/st=64, player_1/loss=630.106, player_2/loss=644.737, rew=950.00]


Epoch #4234: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4235: 1025it [00:02, 355.65it/s, env_step=4336640, len=29, n/ep=2, n/st=64, player_2/loss=1315.454, rew=898.00]


Epoch #4235: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4236: 1025it [00:02, 359.14it/s, env_step=4337664, len=32, n/ep=2, n/st=64, player_1/loss=150.041, player_2/loss=860.492, rew=1063.00]


Epoch #4236: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4237: 1025it [00:02, 359.39it/s, env_step=4338688, len=31, n/ep=2, n/st=64, player_1/loss=241.262, player_2/loss=206.926, rew=1022.00]


Epoch #4237: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4238: 1025it [00:02, 359.26it/s, env_step=4339712, len=22, n/ep=2, n/st=64, player_1/loss=437.648, player_2/loss=349.641, rew=625.00]


Epoch #4238: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4239: 1025it [00:02, 358.26it/s, env_step=4340736, len=31, n/ep=2, n/st=64, player_1/loss=472.336, player_2/loss=242.517, rew=1022.00]


Epoch #4239: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4240: 1025it [00:02, 356.76it/s, env_step=4341760, len=24, n/ep=3, n/st=64, player_1/loss=505.412, player_2/loss=205.173, rew=755.33]


Epoch #4240: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4241: 1025it [00:02, 359.51it/s, env_step=4342784, len=15, n/ep=4, n/st=64, player_1/loss=499.661, player_2/loss=141.723, rew=242.50]


Epoch #4241: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4242: 1025it [00:02, 359.39it/s, env_step=4343808, len=24, n/ep=3, n/st=64, player_1/loss=277.639, player_2/loss=169.676, rew=682.00]


Epoch #4242: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4243: 1025it [00:02, 361.67it/s, env_step=4344832, len=19, n/ep=3, n/st=64, player_1/loss=242.625, player_2/loss=86.484, rew=493.33]


Epoch #4243: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4244: 1025it [00:02, 361.54it/s, env_step=4345856, len=31, n/ep=2, n/st=64, player_1/loss=594.282, player_2/loss=127.777, rew=994.00]


Epoch #4244: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4245: 1025it [00:02, 355.89it/s, env_step=4346880, len=21, n/ep=3, n/st=64, player_1/loss=774.272, player_2/loss=520.142, rew=550.67]


Epoch #4245: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4246: 1025it [00:02, 361.54it/s, env_step=4347904, len=17, n/ep=3, n/st=64, player_1/loss=455.196, player_2/loss=606.248, rew=342.00]


Epoch #4246: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4247: 1025it [00:02, 358.63it/s, env_step=4348928, len=20, n/ep=3, n/st=64, player_1/loss=443.406, player_2/loss=242.418, rew=596.67]


Epoch #4247: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4248: 1025it [00:02, 354.79it/s, env_step=4349952, len=20, n/ep=3, n/st=64, player_1/loss=432.897, player_2/loss=80.366, rew=546.67]


Epoch #4248: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4249: 1025it [00:02, 356.51it/s, env_step=4350976, len=14, n/ep=5, n/st=64, player_1/loss=362.105, player_2/loss=58.166, rew=228.00]


Epoch #4249: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4250: 1025it [00:02, 361.03it/s, env_step=4352000, len=21, n/ep=3, n/st=64, player_1/loss=524.669, player_2/loss=129.785, rew=580.67]


Epoch #4250: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4251: 1025it [00:02, 361.41it/s, env_step=4353024, len=35, n/ep=2, n/st=64, player_1/loss=360.833, player_2/loss=475.240, rew=1300.00]


Epoch #4251: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4252: 1025it [00:02, 359.51it/s, env_step=4354048, len=18, n/ep=3, n/st=64, player_1/loss=68.502, player_2/loss=452.451, rew=460.00]


Epoch #4252: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4253: 1025it [00:02, 360.40it/s, env_step=4355072, len=40, n/ep=1, n/st=64, player_1/loss=81.868, player_2/loss=271.936, rew=1638.00]


Epoch #4253: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4254: 1025it [00:02, 359.26it/s, env_step=4356096, len=36, n/ep=1, n/st=64, player_1/loss=234.108, player_2/loss=291.630, rew=1330.00]


Epoch #4254: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4255: 1025it [00:02, 360.40it/s, env_step=4357120, len=42, n/ep=2, n/st=64, player_1/loss=236.915, player_2/loss=216.236, rew=1819.00]


Epoch #4255: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4256: 1025it [00:02, 364.11it/s, env_step=4358144, len=37, n/ep=1, n/st=64, player_1/loss=216.028, player_2/loss=177.303, rew=1404.00]


Epoch #4256: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4257: 1025it [00:02, 361.03it/s, env_step=4359168, len=33, n/ep=2, n/st=64, player_1/loss=167.704, player_2/loss=126.868, rew=1121.00]


Epoch #4257: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4258: 1025it [00:02, 358.63it/s, env_step=4360192, len=33, n/ep=2, n/st=64, player_1/loss=454.515, player_2/loss=218.866, rew=1154.00]


Epoch #4258: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4259: 1025it [00:02, 359.14it/s, env_step=4361216, len=24, n/ep=3, n/st=64, player_1/loss=570.949, player_2/loss=184.473, rew=678.67]


Epoch #4259: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4260: 1025it [00:02, 361.03it/s, env_step=4362240, len=33, n/ep=2, n/st=64, player_1/loss=492.603, player_2/loss=283.466, rew=1121.00]


Epoch #4260: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4261: 1025it [00:02, 358.13it/s, env_step=4363264, len=31, n/ep=2, n/st=64, player_1/loss=279.321, player_2/loss=642.359, rew=1024.00]


Epoch #4261: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4262: 1025it [00:02, 356.39it/s, env_step=4364288, len=21, n/ep=2, n/st=64, player_1/loss=95.755, player_2/loss=556.090, rew=592.00]


Epoch #4262: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4263: 1025it [00:02, 360.14it/s, env_step=4365312, len=21, n/ep=3, n/st=64, player_1/loss=347.216, player_2/loss=739.876, rew=462.00]


Epoch #4263: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4264: 1025it [00:02, 360.78it/s, env_step=4366336, len=26, n/ep=3, n/st=64, player_1/loss=451.835, player_2/loss=643.684, rew=920.00]


Epoch #4264: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4265: 1025it [00:02, 360.40it/s, env_step=4367360, len=32, n/ep=2, n/st=64, player_1/loss=359.216, player_2/loss=794.130, rew=1055.00]


Epoch #4265: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4266: 1025it [00:02, 360.52it/s, env_step=4368384, len=22, n/ep=2, n/st=64, player_1/loss=239.404, player_2/loss=740.087, rew=673.00]


Epoch #4266: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4267: 1025it [00:02, 359.51it/s, env_step=4369408, len=28, n/ep=2, n/st=64, player_1/loss=248.933, player_2/loss=517.919, rew=881.00]


Epoch #4267: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4268: 1025it [00:02, 360.02it/s, env_step=4370432, len=26, n/ep=2, n/st=64, player_1/loss=268.922, player_2/loss=286.238, rew=725.00]


Epoch #4268: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4269: 1025it [00:02, 361.03it/s, env_step=4371456, len=42, n/ep=1, n/st=64, player_1/loss=283.982, player_2/loss=193.082, rew=1834.00]


Epoch #4269: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4270: 1025it [00:02, 360.65it/s, env_step=4372480, len=21, n/ep=2, n/st=64, player_1/loss=198.170, player_2/loss=831.374, rew=464.00]


Epoch #4270: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4271: 1025it [00:02, 356.02it/s, env_step=4373504, len=34, n/ep=2, n/st=64, player_1/loss=240.697, player_2/loss=402.556, rew=1188.00]


Epoch #4271: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4272: 1025it [00:02, 363.72it/s, env_step=4374528, len=22, n/ep=3, n/st=64, player_1/loss=244.913, player_2/loss=284.463, rew=562.00]


Epoch #4272: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4273: 1025it [00:02, 360.40it/s, env_step=4375552, len=30, n/ep=2, n/st=64, player_1/loss=92.708, player_2/loss=266.892, rew=971.00]


Epoch #4273: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4274: 1025it [00:02, 361.92it/s, env_step=4376576, len=17, n/ep=3, n/st=64, player_1/loss=49.379, player_2/loss=171.791, rew=478.00]


Epoch #4274: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4275: 1025it [00:02, 361.67it/s, env_step=4377600, len=33, n/ep=2, n/st=64, player_1/loss=36.291, player_2/loss=187.592, rew=1166.00]


Epoch #4275: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4276: 1025it [00:02, 356.51it/s, env_step=4378624, len=32, n/ep=2, n/st=64, player_1/loss=109.017, player_2/loss=103.676, rew=1055.00]


Epoch #4276: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4277: 1025it [00:02, 358.88it/s, env_step=4379648, len=26, n/ep=2, n/st=64, player_1/loss=271.407, player_2/loss=155.339, rew=764.00]


Epoch #4277: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4278: 1025it [00:02, 360.78it/s, env_step=4380672, len=37, n/ep=2, n/st=64, player_1/loss=220.674, player_2/loss=140.759, rew=1442.00]


Epoch #4278: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4279: 1025it [00:02, 361.41it/s, env_step=4381696, len=40, n/ep=1, n/st=64, player_1/loss=199.548, player_2/loss=513.447, rew=1638.00]


Epoch #4279: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4280: 1025it [00:02, 352.59it/s, env_step=4382720, len=27, n/ep=2, n/st=64, player_1/loss=458.001, player_2/loss=586.779, rew=779.00]


Epoch #4280: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4281: 1025it [00:02, 359.77it/s, env_step=4383744, len=18, n/ep=4, n/st=64, player_1/loss=459.479, player_2/loss=354.459, rew=369.00]


Epoch #4281: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4282: 1025it [00:02, 359.01it/s, env_step=4384768, len=31, n/ep=2, n/st=64, player_1/loss=186.429, player_2/loss=494.608, rew=1034.00]


Epoch #4282: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4283: 1025it [00:02, 361.67it/s, env_step=4385792, len=26, n/ep=3, n/st=64, player_1/loss=197.064, player_2/loss=426.636, rew=893.33]


Epoch #4283: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4284: 1025it [00:02, 361.80it/s, env_step=4386816, len=36, n/ep=1, n/st=64, player_1/loss=463.888, player_2/loss=293.406, rew=1330.00]


Epoch #4284: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4285: 1025it [00:02, 356.89it/s, env_step=4387840, len=34, n/ep=2, n/st=64, player_1/loss=410.040, player_2/loss=171.052, rew=1192.00]


Epoch #4285: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4286: 1025it [00:02, 360.02it/s, env_step=4388864, len=34, n/ep=1, n/st=64, player_1/loss=103.574, player_2/loss=392.329, rew=1188.00]


Epoch #4286: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4287: 1025it [00:02, 359.64it/s, env_step=4389888, len=29, n/ep=3, n/st=64, player_1/loss=141.962, player_2/loss=519.653, rew=922.67]


Epoch #4287: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4288: 1025it [00:02, 360.91it/s, env_step=4390912, len=36, n/ep=1, n/st=64, player_1/loss=194.581, player_2/loss=341.859, rew=1330.00]


Epoch #4288: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4289: 1025it [00:02, 362.05it/s, env_step=4391936, len=34, n/ep=2, n/st=64, player_1/loss=112.992, player_2/loss=88.101, rew=1225.00]


Epoch #4289: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4290: 1025it [00:02, 356.76it/s, env_step=4392960, len=29, n/ep=2, n/st=64, player_2/loss=121.618, rew=904.00]


Epoch #4290: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4291: 1025it [00:02, 360.02it/s, env_step=4393984, len=35, n/ep=2, n/st=64, player_1/loss=358.209, player_2/loss=127.625, rew=1258.00]


Epoch #4291: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4292: 1025it [00:02, 360.91it/s, env_step=4395008, len=21, n/ep=2, n/st=64, player_1/loss=191.471, player_2/loss=413.786, rew=524.00]


Epoch #4292: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4293: 1025it [00:02, 359.01it/s, env_step=4396032, len=14, n/ep=4, n/st=64, player_1/loss=174.535, player_2/loss=364.402, rew=228.00]


Epoch #4293: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4294: 1025it [00:02, 353.44it/s, env_step=4397056, len=21, n/ep=3, n/st=64, player_1/loss=212.681, rew=532.00]


Epoch #4294: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4295: 1025it [00:02, 363.72it/s, env_step=4398080, len=19, n/ep=3, n/st=64, player_1/loss=347.374, player_2/loss=306.641, rew=394.00]


Epoch #4295: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4296: 1025it [00:02, 361.67it/s, env_step=4399104, len=30, n/ep=2, n/st=64, player_1/loss=309.704, player_2/loss=372.346, rew=989.00]


Epoch #4296: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4297: 1025it [00:02, 363.08it/s, env_step=4400128, len=15, n/ep=3, n/st=64, player_1/loss=296.993, player_2/loss=555.302, rew=295.33]


Epoch #4297: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4298: 1025it [00:02, 356.27it/s, env_step=4401152, len=14, n/ep=5, n/st=64, player_1/loss=517.552, player_2/loss=668.082, rew=218.00]


Epoch #4298: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4299: 1025it [00:02, 360.02it/s, env_step=4402176, len=26, n/ep=3, n/st=64, player_1/loss=823.984, player_2/loss=543.612, rew=891.33]


Epoch #4299: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4300: 1025it [00:02, 358.88it/s, env_step=4403200, len=15, n/ep=5, n/st=64, player_1/loss=557.590, player_2/loss=684.817, rew=340.40]


Epoch #4300: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4301: 1025it [00:02, 362.18it/s, env_step=4404224, len=21, n/ep=2, n/st=64, player_1/loss=191.388, player_2/loss=568.208, rew=592.00]


Epoch #4301: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4302: 1025it [00:02, 359.64it/s, env_step=4405248, len=40, n/ep=1, n/st=64, player_1/loss=232.034, player_2/loss=678.178, rew=1638.00]


Epoch #4302: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4303: 1025it [00:02, 356.39it/s, env_step=4406272, len=16, n/ep=3, n/st=64, player_1/loss=485.905, player_2/loss=588.260, rew=443.33]


Epoch #4303: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4304: 1025it [00:02, 360.02it/s, env_step=4407296, len=30, n/ep=2, n/st=64, player_1/loss=619.922, player_2/loss=411.986, rew=937.00]


Epoch #4304: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4305: 1025it [00:02, 358.63it/s, env_step=4408320, len=37, n/ep=2, n/st=64, player_1/loss=405.392, player_2/loss=397.318, rew=1408.00]


Epoch #4305: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4306: 1025it [00:02, 357.76it/s, env_step=4409344, len=23, n/ep=3, n/st=64, player_1/loss=445.878, player_2/loss=306.006, rew=664.67]


Epoch #4306: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4307: 1025it [00:02, 360.52it/s, env_step=4410368, len=33, n/ep=2, n/st=64, player_1/loss=191.357, player_2/loss=279.759, rew=1196.00]


Epoch #4307: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4308: 1025it [00:02, 356.51it/s, env_step=4411392, len=35, n/ep=2, n/st=64, player_1/loss=417.547, player_2/loss=77.254, rew=1262.00]


Epoch #4308: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4309: 1025it [00:02, 359.26it/s, env_step=4412416, len=23, n/ep=3, n/st=64, player_1/loss=388.343, player_2/loss=174.709, rew=731.33]


Epoch #4309: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4310: 1025it [00:02, 362.05it/s, env_step=4413440, len=24, n/ep=2, n/st=64, player_1/loss=120.988, player_2/loss=252.458, rew=944.00]


Epoch #4310: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4311: 1025it [00:02, 360.14it/s, env_step=4414464, len=37, n/ep=1, n/st=64, player_1/loss=200.353, player_2/loss=339.136, rew=1404.00]


Epoch #4311: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4312: 1025it [00:02, 359.26it/s, env_step=4415488, len=30, n/ep=2, n/st=64, player_1/loss=334.975, player_2/loss=292.289, rew=964.00]


Epoch #4312: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4313: 1025it [00:02, 358.38it/s, env_step=4416512, len=31, n/ep=2, n/st=64, player_1/loss=231.431, player_2/loss=144.247, rew=999.00]


Epoch #4313: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4314: 1025it [00:02, 358.76it/s, env_step=4417536, len=18, n/ep=3, n/st=64, player_1/loss=187.438, player_2/loss=159.059, rew=516.00]


Epoch #4314: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4315: 1025it [00:02, 360.02it/s, env_step=4418560, len=29, n/ep=2, n/st=64, player_1/loss=265.608, player_2/loss=346.324, rew=918.00]


Epoch #4315: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4316: 1025it [00:02, 359.64it/s, env_step=4419584, len=27, n/ep=3, n/st=64, player_1/loss=421.475, player_2/loss=341.981, rew=794.67]


Epoch #4316: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4317: 1025it [00:02, 356.27it/s, env_step=4420608, len=37, n/ep=2, n/st=64, player_1/loss=394.428, player_2/loss=881.142, rew=1442.00]


Epoch #4317: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4318: 1025it [00:02, 358.76it/s, env_step=4421632, len=28, n/ep=2, n/st=64, player_1/loss=699.103, player_2/loss=1206.694, rew=891.00]


Epoch #4318: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4319: 1025it [00:02, 359.77it/s, env_step=4422656, len=34, n/ep=2, n/st=64, player_1/loss=649.999, player_2/loss=665.606, rew=1204.00]


Epoch #4319: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4320: 1025it [00:02, 361.80it/s, env_step=4423680, len=37, n/ep=2, n/st=64, player_1/loss=542.546, player_2/loss=391.024, rew=1405.00]


Epoch #4320: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4321: 1025it [00:02, 357.76it/s, env_step=4424704, len=34, n/ep=2, n/st=64, player_1/loss=113.816, player_2/loss=285.658, rew=1223.00]


Epoch #4321: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4322: 1025it [00:02, 358.88it/s, env_step=4425728, len=29, n/ep=2, n/st=64, player_1/loss=190.846, player_2/loss=117.106, rew=928.00]


Epoch #4322: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4323: 1025it [00:02, 358.76it/s, env_step=4426752, len=33, n/ep=2, n/st=64, player_1/loss=262.985, player_2/loss=439.287, rew=1156.00]


Epoch #4323: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4324: 1025it [00:02, 359.64it/s, env_step=4427776, len=37, n/ep=1, n/st=64, player_1/loss=160.297, player_2/loss=382.459, rew=1404.00]


Epoch #4324: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4325: 1025it [00:02, 359.77it/s, env_step=4428800, len=37, n/ep=1, n/st=64, player_1/loss=339.642, player_2/loss=157.840, rew=1404.00]


Epoch #4325: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4326: 1025it [00:02, 358.13it/s, env_step=4429824, len=27, n/ep=2, n/st=64, player_1/loss=360.824, player_2/loss=225.358, rew=763.00]


Epoch #4326: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4327: 1025it [00:02, 361.80it/s, env_step=4430848, len=31, n/ep=2, n/st=64, player_1/loss=198.881, player_2/loss=224.983, rew=1028.00]


Epoch #4327: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4328: 1025it [00:02, 361.29it/s, env_step=4431872, len=34, n/ep=2, n/st=64, player_1/loss=167.688, player_2/loss=326.088, rew=1229.00]


Epoch #4328: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4329: 1025it [00:02, 360.91it/s, env_step=4432896, len=38, n/ep=1, n/st=64, player_1/loss=116.250, player_2/loss=359.559, rew=1480.00]


Epoch #4329: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4330: 1025it [00:02, 358.38it/s, env_step=4433920, len=33, n/ep=2, n/st=64, player_1/loss=41.103, player_2/loss=437.835, rew=1156.00]


Epoch #4330: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4331: 1025it [00:02, 358.01it/s, env_step=4434944, len=35, n/ep=2, n/st=64, player_1/loss=290.158, player_2/loss=487.855, rew=1262.00]


Epoch #4331: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4332: 1025it [00:02, 360.14it/s, env_step=4435968, len=40, n/ep=1, n/st=64, player_1/loss=207.856, player_2/loss=380.209, rew=1638.00]


Epoch #4332: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4333: 1025it [00:02, 360.65it/s, env_step=4436992, len=35, n/ep=2, n/st=64, player_1/loss=56.434, player_2/loss=106.624, rew=1322.00]


Epoch #4333: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4334: 1025it [00:02, 362.18it/s, env_step=4438016, len=29, n/ep=2, n/st=64, player_1/loss=115.933, player_2/loss=71.624, rew=893.00]


Epoch #4334: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4335: 1025it [00:02, 361.29it/s, env_step=4439040, len=36, n/ep=2, n/st=64, player_1/loss=250.576, player_2/loss=88.892, rew=1334.00]


Epoch #4335: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4336: 1025it [00:02, 356.39it/s, env_step=4440064, len=40, n/ep=2, n/st=64, player_1/loss=164.995, player_2/loss=107.417, rew=1696.00]


Epoch #4336: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4337: 1025it [00:02, 350.42it/s, env_step=4441088, len=25, n/ep=2, n/st=64, player_1/loss=374.720, player_2/loss=132.130, rew=657.00]


Epoch #4337: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4338: 1025it [00:03, 336.40it/s, env_step=4442112, len=28, n/ep=3, n/st=64, player_1/loss=410.541, player_2/loss=564.443, rew=911.33]


Epoch #4338: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4339: 1025it [00:02, 352.11it/s, env_step=4443136, len=37, n/ep=2, n/st=64, player_1/loss=455.702, player_2/loss=611.804, rew=1404.00]


Epoch #4339: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4340: 1025it [00:02, 347.57it/s, env_step=4444160, len=34, n/ep=2, n/st=64, player_1/loss=709.501, player_2/loss=360.057, rew=1188.00]


Epoch #4340: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4341: 1025it [00:02, 352.71it/s, env_step=4445184, len=39, n/ep=2, n/st=64, player_1/loss=579.449, player_2/loss=218.332, rew=1598.00]


Epoch #4341: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4342: 1025it [00:02, 354.17it/s, env_step=4446208, len=26, n/ep=2, n/st=64, player_1/loss=316.893, player_2/loss=584.105, rew=701.00]


Epoch #4342: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4343: 1025it [00:02, 357.63it/s, env_step=4447232, len=33, n/ep=2, n/st=64, player_1/loss=270.892, player_2/loss=525.071, rew=1136.00]


Epoch #4343: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4344: 1025it [00:02, 352.11it/s, env_step=4448256, len=34, n/ep=2, n/st=64, player_1/loss=97.648, player_2/loss=79.703, rew=1235.00]


Epoch #4344: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4345: 1025it [00:02, 355.52it/s, env_step=4449280, len=36, n/ep=2, n/st=64, player_1/loss=71.772, player_2/loss=62.834, rew=1330.00]


Epoch #4345: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4346: 1025it [00:02, 357.26it/s, env_step=4450304, len=21, n/ep=3, n/st=64, player_1/loss=207.661, player_2/loss=261.872, rew=481.33]


Epoch #4346: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4347: 1025it [00:02, 354.17it/s, env_step=4451328, len=37, n/ep=1, n/st=64, player_1/loss=283.885, player_2/loss=341.011, rew=1404.00]


Epoch #4347: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4348: 1025it [00:02, 357.13it/s, env_step=4452352, len=30, n/ep=2, n/st=64, player_1/loss=124.615, player_2/loss=106.145, rew=965.00]


Epoch #4348: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4349: 1025it [00:02, 351.62it/s, env_step=4453376, len=34, n/ep=2, n/st=64, player_1/loss=37.043, player_2/loss=390.728, rew=1204.00]


Epoch #4349: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4350: 1025it [00:02, 356.02it/s, env_step=4454400, len=25, n/ep=3, n/st=64, player_1/loss=186.866, player_2/loss=386.058, rew=792.67]


Epoch #4350: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4351: 1025it [00:02, 355.28it/s, env_step=4455424, len=27, n/ep=2, n/st=64, player_1/loss=528.259, player_2/loss=171.799, rew=838.00]


Epoch #4351: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4352: 1025it [00:02, 341.78it/s, env_step=4456448, len=28, n/ep=2, n/st=64, player_1/loss=517.127, player_2/loss=350.783, rew=839.00]


Epoch #4352: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4353: 1025it [00:02, 354.30it/s, env_step=4457472, len=27, n/ep=3, n/st=64, player_1/loss=468.981, player_2/loss=306.631, rew=837.33]


Epoch #4353: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4354: 1025it [00:02, 352.23it/s, env_step=4458496, len=35, n/ep=2, n/st=64, player_1/loss=550.177, player_2/loss=165.689, rew=1274.00]


Epoch #4354: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4355: 1025it [00:02, 356.51it/s, env_step=4459520, len=36, n/ep=2, n/st=64, player_1/loss=292.619, player_2/loss=192.809, rew=1369.00]


Epoch #4355: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4356: 1025it [00:02, 355.16it/s, env_step=4460544, len=34, n/ep=2, n/st=64, player_1/loss=94.083, player_2/loss=159.336, rew=1243.00]


Epoch #4356: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4357: 1025it [00:02, 356.89it/s, env_step=4461568, len=27, n/ep=2, n/st=64, player_1/loss=141.870, player_2/loss=95.010, rew=803.00]


Epoch #4357: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4358: 1025it [00:02, 352.11it/s, env_step=4462592, len=42, n/ep=1, n/st=64, player_1/loss=94.883, player_2/loss=398.390, rew=1834.00]


Epoch #4358: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4359: 1025it [00:02, 354.54it/s, env_step=4463616, len=37, n/ep=2, n/st=64, player_1/loss=51.924, player_2/loss=895.710, rew=1408.00]


Epoch #4359: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4360: 1025it [00:02, 358.38it/s, env_step=4464640, len=27, n/ep=3, n/st=64, player_1/loss=200.468, player_2/loss=651.134, rew=826.00]


Epoch #4360: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4361: 1025it [00:02, 358.51it/s, env_step=4465664, len=25, n/ep=3, n/st=64, player_1/loss=496.477, rew=650.00]


Epoch #4361: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4362: 1025it [00:02, 355.89it/s, env_step=4466688, len=30, n/ep=2, n/st=64, player_1/loss=334.880, player_2/loss=536.289, rew=959.00]


Epoch #4362: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4363: 1025it [00:02, 348.63it/s, env_step=4467712, len=31, n/ep=2, n/st=64, player_1/loss=205.210, player_2/loss=678.146, rew=1024.00]


Epoch #4363: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4364: 1025it [00:02, 356.02it/s, env_step=4468736, len=20, n/ep=3, n/st=64, player_1/loss=337.071, player_2/loss=476.241, rew=582.67]


Epoch #4364: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4365: 1025it [00:02, 352.47it/s, env_step=4469760, len=36, n/ep=2, n/st=64, player_1/loss=227.926, player_2/loss=414.748, rew=1355.00]


Epoch #4365: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4366: 1025it [00:02, 355.16it/s, env_step=4470784, len=31, n/ep=2, n/st=64, player_1/loss=70.234, player_2/loss=514.442, rew=991.00]


Epoch #4366: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4367: 1025it [00:02, 352.71it/s, env_step=4471808, len=37, n/ep=2, n/st=64, player_1/loss=228.702, player_2/loss=419.810, rew=1442.00]


Epoch #4367: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4368: 1025it [00:02, 356.02it/s, env_step=4472832, len=39, n/ep=2, n/st=64, player_1/loss=88.005, player_2/loss=324.534, rew=1598.00]


Epoch #4368: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4369: 1025it [00:02, 355.03it/s, env_step=4473856, len=27, n/ep=3, n/st=64, player_1/loss=266.508, player_2/loss=296.561, rew=762.67]


Epoch #4369: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4370: 1025it [00:02, 354.42it/s, env_step=4474880, len=32, n/ep=2, n/st=64, player_1/loss=281.172, player_2/loss=388.021, rew=1070.00]


Epoch #4370: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4371: 1025it [00:02, 359.01it/s, env_step=4475904, len=18, n/ep=5, n/st=64, player_1/loss=278.104, player_2/loss=470.743, rew=392.00]


Epoch #4371: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4372: 1025it [00:02, 354.05it/s, env_step=4476928, len=27, n/ep=2, n/st=64, player_1/loss=234.489, player_2/loss=419.291, rew=784.00]


Epoch #4372: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4373: 1025it [00:02, 355.65it/s, env_step=4477952, len=28, n/ep=1, n/st=64, player_1/loss=289.825, player_2/loss=459.941, rew=810.00]


Epoch #4373: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4374: 1025it [00:02, 358.01it/s, env_step=4478976, len=38, n/ep=2, n/st=64, player_1/loss=401.794, player_2/loss=292.197, rew=1480.00]


Epoch #4374: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4375: 1025it [00:02, 355.52it/s, env_step=4480000, len=30, n/ep=2, n/st=64, player_1/loss=309.724, player_2/loss=578.931, rew=964.00]


Epoch #4375: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4376: 1025it [00:02, 351.62it/s, env_step=4481024, len=29, n/ep=2, n/st=64, player_1/loss=309.646, player_2/loss=669.785, rew=893.00]


Epoch #4376: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4377: 1025it [00:02, 357.88it/s, env_step=4482048, len=27, n/ep=2, n/st=64, player_1/loss=99.698, player_2/loss=215.906, rew=803.00]


Epoch #4377: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4378: 1025it [00:02, 353.93it/s, env_step=4483072, len=15, n/ep=4, n/st=64, player_1/loss=75.275, player_2/loss=114.229, rew=300.00]


Epoch #4378: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4379: 1025it [00:02, 356.14it/s, env_step=4484096, len=24, n/ep=3, n/st=64, player_1/loss=177.374, player_2/loss=98.624, rew=665.33]


Epoch #4379: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4380: 1025it [00:02, 350.66it/s, env_step=4485120, len=16, n/ep=4, n/st=64, player_1/loss=697.779, player_2/loss=65.271, rew=390.50]


Epoch #4380: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4381: 1025it [00:02, 354.54it/s, env_step=4486144, len=16, n/ep=4, n/st=64, player_1/loss=694.526, player_2/loss=221.058, rew=312.00]


Epoch #4381: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4382: 1025it [00:02, 353.32it/s, env_step=4487168, len=16, n/ep=4, n/st=64, player_1/loss=467.797, player_2/loss=796.214, rew=339.00]


Epoch #4382: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4383: 1025it [00:02, 356.76it/s, env_step=4488192, len=29, n/ep=3, n/st=64, player_1/loss=412.350, player_2/loss=922.379, rew=958.00]


Epoch #4383: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4384: 1025it [00:02, 356.51it/s, env_step=4489216, len=27, n/ep=3, n/st=64, player_1/loss=374.685, player_2/loss=615.695, rew=778.67]


Epoch #4384: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4385: 1025it [00:02, 350.90it/s, env_step=4490240, len=38, n/ep=1, n/st=64, player_1/loss=100.054, player_2/loss=771.448, rew=1480.00]


Epoch #4385: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4386: 1025it [00:02, 354.30it/s, env_step=4491264, len=28, n/ep=2, n/st=64, player_1/loss=214.005, player_2/loss=580.720, rew=846.00]


Epoch #4386: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4387: 1025it [00:02, 354.91it/s, env_step=4492288, len=29, n/ep=2, n/st=64, player_1/loss=205.897, player_2/loss=362.729, rew=949.00]


Epoch #4387: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4388: 1025it [00:02, 353.56it/s, env_step=4493312, len=25, n/ep=2, n/st=64, player_1/loss=201.987, player_2/loss=653.378, rew=729.00]


Epoch #4388: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4389: 1025it [00:02, 357.76it/s, env_step=4494336, len=25, n/ep=3, n/st=64, player_1/loss=310.119, player_2/loss=687.755, rew=656.67]


Epoch #4389: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4390: 1025it [00:02, 351.26it/s, env_step=4495360, len=21, n/ep=3, n/st=64, player_1/loss=234.523, player_2/loss=899.101, rew=607.33]


Epoch #4390: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4391: 1025it [00:02, 355.03it/s, env_step=4496384, len=24, n/ep=2, n/st=64, player_1/loss=150.249, player_2/loss=640.731, rew=625.00]


Epoch #4391: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4392: 1025it [00:02, 355.03it/s, env_step=4497408, len=39, n/ep=1, n/st=64, player_1/loss=147.393, player_2/loss=438.896, rew=1558.00]


Epoch #4392: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4393: 1025it [00:02, 357.01it/s, env_step=4498432, len=19, n/ep=3, n/st=64, player_1/loss=35.293, player_2/loss=504.145, rew=412.67]


Epoch #4393: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4394: 1025it [00:02, 352.83it/s, env_step=4499456, len=31, n/ep=2, n/st=64, player_1/loss=550.584, player_2/loss=701.234, rew=991.00]


Epoch #4394: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4395: 1025it [00:02, 351.74it/s, env_step=4500480, len=28, n/ep=3, n/st=64, player_1/loss=708.890, rew=810.67]


Epoch #4395: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4396: 1025it [00:02, 357.76it/s, env_step=4501504, len=27, n/ep=3, n/st=64, player_1/loss=522.390, player_2/loss=1083.074, rew=772.67]


Epoch #4396: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4397: 1025it [00:02, 355.77it/s, env_step=4502528, len=20, n/ep=2, n/st=64, player_1/loss=1010.750, player_2/loss=235.664, rew=419.00]


Epoch #4397: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4398: 1025it [00:02, 356.27it/s, env_step=4503552, len=24, n/ep=3, n/st=64, player_1/loss=788.371, player_2/loss=194.919, rew=641.33]


Epoch #4398: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4399: 1025it [00:02, 349.70it/s, env_step=4504576, len=28, n/ep=3, n/st=64, player_1/loss=383.466, player_2/loss=311.246, rew=992.67]


Epoch #4399: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4400: 1025it [00:02, 354.54it/s, env_step=4505600, len=37, n/ep=2, n/st=64, player_1/loss=509.886, player_2/loss=339.681, rew=1408.00]


Epoch #4400: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4401: 1025it [00:02, 355.77it/s, env_step=4506624, len=25, n/ep=3, n/st=64, player_1/loss=439.307, player_2/loss=95.768, rew=656.00]


Epoch #4401: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4402: 1025it [00:02, 355.03it/s, env_step=4507648, len=33, n/ep=2, n/st=64, player_1/loss=123.179, player_2/loss=219.731, rew=1174.00]


Epoch #4402: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4403: 1025it [00:02, 356.76it/s, env_step=4508672, len=24, n/ep=3, n/st=64, player_1/loss=130.398, player_2/loss=792.820, rew=602.67]


Epoch #4403: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4404: 1025it [00:02, 351.38it/s, env_step=4509696, len=37, n/ep=2, n/st=64, player_1/loss=207.724, player_2/loss=757.797, rew=1442.00]


Epoch #4404: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4405: 1025it [00:02, 359.26it/s, env_step=4510720, len=31, n/ep=2, n/st=64, player_1/loss=499.688, player_2/loss=312.862, rew=1024.00]


Epoch #4405: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4406: 1025it [00:02, 351.02it/s, env_step=4511744, len=30, n/ep=1, n/st=64, player_1/loss=595.256, player_2/loss=664.304, rew=928.00]


Epoch #4406: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4407: 1025it [00:02, 355.28it/s, env_step=4512768, len=21, n/ep=3, n/st=64, player_1/loss=512.016, player_2/loss=875.068, rew=480.67]


Epoch #4407: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4408: 1025it [00:02, 351.86it/s, env_step=4513792, len=17, n/ep=5, n/st=64, player_1/loss=626.869, player_2/loss=861.029, rew=465.60]


Epoch #4408: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4409: 1025it [00:02, 354.66it/s, env_step=4514816, len=28, n/ep=2, n/st=64, player_1/loss=835.902, player_2/loss=864.198, rew=819.00]


Epoch #4409: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4410: 1025it [00:02, 356.27it/s, env_step=4515840, len=27, n/ep=2, n/st=64, player_1/loss=813.242, player_2/loss=111.192, rew=755.00]


Epoch #4410: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4411: 1025it [00:02, 354.17it/s, env_step=4516864, len=32, n/ep=2, n/st=64, player_1/loss=198.344, player_2/loss=297.652, rew=1118.00]


Epoch #4411: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4412: 1025it [00:02, 354.79it/s, env_step=4517888, len=29, n/ep=2, n/st=64, player_1/loss=93.326, player_2/loss=334.610, rew=910.00]


Epoch #4412: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4413: 1025it [00:02, 353.81it/s, env_step=4518912, len=30, n/ep=2, n/st=64, player_1/loss=224.805, player_2/loss=241.110, rew=979.00]


Epoch #4413: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4414: 1025it [00:02, 356.76it/s, env_step=4519936, len=22, n/ep=3, n/st=64, player_1/loss=720.700, player_2/loss=167.406, rew=522.67]


Epoch #4414: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4415: 1025it [00:02, 355.03it/s, env_step=4520960, len=33, n/ep=2, n/st=64, player_1/loss=794.588, player_2/loss=54.886, rew=1184.00]


Epoch #4415: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4416: 1025it [00:02, 354.17it/s, env_step=4521984, len=34, n/ep=2, n/st=64, player_1/loss=657.600, player_2/loss=61.962, rew=1192.00]


Epoch #4416: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4417: 1025it [00:02, 350.06it/s, env_step=4523008, len=21, n/ep=3, n/st=64, player_1/loss=204.725, player_2/loss=140.234, rew=460.00]


Epoch #4417: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4418: 1025it [00:02, 355.89it/s, env_step=4524032, len=23, n/ep=3, n/st=64, player_1/loss=105.897, player_2/loss=290.800, rew=662.67]


Epoch #4418: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4419: 1025it [00:02, 354.05it/s, env_step=4525056, len=23, n/ep=3, n/st=64, player_1/loss=183.983, player_2/loss=390.750, rew=638.67]


Epoch #4419: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4420: 1025it [00:02, 356.76it/s, env_step=4526080, len=13, n/ep=4, n/st=64, player_1/loss=269.573, player_2/loss=992.050, rew=221.00]


Epoch #4420: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4421: 1025it [00:02, 350.42it/s, env_step=4527104, len=25, n/ep=2, n/st=64, player_1/loss=486.753, player_2/loss=885.308, rew=649.00]


Epoch #4421: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4422: 1025it [00:02, 358.76it/s, env_step=4528128, len=31, n/ep=2, n/st=64, player_1/loss=424.827, player_2/loss=708.569, rew=1028.00]


Epoch #4422: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4423: 1025it [00:02, 356.89it/s, env_step=4529152, len=28, n/ep=2, n/st=64, player_1/loss=225.584, rew=841.00]


Epoch #4423: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4424: 1025it [00:02, 352.71it/s, env_step=4530176, len=32, n/ep=2, n/st=64, player_1/loss=773.598, player_2/loss=204.168, rew=1079.00]


Epoch #4424: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4425: 1025it [00:02, 354.66it/s, env_step=4531200, len=26, n/ep=2, n/st=64, player_1/loss=886.774, player_2/loss=204.111, rew=701.00]


Epoch #4425: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4426: 1025it [00:02, 351.26it/s, env_step=4532224, len=24, n/ep=2, n/st=64, player_1/loss=620.144, player_2/loss=244.261, rew=623.00]


Epoch #4426: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4427: 1025it [00:02, 355.77it/s, env_step=4533248, len=27, n/ep=2, n/st=64, player_1/loss=518.221, player_2/loss=148.416, rew=758.00]


Epoch #4427: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4428: 1025it [00:02, 354.66it/s, env_step=4534272, len=21, n/ep=3, n/st=64, player_1/loss=297.422, player_2/loss=195.790, rew=503.33]


Epoch #4428: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4429: 1025it [00:02, 356.51it/s, env_step=4535296, len=21, n/ep=3, n/st=64, player_1/loss=434.655, player_2/loss=206.847, rew=607.33]


Epoch #4429: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4430: 1025it [00:02, 351.02it/s, env_step=4536320, len=33, n/ep=2, n/st=64, player_1/loss=430.084, player_2/loss=227.374, rew=1156.00]


Epoch #4430: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4431: 1025it [00:02, 355.52it/s, env_step=4537344, len=37, n/ep=2, n/st=64, player_1/loss=228.807, player_2/loss=68.941, rew=1448.00]


Epoch #4431: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4432: 1025it [00:02, 352.83it/s, env_step=4538368, len=33, n/ep=2, n/st=64, player_1/loss=307.967, player_2/loss=38.794, rew=1160.00]


Epoch #4432: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4433: 1025it [00:02, 357.88it/s, env_step=4539392, len=27, n/ep=2, n/st=64, player_1/loss=625.321, player_2/loss=42.367, rew=758.00]


Epoch #4433: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4434: 1025it [00:02, 356.51it/s, env_step=4540416, len=28, n/ep=2, n/st=64, player_1/loss=641.604, player_2/loss=50.826, rew=851.00]


Epoch #4434: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4435: 1025it [00:02, 350.30it/s, env_step=4541440, len=30, n/ep=2, n/st=64, player_1/loss=424.468, player_2/loss=48.341, rew=1015.00]


Epoch #4435: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4436: 1025it [00:02, 357.63it/s, env_step=4542464, len=28, n/ep=2, n/st=64, player_1/loss=258.081, player_2/loss=42.790, rew=826.00]


Epoch #4436: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4437: 1025it [00:02, 359.26it/s, env_step=4543488, len=26, n/ep=3, n/st=64, player_1/loss=150.503, player_2/loss=48.517, rew=722.00]


Epoch #4437: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4438: 1025it [00:02, 350.42it/s, env_step=4544512, len=25, n/ep=3, n/st=64, player_1/loss=242.793, player_2/loss=56.581, rew=682.00]


Epoch #4438: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4439: 1025it [00:02, 353.81it/s, env_step=4545536, len=19, n/ep=3, n/st=64, player_1/loss=329.953, player_2/loss=364.352, rew=402.67]


Epoch #4439: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4440: 1025it [00:02, 355.52it/s, env_step=4546560, len=32, n/ep=2, n/st=64, player_1/loss=190.172, player_2/loss=694.901, rew=1058.00]


Epoch #4440: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4441: 1025it [00:02, 353.93it/s, env_step=4547584, len=31, n/ep=2, n/st=64, player_1/loss=193.519, player_2/loss=499.316, rew=990.00]


Epoch #4441: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4442: 1025it [00:02, 354.79it/s, env_step=4548608, len=21, n/ep=2, n/st=64, player_1/loss=145.166, player_2/loss=542.603, rew=614.00]


Epoch #4442: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4443: 1025it [00:02, 349.82it/s, env_step=4549632, len=32, n/ep=2, n/st=64, player_1/loss=115.101, player_2/loss=1003.537, rew=1087.00]


Epoch #4443: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4444: 1025it [00:02, 354.54it/s, env_step=4550656, len=17, n/ep=3, n/st=64, player_1/loss=160.915, player_2/loss=1469.409, rew=306.67]


Epoch #4444: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4445: 1025it [00:02, 353.08it/s, env_step=4551680, len=29, n/ep=2, n/st=64, player_1/loss=107.267, player_2/loss=927.892, rew=872.00]


Epoch #4445: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4446: 1025it [00:02, 355.28it/s, env_step=4552704, len=27, n/ep=3, n/st=64, player_1/loss=122.310, player_2/loss=472.817, rew=888.00]


Epoch #4446: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4447: 1025it [00:02, 354.05it/s, env_step=4553728, len=39, n/ep=2, n/st=64, player_1/loss=116.568, player_2/loss=461.807, rew=1598.00]


Epoch #4447: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4448: 1025it [00:02, 351.14it/s, env_step=4554752, len=29, n/ep=2, n/st=64, player_1/loss=125.036, player_2/loss=60.712, rew=869.00]


Epoch #4448: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4449: 1025it [00:02, 356.02it/s, env_step=4555776, len=30, n/ep=2, n/st=64, player_1/loss=222.460, player_2/loss=351.961, rew=1009.00]


Epoch #4449: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4450: 1025it [00:02, 358.26it/s, env_step=4556800, len=21, n/ep=2, n/st=64, player_1/loss=264.066, player_2/loss=371.604, rew=604.00]


Epoch #4450: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4451: 1025it [00:02, 354.54it/s, env_step=4557824, len=30, n/ep=2, n/st=64, player_1/loss=79.823, player_2/loss=223.333, rew=964.00]


Epoch #4451: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4452: 1025it [00:02, 355.16it/s, env_step=4558848, len=15, n/ep=3, n/st=64, player_1/loss=566.593, player_2/loss=596.260, rew=387.33]


Epoch #4452: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4453: 1025it [00:02, 351.86it/s, env_step=4559872, len=27, n/ep=3, n/st=64, player_1/loss=771.060, player_2/loss=1003.401, rew=754.67]


Epoch #4453: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4454: 1025it [00:02, 355.77it/s, env_step=4560896, len=29, n/ep=2, n/st=64, player_1/loss=547.036, player_2/loss=660.403, rew=910.00]


Epoch #4454: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4455: 1025it [00:02, 352.47it/s, env_step=4561920, len=28, n/ep=2, n/st=64, player_1/loss=167.384, player_2/loss=221.177, rew=859.00]


Epoch #4455: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4456: 1025it [00:02, 353.93it/s, env_step=4562944, len=29, n/ep=2, n/st=64, player_1/loss=243.071, player_2/loss=168.711, rew=868.00]


Epoch #4456: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4457: 1025it [00:02, 349.47it/s, env_step=4563968, len=33, n/ep=2, n/st=64, player_1/loss=181.512, player_2/loss=138.241, rew=1174.00]


Epoch #4457: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4458: 1025it [00:02, 355.40it/s, env_step=4564992, len=21, n/ep=2, n/st=64, player_1/loss=214.805, player_2/loss=294.777, rew=572.00]


Epoch #4458: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4459: 1025it [00:02, 354.05it/s, env_step=4566016, len=38, n/ep=1, n/st=64, player_1/loss=426.901, player_2/loss=721.956, rew=1480.00]


Epoch #4459: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4460: 1025it [00:02, 355.89it/s, env_step=4567040, len=22, n/ep=2, n/st=64, player_1/loss=337.662, player_2/loss=630.366, rew=767.00]


Epoch #4460: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4461: 1025it [00:02, 355.28it/s, env_step=4568064, len=35, n/ep=2, n/st=64, player_1/loss=234.621, player_2/loss=579.973, rew=1314.00]


Epoch #4461: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4462: 1025it [00:02, 352.11it/s, env_step=4569088, len=27, n/ep=3, n/st=64, player_1/loss=361.556, player_2/loss=916.608, rew=830.00]


Epoch #4462: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4463: 1025it [00:02, 354.91it/s, env_step=4570112, len=23, n/ep=3, n/st=64, player_1/loss=330.896, player_2/loss=1022.469, rew=558.00]


Epoch #4463: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4464: 1025it [00:02, 353.93it/s, env_step=4571136, len=38, n/ep=1, n/st=64, player_1/loss=446.863, player_2/loss=1028.709, rew=1480.00]


Epoch #4464: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4465: 1025it [00:02, 354.79it/s, env_step=4572160, len=32, n/ep=2, n/st=64, player_1/loss=662.713, player_2/loss=712.929, rew=1107.00]


Epoch #4465: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4466: 1025it [00:02, 353.32it/s, env_step=4573184, len=36, n/ep=2, n/st=64, player_1/loss=507.035, player_2/loss=341.325, rew=1373.00]


Epoch #4466: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4467: 1025it [00:02, 356.02it/s, env_step=4574208, len=21, n/ep=3, n/st=64, player_1/loss=232.977, player_2/loss=442.575, rew=541.33]


Epoch #4467: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4468: 1025it [00:02, 353.20it/s, env_step=4575232, len=23, n/ep=2, n/st=64, player_1/loss=211.051, player_2/loss=805.209, rew=806.00]


Epoch #4468: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4469: 1025it [00:02, 353.20it/s, env_step=4576256, len=16, n/ep=4, n/st=64, player_1/loss=374.291, player_2/loss=758.874, rew=433.50]


Epoch #4469: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4470: 1025it [00:02, 356.14it/s, env_step=4577280, len=37, n/ep=2, n/st=64, player_1/loss=306.749, player_2/loss=265.648, rew=1442.00]


Epoch #4470: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4471: 1025it [00:02, 349.82it/s, env_step=4578304, len=28, n/ep=2, n/st=64, player_1/loss=378.575, player_2/loss=120.730, rew=859.00]


Epoch #4471: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4472: 1025it [00:02, 356.64it/s, env_step=4579328, len=10, n/ep=6, n/st=64, player_1/loss=574.824, player_2/loss=539.332, rew=160.00]


Epoch #4472: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4473: 1025it [00:02, 354.79it/s, env_step=4580352, len=27, n/ep=2, n/st=64, player_1/loss=528.673, player_2/loss=613.330, rew=838.00]


Epoch #4473: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4474: 1025it [00:02, 355.03it/s, env_step=4581376, len=24, n/ep=3, n/st=64, player_1/loss=372.070, player_2/loss=538.181, rew=662.67]


Epoch #4474: test_reward: 70.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4475: 1025it [00:02, 357.13it/s, env_step=4582400, len=32, n/ep=2, n/st=64, player_1/loss=329.992, player_2/loss=368.418, rew=1117.00]


Epoch #4475: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4476: 1025it [00:02, 351.62it/s, env_step=4583424, len=15, n/ep=5, n/st=64, player_1/loss=215.475, player_2/loss=185.428, rew=375.20]


Epoch #4476: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4477: 1025it [00:02, 354.54it/s, env_step=4584448, len=17, n/ep=4, n/st=64, player_1/loss=163.618, player_2/loss=320.250, rew=486.00]


Epoch #4477: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4478: 1025it [00:02, 354.79it/s, env_step=4585472, len=31, n/ep=2, n/st=64, player_1/loss=244.328, player_2/loss=467.100, rew=1024.00]


Epoch #4478: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4479: 1025it [00:02, 355.40it/s, env_step=4586496, len=35, n/ep=2, n/st=64, player_1/loss=195.221, player_2/loss=679.442, rew=1259.00]


Epoch #4479: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4480: 1025it [00:02, 349.70it/s, env_step=4587520, len=32, n/ep=2, n/st=64, player_1/loss=136.278, player_2/loss=750.578, rew=1107.00]


Epoch #4480: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4481: 1025it [00:02, 353.20it/s, env_step=4588544, len=13, n/ep=5, n/st=64, player_1/loss=169.207, player_2/loss=1349.108, rew=185.20]


Epoch #4481: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4482: 1025it [00:02, 353.44it/s, env_step=4589568, len=12, n/ep=5, n/st=64, player_1/loss=200.753, rew=172.00]


Epoch #4482: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4483: 1025it [00:02, 355.65it/s, env_step=4590592, len=13, n/ep=5, n/st=64, player_1/loss=350.617, player_2/loss=902.391, rew=194.00]


Epoch #4483: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4484: 1025it [00:02, 354.17it/s, env_step=4591616, len=15, n/ep=4, n/st=64, player_1/loss=377.276, player_2/loss=801.274, rew=267.50]


Epoch #4484: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4485: 1025it [00:02, 350.66it/s, env_step=4592640, len=24, n/ep=2, n/st=64, player_1/loss=270.210, player_2/loss=472.257, rew=863.00]


Epoch #4485: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4486: 1025it [00:02, 353.20it/s, env_step=4593664, len=15, n/ep=4, n/st=64, player_1/loss=235.099, player_2/loss=351.518, rew=443.50]


Epoch #4486: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4487: 1025it [00:02, 352.59it/s, env_step=4594688, len=29, n/ep=2, n/st=64, player_1/loss=273.487, player_2/loss=171.400, rew=1008.00]


Epoch #4487: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4488: 1025it [00:02, 356.39it/s, env_step=4595712, len=9, n/ep=7, n/st=64, player_1/loss=107.056, player_2/loss=388.702, rew=114.29]


Epoch #4488: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4489: 1025it [00:02, 353.68it/s, env_step=4596736, len=16, n/ep=4, n/st=64, player_1/loss=114.166, player_2/loss=614.713, rew=327.50]


Epoch #4489: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4490: 1025it [00:02, 351.50it/s, env_step=4597760, len=26, n/ep=3, n/st=64, player_1/loss=196.652, player_2/loss=528.983, rew=765.33]


Epoch #4490: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4491: 1025it [00:02, 354.30it/s, env_step=4598784, len=30, n/ep=2, n/st=64, player_1/loss=139.735, player_2/loss=746.848, rew=961.00]


Epoch #4491: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4492: 1025it [00:02, 355.28it/s, env_step=4599808, len=16, n/ep=4, n/st=64, player_1/loss=124.337, player_2/loss=851.295, rew=441.50]


Epoch #4492: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4493: 1025it [00:02, 353.44it/s, env_step=4600832, len=13, n/ep=5, n/st=64, player_1/loss=405.543, player_2/loss=752.484, rew=214.00]


Epoch #4493: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4494: 1025it [00:02, 353.68it/s, env_step=4601856, len=25, n/ep=3, n/st=64, player_1/loss=410.188, player_2/loss=754.819, rew=806.67]


Epoch #4494: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4495: 1025it [00:02, 354.66it/s, env_step=4602880, len=16, n/ep=4, n/st=64, player_1/loss=306.684, player_2/loss=728.804, rew=335.00]


Epoch #4495: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4496: 1025it [00:02, 352.23it/s, env_step=4603904, len=16, n/ep=4, n/st=64, player_1/loss=247.483, player_2/loss=577.841, rew=378.00]


Epoch #4496: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4497: 1025it [00:02, 354.05it/s, env_step=4604928, len=16, n/ep=3, n/st=64, player_1/loss=211.446, player_2/loss=201.258, rew=288.00]


Epoch #4497: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4498: 1025it [00:02, 356.02it/s, env_step=4605952, len=34, n/ep=2, n/st=64, player_1/loss=422.454, player_2/loss=57.929, rew=1188.00]


Epoch #4498: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4499: 1025it [00:02, 351.62it/s, env_step=4606976, len=36, n/ep=2, n/st=64, player_1/loss=391.001, player_2/loss=46.424, rew=1373.00]


Epoch #4499: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4500: 1025it [00:02, 352.35it/s, env_step=4608000, len=34, n/ep=2, n/st=64, player_1/loss=226.488, player_2/loss=189.163, rew=1229.00]


Epoch #4500: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4501: 1025it [00:02, 355.52it/s, env_step=4609024, len=24, n/ep=3, n/st=64, player_1/loss=229.342, player_2/loss=279.705, rew=630.67]


Epoch #4501: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4502: 1025it [00:02, 353.08it/s, env_step=4610048, len=18, n/ep=3, n/st=64, player_1/loss=90.519, player_2/loss=144.514, rew=353.33]


Epoch #4502: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4503: 1025it [00:02, 350.54it/s, env_step=4611072, len=21, n/ep=3, n/st=64, player_1/loss=276.776, player_2/loss=174.112, rew=490.67]


Epoch #4503: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4504: 1025it [00:02, 354.17it/s, env_step=4612096, len=17, n/ep=4, n/st=64, player_1/loss=299.422, player_2/loss=553.985, rew=342.50]


Epoch #4504: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4505: 1025it [00:02, 356.02it/s, env_step=4613120, len=26, n/ep=3, n/st=64, player_1/loss=290.925, player_2/loss=752.519, rew=840.67]


Epoch #4505: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4506: 1025it [00:02, 354.30it/s, env_step=4614144, len=17, n/ep=4, n/st=64, player_1/loss=275.198, player_2/loss=1084.353, rew=324.00]


Epoch #4506: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4507: 1025it [00:02, 352.11it/s, env_step=4615168, len=22, n/ep=3, n/st=64, player_1/loss=106.572, player_2/loss=863.837, rew=658.67]


Epoch #4507: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4508: 1025it [00:02, 356.89it/s, env_step=4616192, len=20, n/ep=4, n/st=64, player_1/loss=108.711, player_2/loss=261.845, rew=422.50]


Epoch #4508: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4509: 1025it [00:02, 354.79it/s, env_step=4617216, len=25, n/ep=3, n/st=64, player_1/loss=30.325, player_2/loss=518.509, rew=826.00]


Epoch #4509: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4510: 1025it [00:02, 352.59it/s, env_step=4618240, len=27, n/ep=3, n/st=64, player_1/loss=175.791, player_2/loss=508.131, rew=816.00]


Epoch #4510: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4511: 1025it [00:02, 355.40it/s, env_step=4619264, len=13, n/ep=5, n/st=64, player_1/loss=186.469, player_2/loss=605.995, rew=252.40]


Epoch #4511: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4512: 1025it [00:02, 350.18it/s, env_step=4620288, len=29, n/ep=2, n/st=64, player_1/loss=267.716, player_2/loss=627.225, rew=893.00]


Epoch #4512: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4513: 1025it [00:02, 353.08it/s, env_step=4621312, len=23, n/ep=3, n/st=64, player_1/loss=589.403, player_2/loss=243.196, rew=576.67]


Epoch #4513: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4514: 1025it [00:02, 354.79it/s, env_step=4622336, len=20, n/ep=2, n/st=64, player_1/loss=404.931, player_2/loss=160.378, rew=529.00]


Epoch #4514: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4515: 1025it [00:02, 352.47it/s, env_step=4623360, len=15, n/ep=4, n/st=64, player_1/loss=903.714, player_2/loss=220.863, rew=324.50]


Epoch #4515: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4516: 1025it [00:02, 355.65it/s, env_step=4624384, len=15, n/ep=4, n/st=64, player_1/loss=998.892, player_2/loss=551.370, rew=272.00]


Epoch #4516: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4517: 1025it [00:02, 350.18it/s, env_step=4625408, len=14, n/ep=5, n/st=64, player_1/loss=577.039, player_2/loss=598.398, rew=240.40]


Epoch #4517: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4518: 1025it [00:02, 352.83it/s, env_step=4626432, len=29, n/ep=2, n/st=64, player_1/loss=494.536, player_2/loss=475.341, rew=989.00]


Epoch #4518: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4519: 1025it [00:02, 354.17it/s, env_step=4627456, len=14, n/ep=5, n/st=64, player_1/loss=504.895, player_2/loss=552.824, rew=246.80]


Epoch #4519: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4520: 1025it [00:02, 354.42it/s, env_step=4628480, len=21, n/ep=3, n/st=64, player_1/loss=591.449, player_2/loss=798.079, rew=605.33]


Epoch #4520: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4521: 1025it [00:02, 348.52it/s, env_step=4629504, len=30, n/ep=2, n/st=64, player_1/loss=377.983, player_2/loss=820.383, rew=977.00]


Epoch #4521: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4522: 1025it [00:02, 353.68it/s, env_step=4630528, len=35, n/ep=2, n/st=64, player_1/loss=323.539, player_2/loss=294.282, rew=1274.00]


Epoch #4522: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4523: 1025it [00:02, 354.54it/s, env_step=4631552, len=22, n/ep=2, n/st=64, player_1/loss=174.101, player_2/loss=552.984, rew=767.00]


Epoch #4523: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4524: 1025it [00:02, 353.20it/s, env_step=4632576, len=33, n/ep=2, n/st=64, player_1/loss=454.845, player_2/loss=636.506, rew=1184.00]


Epoch #4524: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4525: 1025it [00:02, 353.08it/s, env_step=4633600, len=22, n/ep=3, n/st=64, player_1/loss=680.856, player_2/loss=627.910, rew=536.00]


Epoch #4525: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4526: 1025it [00:02, 350.18it/s, env_step=4634624, len=22, n/ep=3, n/st=64, player_1/loss=560.005, player_2/loss=1023.107, rew=541.33]


Epoch #4526: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4527: 1025it [00:02, 353.93it/s, env_step=4635648, len=30, n/ep=2, n/st=64, player_1/loss=227.807, player_2/loss=1466.172, rew=1106.00]


Epoch #4527: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4528: 1025it [00:02, 352.71it/s, env_step=4636672, len=36, n/ep=2, n/st=64, player_1/loss=307.776, player_2/loss=913.232, rew=1339.00]


Epoch #4528: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4529: 1025it [00:02, 354.42it/s, env_step=4637696, len=16, n/ep=4, n/st=64, player_1/loss=344.033, player_2/loss=430.085, rew=287.50]


Epoch #4529: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4530: 1025it [00:02, 349.94it/s, env_step=4638720, len=30, n/ep=2, n/st=64, player_1/loss=225.696, player_2/loss=399.995, rew=937.00]


Epoch #4530: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4531: 1025it [00:02, 352.47it/s, env_step=4639744, len=21, n/ep=2, n/st=64, player_1/loss=285.227, player_2/loss=641.709, rew=496.00]


Epoch #4531: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4532: 1025it [00:02, 356.39it/s, env_step=4640768, len=27, n/ep=3, n/st=64, player_1/loss=477.250, player_2/loss=651.824, rew=827.33]


Epoch #4532: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4533: 1025it [00:02, 354.30it/s, env_step=4641792, len=23, n/ep=2, n/st=64, player_1/loss=515.896, player_2/loss=501.217, rew=554.00]


Epoch #4533: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4534: 1025it [00:02, 353.08it/s, env_step=4642816, len=33, n/ep=2, n/st=64, player_1/loss=580.540, player_2/loss=280.853, rew=1154.00]


Epoch #4534: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4535: 1025it [00:02, 353.68it/s, env_step=4643840, len=23, n/ep=3, n/st=64, player_1/loss=715.234, player_2/loss=462.991, rew=593.33]


Epoch #4535: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4536: 1025it [00:02, 355.77it/s, env_step=4644864, len=24, n/ep=3, n/st=64, player_1/loss=533.620, player_2/loss=450.783, rew=738.00]


Epoch #4536: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4537: 1025it [00:02, 353.44it/s, env_step=4645888, len=29, n/ep=2, n/st=64, player_1/loss=469.472, player_2/loss=63.865, rew=940.00]


Epoch #4537: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4538: 1025it [00:02, 356.76it/s, env_step=4646912, len=15, n/ep=4, n/st=64, player_1/loss=413.791, player_2/loss=406.186, rew=238.50]


Epoch #4538: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4539: 1025it [00:02, 354.54it/s, env_step=4647936, len=15, n/ep=4, n/st=64, player_1/loss=393.342, player_2/loss=570.825, rew=240.00]


Epoch #4539: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4540: 1025it [00:03, 341.55it/s, env_step=4648960, len=18, n/ep=3, n/st=64, player_1/loss=345.152, player_2/loss=464.435, rew=356.00]


Epoch #4540: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4541: 1025it [00:02, 352.59it/s, env_step=4649984, len=13, n/ep=5, n/st=64, player_1/loss=263.085, player_2/loss=302.033, rew=205.60]


Epoch #4541: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4542: 1025it [00:02, 354.91it/s, env_step=4651008, len=27, n/ep=2, n/st=64, player_1/loss=534.358, player_2/loss=257.678, rew=824.00]


Epoch #4542: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4543: 1025it [00:02, 353.08it/s, env_step=4652032, len=23, n/ep=2, n/st=64, player_1/loss=456.361, player_2/loss=415.458, rew=576.00]


Epoch #4543: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4544: 1025it [00:02, 350.66it/s, env_step=4653056, len=16, n/ep=3, n/st=64, player_2/loss=277.250, rew=443.33]


Epoch #4544: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4545: 1025it [00:02, 356.02it/s, env_step=4654080, len=33, n/ep=2, n/st=64, player_1/loss=202.645, player_2/loss=435.504, rew=1120.00]


Epoch #4545: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4546: 1025it [00:02, 356.51it/s, env_step=4655104, len=21, n/ep=3, n/st=64, player_1/loss=661.162, player_2/loss=403.782, rew=558.00]


Epoch #4546: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4547: 1025it [00:02, 355.03it/s, env_step=4656128, len=14, n/ep=4, n/st=64, player_1/loss=753.766, player_2/loss=130.084, rew=354.50]


Epoch #4547: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4548: 1025it [00:02, 352.10it/s, env_step=4657152, len=8, n/ep=6, n/st=64, player_1/loss=372.403, player_2/loss=166.892, rew=72.33]


Epoch #4548: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4549: 1025it [00:02, 349.70it/s, env_step=4658176, len=20, n/ep=4, n/st=64, player_2/loss=118.497, rew=543.50]


Epoch #4549: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4550: 1025it [00:02, 354.66it/s, env_step=4659200, len=15, n/ep=5, n/st=64, player_1/loss=379.543, player_2/loss=237.360, rew=245.60]


Epoch #4550: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4551: 1025it [00:02, 352.71it/s, env_step=4660224, len=38, n/ep=2, n/st=64, player_1/loss=312.246, player_2/loss=246.140, rew=1511.00]


Epoch #4551: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4552: 1025it [00:02, 351.86it/s, env_step=4661248, len=19, n/ep=4, n/st=64, player_1/loss=381.886, player_2/loss=265.872, rew=482.50]


Epoch #4552: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4553: 1025it [00:02, 356.14it/s, env_step=4662272, len=28, n/ep=2, n/st=64, player_1/loss=133.285, player_2/loss=269.081, rew=835.00]


Epoch #4553: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4554: 1025it [00:02, 350.06it/s, env_step=4663296, len=26, n/ep=2, n/st=64, player_1/loss=278.343, player_2/loss=41.944, rew=733.00]


Epoch #4554: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4555: 1025it [00:02, 355.16it/s, env_step=4664320, len=13, n/ep=5, n/st=64, player_1/loss=314.703, player_2/loss=452.881, rew=229.60]


Epoch #4555: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4556: 1025it [00:02, 354.05it/s, env_step=4665344, len=25, n/ep=2, n/st=64, player_1/loss=392.953, player_2/loss=555.656, rew=746.00]


Epoch #4556: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4557: 1025it [00:02, 355.40it/s, env_step=4666368, len=32, n/ep=2, n/st=64, player_1/loss=353.847, player_2/loss=510.563, rew=1118.00]


Epoch #4557: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4558: 1025it [00:02, 353.20it/s, env_step=4667392, len=34, n/ep=2, n/st=64, player_2/loss=751.277, rew=1229.00]


Epoch #4558: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4559: 1025it [00:02, 349.94it/s, env_step=4668416, len=25, n/ep=3, n/st=64, player_1/loss=767.025, player_2/loss=397.380, rew=686.00]


Epoch #4559: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4560: 1025it [00:02, 355.77it/s, env_step=4669440, len=28, n/ep=2, n/st=64, player_1/loss=446.669, player_2/loss=186.962, rew=846.00]


Epoch #4560: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4561: 1025it [00:02, 354.79it/s, env_step=4670464, len=13, n/ep=5, n/st=64, player_1/loss=253.201, player_2/loss=622.734, rew=181.60]


Epoch #4561: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4562: 1025it [00:02, 353.32it/s, env_step=4671488, len=15, n/ep=4, n/st=64, player_1/loss=245.908, player_2/loss=559.298, rew=248.00]


Epoch #4562: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4563: 1025it [00:02, 349.47it/s, env_step=4672512, len=29, n/ep=3, n/st=64, player_1/loss=345.927, player_2/loss=225.129, rew=888.67]


Epoch #4563: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4564: 1025it [00:02, 357.01it/s, env_step=4673536, len=18, n/ep=3, n/st=64, player_1/loss=354.270, player_2/loss=218.603, rew=367.33]


Epoch #4564: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4565: 1025it [00:02, 351.99it/s, env_step=4674560, len=25, n/ep=2, n/st=64, player_1/loss=617.006, player_2/loss=307.773, rew=674.00]


Epoch #4565: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4566: 1025it [00:02, 356.02it/s, env_step=4675584, len=27, n/ep=2, n/st=64, player_1/loss=719.466, player_2/loss=354.278, rew=812.00]


Epoch #4566: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4567: 1025it [00:02, 353.81it/s, env_step=4676608, len=30, n/ep=2, n/st=64, player_1/loss=651.280, player_2/loss=185.383, rew=992.00]


Epoch #4567: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4568: 1025it [00:02, 350.42it/s, env_step=4677632, len=12, n/ep=6, n/st=64, player_1/loss=202.712, player_2/loss=121.240, rew=215.67]


Epoch #4568: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4569: 1025it [00:02, 354.54it/s, env_step=4678656, len=18, n/ep=4, n/st=64, player_1/loss=412.030, player_2/loss=595.048, rew=407.50]


Epoch #4569: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4570: 1025it [00:02, 355.65it/s, env_step=4679680, len=21, n/ep=3, n/st=64, player_1/loss=345.759, player_2/loss=674.854, rew=570.00]


Epoch #4570: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4571: 1025it [00:02, 354.79it/s, env_step=4680704, len=12, n/ep=4, n/st=64, player_1/loss=483.983, player_2/loss=203.160, rew=197.00]


Epoch #4571: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4572: 1025it [00:02, 352.11it/s, env_step=4681728, len=20, n/ep=3, n/st=64, player_1/loss=744.702, player_2/loss=62.942, rew=543.33]


Epoch #4572: test_reward: 108.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4573: 1025it [00:02, 349.58it/s, env_step=4682752, len=23, n/ep=2, n/st=64, player_1/loss=513.348, player_2/loss=239.326, rew=756.00]


Epoch #4573: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4574: 1025it [00:02, 354.42it/s, env_step=4683776, len=33, n/ep=2, n/st=64, player_1/loss=722.004, player_2/loss=499.657, rew=1154.00]


Epoch #4574: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4575: 1025it [00:02, 354.05it/s, env_step=4684800, len=19, n/ep=4, n/st=64, player_1/loss=889.111, player_2/loss=748.366, rew=422.00]


Epoch #4575: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4576: 1025it [00:02, 355.03it/s, env_step=4685824, len=34, n/ep=2, n/st=64, player_1/loss=1310.994, player_2/loss=622.916, rew=1252.00]


Epoch #4576: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4577: 1025it [00:02, 355.15it/s, env_step=4686848, len=19, n/ep=3, n/st=64, player_1/loss=993.207, player_2/loss=617.571, rew=494.67]


Epoch #4577: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4578: 1025it [00:02, 350.42it/s, env_step=4687872, len=19, n/ep=3, n/st=64, player_1/loss=350.393, player_2/loss=469.802, rew=528.00]


Epoch #4578: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4579: 1025it [00:02, 352.35it/s, env_step=4688896, len=15, n/ep=4, n/st=64, player_1/loss=289.892, player_2/loss=419.681, rew=261.50]


Epoch #4579: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4580: 1025it [00:02, 357.38it/s, env_step=4689920, len=11, n/ep=4, n/st=64, player_1/loss=212.833, player_2/loss=575.196, rew=141.00]


Epoch #4580: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4581: 1025it [00:02, 351.50it/s, env_step=4690944, len=34, n/ep=2, n/st=64, player_1/loss=213.789, player_2/loss=436.277, rew=1192.00]


Epoch #4581: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4582: 1025it [00:02, 347.81it/s, env_step=4691968, len=22, n/ep=3, n/st=64, player_1/loss=371.861, player_2/loss=721.199, rew=535.33]


Epoch #4582: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4583: 1025it [00:02, 352.35it/s, env_step=4692992, len=23, n/ep=3, n/st=64, player_1/loss=650.577, player_2/loss=849.337, rew=576.67]


Epoch #4583: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4584: 1025it [00:02, 354.54it/s, env_step=4694016, len=17, n/ep=3, n/st=64, player_1/loss=698.882, player_2/loss=935.005, rew=496.00]


Epoch #4584: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4585: 1025it [00:02, 356.14it/s, env_step=4695040, len=34, n/ep=2, n/st=64, player_1/loss=313.064, player_2/loss=695.146, rew=1189.00]


Epoch #4585: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4586: 1025it [00:02, 351.26it/s, env_step=4696064, len=22, n/ep=3, n/st=64, player_1/loss=271.554, player_2/loss=504.159, rew=624.67]


Epoch #4586: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4587: 1025it [00:02, 354.91it/s, env_step=4697088, len=21, n/ep=3, n/st=64, player_1/loss=263.987, player_2/loss=469.331, rew=477.33]


Epoch #4587: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4588: 1025it [00:02, 355.15it/s, env_step=4698112, len=14, n/ep=5, n/st=64, player_1/loss=151.093, player_2/loss=617.961, rew=264.00]


Epoch #4588: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4589: 1025it [00:02, 352.71it/s, env_step=4699136, len=32, n/ep=2, n/st=64, player_1/loss=192.987, player_2/loss=635.424, rew=1099.00]


Epoch #4589: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4590: 1025it [00:02, 350.42it/s, env_step=4700160, len=29, n/ep=3, n/st=64, player_1/loss=163.803, player_2/loss=626.654, rew=998.67]


Epoch #4590: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4591: 1025it [00:02, 356.39it/s, env_step=4701184, len=26, n/ep=3, n/st=64, player_1/loss=114.385, player_2/loss=572.754, rew=868.67]


Epoch #4591: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4592: 1025it [00:02, 353.56it/s, env_step=4702208, len=34, n/ep=2, n/st=64, player_1/loss=123.585, player_2/loss=377.062, rew=1192.00]


Epoch #4592: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4593: 1025it [00:02, 352.83it/s, env_step=4703232, len=17, n/ep=4, n/st=64, player_1/loss=125.226, player_2/loss=295.751, rew=436.50]


Epoch #4593: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4594: 1025it [00:02, 354.30it/s, env_step=4704256, len=17, n/ep=3, n/st=64, player_1/loss=89.123, player_2/loss=366.483, rew=446.00]


Epoch #4594: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4595: 1025it [00:02, 351.62it/s, env_step=4705280, len=35, n/ep=2, n/st=64, player_1/loss=219.452, player_2/loss=797.718, rew=1294.00]


Epoch #4595: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4596: 1025it [00:02, 353.08it/s, env_step=4706304, len=35, n/ep=2, n/st=64, player_1/loss=366.191, player_2/loss=586.876, rew=1351.00]


Epoch #4596: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4597: 1025it [00:02, 353.08it/s, env_step=4707328, len=30, n/ep=2, n/st=64, player_1/loss=279.060, player_2/loss=224.720, rew=989.00]


Epoch #4597: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4598: 1025it [00:02, 354.66it/s, env_step=4708352, len=34, n/ep=2, n/st=64, player_1/loss=346.697, player_2/loss=315.451, rew=1192.00]


Epoch #4598: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4599: 1025it [00:02, 354.91it/s, env_step=4709376, len=29, n/ep=2, n/st=64, player_1/loss=549.228, player_2/loss=669.009, rew=893.00]


Epoch #4599: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4600: 1025it [00:02, 351.38it/s, env_step=4710400, len=18, n/ep=3, n/st=64, player_1/loss=428.504, player_2/loss=454.101, rew=382.00]


Epoch #4600: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4601: 1025it [00:02, 354.54it/s, env_step=4711424, len=17, n/ep=3, n/st=64, player_1/loss=90.890, player_2/loss=405.767, rew=336.00]


Epoch #4601: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4602: 1025it [00:02, 354.66it/s, env_step=4712448, len=30, n/ep=2, n/st=64, player_1/loss=156.974, player_2/loss=404.646, rew=961.00]


Epoch #4602: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4603: 1025it [00:02, 354.54it/s, env_step=4713472, len=26, n/ep=2, n/st=64, player_1/loss=320.886, player_2/loss=393.142, rew=701.00]


Epoch #4603: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4604: 1025it [00:02, 353.81it/s, env_step=4714496, len=25, n/ep=3, n/st=64, player_1/loss=622.936, player_2/loss=569.152, rew=683.33]


Epoch #4604: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4605: 1025it [00:02, 350.90it/s, env_step=4715520, len=36, n/ep=2, n/st=64, player_1/loss=596.562, player_2/loss=289.235, rew=1339.00]


Epoch #4605: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4606: 1025it [00:02, 355.16it/s, env_step=4716544, len=30, n/ep=3, n/st=64, player_1/loss=499.527, player_2/loss=288.426, rew=994.67]


Epoch #4606: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4607: 1025it [00:02, 355.03it/s, env_step=4717568, len=20, n/ep=2, n/st=64, player_1/loss=341.480, player_2/loss=323.184, rew=621.00]


Epoch #4607: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4608: 1025it [00:02, 356.27it/s, env_step=4718592, len=26, n/ep=3, n/st=64, player_1/loss=323.513, player_2/loss=112.139, rew=741.33]


Epoch #4608: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4609: 1025it [00:02, 351.86it/s, env_step=4719616, len=33, n/ep=2, n/st=64, player_1/loss=535.590, player_2/loss=49.510, rew=1154.00]


Epoch #4609: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4610: 1025it [00:02, 352.23it/s, env_step=4720640, len=18, n/ep=3, n/st=64, player_1/loss=410.532, player_2/loss=57.656, rew=540.67]


Epoch #4610: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4611: 1025it [00:02, 353.44it/s, env_step=4721664, len=30, n/ep=2, n/st=64, player_1/loss=173.698, player_2/loss=186.563, rew=944.00]


Epoch #4611: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4612: 1025it [00:02, 355.28it/s, env_step=4722688, len=25, n/ep=2, n/st=64, player_1/loss=339.164, player_2/loss=274.570, rew=674.00]


Epoch #4612: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4613: 1025it [00:02, 353.32it/s, env_step=4723712, len=34, n/ep=2, n/st=64, player_1/loss=520.980, player_2/loss=155.297, rew=1237.00]


Epoch #4613: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4614: 1025it [00:02, 350.06it/s, env_step=4724736, len=23, n/ep=3, n/st=64, player_1/loss=388.863, player_2/loss=300.920, rew=695.33]


Epoch #4614: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4615: 1025it [00:02, 354.79it/s, env_step=4725760, len=21, n/ep=3, n/st=64, player_1/loss=356.921, player_2/loss=407.961, rew=514.00]


Epoch #4615: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4616: 1025it [00:02, 353.93it/s, env_step=4726784, len=29, n/ep=2, n/st=64, player_1/loss=500.160, player_2/loss=366.020, rew=898.00]


Epoch #4616: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4617: 1025it [00:02, 354.42it/s, env_step=4727808, len=28, n/ep=2, n/st=64, player_1/loss=381.409, player_2/loss=288.123, rew=910.00]


Epoch #4617: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4618: 1025it [00:02, 348.99it/s, env_step=4728832, len=22, n/ep=3, n/st=64, player_1/loss=413.640, player_2/loss=154.009, rew=564.67]


Epoch #4618: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4619: 1025it [00:02, 351.50it/s, env_step=4729856, len=29, n/ep=3, n/st=64, player_1/loss=419.265, player_2/loss=261.884, rew=902.67]


Epoch #4619: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4620: 1025it [00:02, 349.47it/s, env_step=4730880, len=34, n/ep=2, n/st=64, player_1/loss=175.248, player_2/loss=262.982, rew=1229.00]


Epoch #4620: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4621: 1025it [00:02, 354.91it/s, env_step=4731904, len=32, n/ep=2, n/st=64, player_1/loss=85.191, player_2/loss=458.035, rew=1054.00]


Epoch #4621: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4622: 1025it [00:02, 353.44it/s, env_step=4732928, len=17, n/ep=3, n/st=64, player_1/loss=81.409, player_2/loss=599.033, rew=423.33]


Epoch #4622: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4623: 1025it [00:02, 349.23it/s, env_step=4733952, len=29, n/ep=3, n/st=64, player_1/loss=135.550, player_2/loss=829.781, rew=909.33]


Epoch #4623: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4624: 1025it [00:02, 352.71it/s, env_step=4734976, len=29, n/ep=2, n/st=64, player_1/loss=115.250, player_2/loss=1074.661, rew=869.00]


Epoch #4624: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4625: 1025it [00:02, 356.51it/s, env_step=4736000, len=40, n/ep=1, n/st=64, player_1/loss=62.459, player_2/loss=605.252, rew=1638.00]


Epoch #4625: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4626: 1025it [00:02, 352.23it/s, env_step=4737024, len=33, n/ep=2, n/st=64, player_1/loss=65.014, player_2/loss=127.388, rew=1156.00]


Epoch #4626: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4627: 1025it [00:02, 357.01it/s, env_step=4738048, len=12, n/ep=5, n/st=64, player_1/loss=296.853, player_2/loss=110.641, rew=250.80]


Epoch #4627: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4628: 1025it [00:02, 351.02it/s, env_step=4739072, len=14, n/ep=4, n/st=64, player_1/loss=547.142, player_2/loss=250.487, rew=211.00]


Epoch #4628: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4629: 1025it [00:02, 353.93it/s, env_step=4740096, len=13, n/ep=3, n/st=64, player_1/loss=349.608, player_2/loss=470.934, rew=204.00]


Epoch #4629: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4630: 1025it [00:02, 352.59it/s, env_step=4741120, len=18, n/ep=4, n/st=64, player_1/loss=166.121, player_2/loss=568.062, rew=480.00]


Epoch #4630: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4631: 1025it [00:02, 352.83it/s, env_step=4742144, len=26, n/ep=2, n/st=64, player_1/loss=223.193, player_2/loss=448.407, rew=967.00]


Epoch #4631: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4632: 1025it [00:02, 348.52it/s, env_step=4743168, len=37, n/ep=1, n/st=64, player_1/loss=215.696, player_2/loss=113.682, rew=1404.00]


Epoch #4632: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4633: 1025it [00:02, 355.40it/s, env_step=4744192, len=31, n/ep=2, n/st=64, player_1/loss=169.140, player_2/loss=74.178, rew=1022.00]


Epoch #4633: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4634: 1025it [00:02, 355.65it/s, env_step=4745216, len=31, n/ep=3, n/st=64, player_1/loss=266.901, player_2/loss=336.994, rew=1039.33]


Epoch #4634: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4635: 1025it [00:02, 354.54it/s, env_step=4746240, len=33, n/ep=2, n/st=64, player_1/loss=216.473, player_2/loss=395.934, rew=1121.00]


Epoch #4635: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4636: 1025it [00:02, 353.44it/s, env_step=4747264, len=24, n/ep=3, n/st=64, player_1/loss=111.747, player_2/loss=199.447, rew=708.67]


Epoch #4636: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4637: 1025it [00:02, 348.04it/s, env_step=4748288, len=17, n/ep=3, n/st=64, player_1/loss=72.436, player_2/loss=677.450, rew=419.33]


Epoch #4637: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4638: 1025it [00:02, 355.03it/s, env_step=4749312, len=16, n/ep=4, n/st=64, player_1/loss=356.274, player_2/loss=673.690, rew=275.00]


Epoch #4638: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4639: 1025it [00:02, 352.23it/s, env_step=4750336, len=17, n/ep=4, n/st=64, player_1/loss=537.491, player_2/loss=101.500, rew=339.50]


Epoch #4639: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4640: 1025it [00:02, 352.47it/s, env_step=4751360, len=23, n/ep=4, n/st=64, player_1/loss=310.119, player_2/loss=46.231, rew=630.00]


Epoch #4640: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4641: 1025it [00:02, 349.94it/s, env_step=4752384, len=25, n/ep=2, n/st=64, player_1/loss=162.862, player_2/loss=432.691, rew=676.00]


Epoch #4641: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4642: 1025it [00:03, 334.86it/s, env_step=4753408, len=21, n/ep=3, n/st=64, player_1/loss=211.625, player_2/loss=540.348, rew=564.00]


Epoch #4642: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4643: 1025it [00:02, 348.28it/s, env_step=4754432, len=17, n/ep=3, n/st=64, player_1/loss=169.364, player_2/loss=342.968, rew=338.67]


Epoch #4643: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4644: 1025it [00:02, 349.23it/s, env_step=4755456, len=27, n/ep=2, n/st=64, player_1/loss=235.694, player_2/loss=684.431, rew=994.00]


Epoch #4644: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4645: 1025it [00:02, 353.56it/s, env_step=4756480, len=29, n/ep=2, n/st=64, player_2/loss=790.519, rew=893.00]


Epoch #4645: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4646: 1025it [00:02, 351.62it/s, env_step=4757504, len=23, n/ep=3, n/st=64, player_1/loss=576.152, player_2/loss=1091.952, rew=610.67]


Epoch #4646: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4647: 1025it [00:02, 348.99it/s, env_step=4758528, len=31, n/ep=2, n/st=64, player_1/loss=149.458, player_2/loss=1152.934, rew=991.00]


Epoch #4647: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4648: 1025it [00:02, 354.05it/s, env_step=4759552, len=23, n/ep=3, n/st=64, player_1/loss=621.345, player_2/loss=447.721, rew=623.33]


Epoch #4648: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4649: 1025it [00:02, 353.08it/s, env_step=4760576, len=14, n/ep=5, n/st=64, player_1/loss=838.413, player_2/loss=280.037, rew=267.60]


Epoch #4649: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4650: 1025it [00:02, 352.71it/s, env_step=4761600, len=33, n/ep=2, n/st=64, player_1/loss=549.059, player_2/loss=514.861, rew=1154.00]


Epoch #4650: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4651: 1025it [00:02, 349.82it/s, env_step=4762624, len=39, n/ep=1, n/st=64, player_1/loss=397.568, player_2/loss=471.277, rew=1558.00]


Epoch #4651: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4652: 1025it [00:02, 353.32it/s, env_step=4763648, len=31, n/ep=2, n/st=64, player_1/loss=176.018, player_2/loss=685.704, rew=1034.00]


Epoch #4652: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4653: 1025it [00:02, 354.79it/s, env_step=4764672, len=23, n/ep=2, n/st=64, player_1/loss=381.084, player_2/loss=970.296, rew=650.00]


Epoch #4653: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4654: 1025it [00:02, 351.50it/s, env_step=4765696, len=26, n/ep=3, n/st=64, player_1/loss=325.850, rew=729.33]


Epoch #4654: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4655: 1025it [00:02, 355.52it/s, env_step=4766720, len=32, n/ep=3, n/st=64, player_1/loss=335.319, player_2/loss=301.308, rew=1109.33]


Epoch #4655: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4656: 1025it [00:02, 349.35it/s, env_step=4767744, len=34, n/ep=2, n/st=64, player_1/loss=913.849, player_2/loss=344.721, rew=1188.00]


Epoch #4656: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4657: 1025it [00:02, 344.65it/s, env_step=4768768, len=14, n/ep=4, n/st=64, player_1/loss=738.371, player_2/loss=1076.014, rew=265.00]


Epoch #4657: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4658: 1025it [00:02, 346.63it/s, env_step=4769792, len=33, n/ep=2, n/st=64, player_1/loss=236.535, player_2/loss=1226.342, rew=1154.00]


Epoch #4658: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4659: 1025it [00:02, 354.17it/s, env_step=4770816, len=34, n/ep=2, n/st=64, player_1/loss=245.223, player_2/loss=612.904, rew=1192.00]


Epoch #4659: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4660: 1025it [00:02, 354.91it/s, env_step=4771840, len=23, n/ep=2, n/st=64, player_1/loss=182.304, player_2/loss=298.478, rew=554.00]


Epoch #4660: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4661: 1025it [00:02, 354.30it/s, env_step=4772864, len=23, n/ep=3, n/st=64, player_1/loss=399.768, player_2/loss=177.975, rew=628.67]


Epoch #4661: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4662: 1025it [00:02, 352.71it/s, env_step=4773888, len=38, n/ep=2, n/st=64, player_1/loss=341.804, rew=1521.00]


Epoch #4662: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4663: 1025it [00:02, 354.91it/s, env_step=4774912, len=32, n/ep=2, n/st=64, player_1/loss=601.154, player_2/loss=181.549, rew=1087.00]


Epoch #4663: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4664: 1025it [00:02, 352.95it/s, env_step=4775936, len=19, n/ep=3, n/st=64, player_1/loss=511.729, player_2/loss=124.228, rew=554.67]


Epoch #4664: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4665: 1025it [00:02, 354.30it/s, env_step=4776960, len=13, n/ep=4, n/st=64, player_1/loss=197.429, player_2/loss=121.790, rew=188.50]


Epoch #4665: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4666: 1025it [00:02, 354.54it/s, env_step=4777984, len=30, n/ep=2, n/st=64, player_1/loss=361.344, player_2/loss=419.119, rew=959.00]


Epoch #4666: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4667: 1025it [00:02, 354.05it/s, env_step=4779008, len=34, n/ep=2, n/st=64, player_1/loss=288.886, player_2/loss=533.379, rew=1188.00]


Epoch #4667: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4668: 1025it [00:02, 356.27it/s, env_step=4780032, len=28, n/ep=2, n/st=64, player_1/loss=361.978, player_2/loss=244.046, rew=1006.00]


Epoch #4668: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4669: 1025it [00:02, 347.33it/s, env_step=4781056, len=15, n/ep=4, n/st=64, player_1/loss=516.716, player_2/loss=104.887, rew=240.00]


Epoch #4669: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4670: 1025it [00:02, 351.38it/s, env_step=4782080, len=21, n/ep=2, n/st=64, player_1/loss=272.588, player_2/loss=842.535, rew=524.00]


Epoch #4670: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4671: 1025it [00:02, 353.81it/s, env_step=4783104, len=14, n/ep=4, n/st=64, player_1/loss=258.716, player_2/loss=665.947, rew=230.00]


Epoch #4671: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4672: 1025it [00:02, 355.16it/s, env_step=4784128, len=22, n/ep=3, n/st=64, player_1/loss=157.883, player_2/loss=382.990, rew=572.67]


Epoch #4672: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4673: 1025it [00:02, 355.16it/s, env_step=4785152, len=20, n/ep=4, n/st=64, player_1/loss=198.629, player_2/loss=237.885, rew=550.50]


Epoch #4673: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4674: 1025it [00:02, 355.16it/s, env_step=4786176, len=22, n/ep=3, n/st=64, player_1/loss=598.925, player_2/loss=93.193, rew=544.67]


Epoch #4674: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4675: 1025it [00:02, 348.75it/s, env_step=4787200, len=25, n/ep=3, n/st=64, player_1/loss=767.508, player_2/loss=127.437, rew=722.67]


Epoch #4675: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4676: 1025it [00:02, 355.89it/s, env_step=4788224, len=27, n/ep=2, n/st=64, player_1/loss=182.120, player_2/loss=568.868, rew=788.00]


Epoch #4676: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4677: 1025it [00:02, 352.59it/s, env_step=4789248, len=25, n/ep=2, n/st=64, player_1/loss=138.011, player_2/loss=642.644, rew=676.00]


Epoch #4677: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4678: 1025it [00:02, 354.79it/s, env_step=4790272, len=38, n/ep=2, n/st=64, player_1/loss=305.540, player_2/loss=226.098, rew=1519.00]


Epoch #4678: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4679: 1025it [00:02, 354.54it/s, env_step=4791296, len=28, n/ep=2, n/st=64, player_1/loss=140.850, player_2/loss=265.333, rew=881.00]


Epoch #4679: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4680: 1025it [00:02, 346.05it/s, env_step=4792320, len=22, n/ep=2, n/st=64, player_1/loss=151.898, player_2/loss=263.135, rew=700.00]


Epoch #4680: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4681: 1025it [00:02, 352.35it/s, env_step=4793344, len=24, n/ep=2, n/st=64, player_1/loss=212.603, player_2/loss=247.765, rew=713.00]


Epoch #4681: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4682: 1025it [00:02, 354.79it/s, env_step=4794368, len=25, n/ep=3, n/st=64, player_1/loss=328.025, player_2/loss=96.583, rew=753.33]


Epoch #4682: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4683: 1025it [00:02, 354.91it/s, env_step=4795392, len=26, n/ep=3, n/st=64, player_1/loss=320.580, player_2/loss=410.883, rew=852.00]


Epoch #4683: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4684: 1025it [00:02, 348.40it/s, env_step=4796416, len=13, n/ep=5, n/st=64, player_1/loss=188.234, player_2/loss=464.661, rew=183.20]


Epoch #4684: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4685: 1025it [00:02, 349.82it/s, env_step=4797440, len=20, n/ep=3, n/st=64, player_1/loss=180.000, player_2/loss=366.528, rew=510.00]


Epoch #4685: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4686: 1025it [00:02, 353.08it/s, env_step=4798464, len=19, n/ep=4, n/st=64, player_1/loss=116.127, player_2/loss=651.530, rew=498.00]


Epoch #4686: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4687: 1025it [00:02, 352.71it/s, env_step=4799488, len=35, n/ep=2, n/st=64, player_1/loss=92.653, player_2/loss=763.655, rew=1259.00]


Epoch #4687: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4688: 1025it [00:02, 349.47it/s, env_step=4800512, len=17, n/ep=4, n/st=64, player_1/loss=154.417, player_2/loss=485.974, rew=342.00]


Epoch #4688: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4689: 1025it [00:02, 354.42it/s, env_step=4801536, len=16, n/ep=4, n/st=64, player_1/loss=286.239, player_2/loss=387.052, rew=297.50]


Epoch #4689: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4690: 1025it [00:02, 354.30it/s, env_step=4802560, len=32, n/ep=2, n/st=64, player_1/loss=439.333, player_2/loss=496.072, rew=1087.00]


Epoch #4690: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4691: 1025it [00:02, 350.18it/s, env_step=4803584, len=30, n/ep=2, n/st=64, player_1/loss=291.379, player_2/loss=348.057, rew=944.00]


Epoch #4691: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4692: 1025it [00:02, 354.79it/s, env_step=4804608, len=32, n/ep=2, n/st=64, player_1/loss=229.538, player_2/loss=627.509, rew=1118.00]


Epoch #4692: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4693: 1025it [00:02, 351.02it/s, env_step=4805632, len=23, n/ep=3, n/st=64, player_1/loss=503.808, player_2/loss=859.928, rew=678.67]


Epoch #4693: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4694: 1025it [00:02, 356.89it/s, env_step=4806656, len=40, n/ep=1, n/st=64, player_1/loss=473.155, player_2/loss=438.576, rew=1638.00]


Epoch #4694: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4695: 1025it [00:02, 356.02it/s, env_step=4807680, len=34, n/ep=1, n/st=64, player_1/loss=174.794, player_2/loss=482.257, rew=1188.00]


Epoch #4695: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4696: 1025it [00:02, 357.88it/s, env_step=4808704, len=41, n/ep=1, n/st=64, player_1/loss=55.855, player_2/loss=472.870, rew=1720.00]


Epoch #4696: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4697: 1025it [00:02, 356.27it/s, env_step=4809728, len=28, n/ep=2, n/st=64, player_1/loss=200.793, player_2/loss=268.767, rew=859.00]


Epoch #4697: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4698: 1025it [00:02, 349.82it/s, env_step=4810752, len=32, n/ep=2, n/st=64, player_1/loss=289.411, player_2/loss=617.201, rew=1063.00]


Epoch #4698: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4699: 1025it [00:02, 358.26it/s, env_step=4811776, len=12, n/ep=5, n/st=64, player_1/loss=221.005, player_2/loss=589.180, rew=164.80]


Epoch #4699: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4700: 1025it [00:02, 358.26it/s, env_step=4812800, len=22, n/ep=3, n/st=64, player_1/loss=222.952, player_2/loss=250.991, rew=552.67]


Epoch #4700: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4701: 1025it [00:02, 355.77it/s, env_step=4813824, len=31, n/ep=2, n/st=64, player_1/loss=255.783, player_2/loss=429.311, rew=1024.00]


Epoch #4701: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4702: 1025it [00:02, 350.30it/s, env_step=4814848, len=33, n/ep=2, n/st=64, player_1/loss=176.198, player_2/loss=677.200, rew=1156.00]


Epoch #4702: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4703: 1025it [00:02, 353.56it/s, env_step=4815872, len=32, n/ep=1, n/st=64, player_1/loss=375.951, player_2/loss=437.087, rew=1054.00]


Epoch #4703: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4704: 1025it [00:02, 356.76it/s, env_step=4816896, len=30, n/ep=1, n/st=64, player_1/loss=424.327, player_2/loss=668.872, rew=928.00]


Epoch #4704: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4705: 1025it [00:02, 356.89it/s, env_step=4817920, len=18, n/ep=3, n/st=64, player_1/loss=108.097, player_2/loss=610.499, rew=474.00]


Epoch #4705: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4706: 1025it [00:02, 357.13it/s, env_step=4818944, len=38, n/ep=2, n/st=64, player_1/loss=29.899, player_2/loss=262.508, rew=1519.00]


Epoch #4706: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4707: 1025it [00:02, 353.44it/s, env_step=4819968, len=38, n/ep=1, n/st=64, player_1/loss=42.266, player_2/loss=264.954, rew=1480.00]


Epoch #4707: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4708: 1025it [00:02, 354.54it/s, env_step=4820992, len=31, n/ep=2, n/st=64, player_1/loss=222.664, player_2/loss=83.560, rew=1028.00]


Epoch #4708: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4709: 1025it [00:02, 356.51it/s, env_step=4822016, len=21, n/ep=3, n/st=64, player_1/loss=952.845, player_2/loss=293.115, rew=481.33]


Epoch #4709: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4710: 1025it [00:02, 355.65it/s, env_step=4823040, len=25, n/ep=3, n/st=64, player_1/loss=965.053, player_2/loss=602.694, rew=839.33]


Epoch #4710: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4711: 1025it [00:02, 357.63it/s, env_step=4824064, len=38, n/ep=1, n/st=64, player_1/loss=444.041, player_2/loss=586.748, rew=1480.00]


Epoch #4711: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4712: 1025it [00:02, 352.35it/s, env_step=4825088, len=28, n/ep=2, n/st=64, player_1/loss=260.722, rew=839.00]


Epoch #4712: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4713: 1025it [00:02, 354.54it/s, env_step=4826112, len=36, n/ep=1, n/st=64, player_1/loss=379.328, player_2/loss=567.206, rew=1330.00]


Epoch #4713: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4714: 1025it [00:02, 357.26it/s, env_step=4827136, len=31, n/ep=2, n/st=64, player_1/loss=696.864, player_2/loss=464.596, rew=994.00]


Epoch #4714: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4715: 1025it [00:02, 357.26it/s, env_step=4828160, len=33, n/ep=1, n/st=64, player_1/loss=811.725, player_2/loss=217.490, rew=1120.00]


Epoch #4715: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4716: 1025it [00:02, 352.83it/s, env_step=4829184, len=24, n/ep=3, n/st=64, player_1/loss=704.756, player_2/loss=73.261, rew=770.67]


Epoch #4716: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4717: 1025it [00:02, 356.27it/s, env_step=4830208, len=29, n/ep=2, n/st=64, player_2/loss=390.016, rew=872.00]


Epoch #4717: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4718: 1025it [00:02, 352.83it/s, env_step=4831232, len=16, n/ep=4, n/st=64, player_1/loss=577.851, player_2/loss=256.606, rew=312.00]


Epoch #4718: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4719: 1025it [00:02, 357.01it/s, env_step=4832256, len=16, n/ep=3, n/st=64, player_1/loss=841.702, player_2/loss=281.104, rew=432.00]


Epoch #4719: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4720: 1025it [00:02, 352.23it/s, env_step=4833280, len=13, n/ep=4, n/st=64, player_1/loss=658.192, player_2/loss=423.181, rew=337.50]


Epoch #4720: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4721: 1025it [00:02, 349.11it/s, env_step=4834304, len=27, n/ep=2, n/st=64, player_1/loss=365.120, player_2/loss=220.524, rew=754.00]


Epoch #4721: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4722: 1025it [00:02, 353.56it/s, env_step=4835328, len=28, n/ep=2, n/st=64, player_1/loss=626.053, rew=839.00]


Epoch #4722: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4723: 1025it [00:02, 356.76it/s, env_step=4836352, len=33, n/ep=2, n/st=64, player_1/loss=597.999, player_2/loss=228.106, rew=1166.00]


Epoch #4723: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4724: 1025it [00:02, 354.79it/s, env_step=4837376, len=31, n/ep=1, n/st=64, player_1/loss=284.418, player_2/loss=398.100, rew=990.00]


Epoch #4724: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4725: 1025it [00:02, 351.99it/s, env_step=4838400, len=18, n/ep=3, n/st=64, player_1/loss=286.814, player_2/loss=718.995, rew=392.00]


Epoch #4725: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4726: 1025it [00:02, 353.93it/s, env_step=4839424, len=35, n/ep=2, n/st=64, player_1/loss=264.100, player_2/loss=561.795, rew=1314.00]


Epoch #4726: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4727: 1025it [00:02, 355.40it/s, env_step=4840448, len=36, n/ep=2, n/st=64, player_1/loss=369.070, player_2/loss=291.211, rew=1367.00]


Epoch #4727: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4728: 1025it [00:02, 356.51it/s, env_step=4841472, len=33, n/ep=2, n/st=64, player_1/loss=416.885, player_2/loss=555.104, rew=1241.00]


Epoch #4728: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4729: 1025it [00:02, 355.03it/s, env_step=4842496, len=18, n/ep=4, n/st=64, player_1/loss=923.807, player_2/loss=658.381, rew=381.50]


Epoch #4729: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4730: 1025it [00:02, 351.38it/s, env_step=4843520, len=26, n/ep=3, n/st=64, player_1/loss=1118.384, player_2/loss=676.627, rew=700.67]


Epoch #4730: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4731: 1025it [00:02, 355.65it/s, env_step=4844544, len=28, n/ep=2, n/st=64, player_1/loss=858.451, player_2/loss=421.517, rew=814.00]


Epoch #4731: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4732: 1025it [00:02, 353.20it/s, env_step=4845568, len=24, n/ep=3, n/st=64, player_1/loss=722.803, player_2/loss=251.169, rew=632.67]


Epoch #4732: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4733: 1025it [00:02, 354.66it/s, env_step=4846592, len=25, n/ep=2, n/st=64, player_1/loss=177.391, player_2/loss=372.362, rew=716.00]


Epoch #4733: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4734: 1025it [00:02, 355.52it/s, env_step=4847616, len=31, n/ep=3, n/st=64, player_1/loss=154.786, player_2/loss=429.547, rew=1070.00]


Epoch #4734: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4735: 1025it [00:02, 354.91it/s, env_step=4848640, len=28, n/ep=3, n/st=64, player_1/loss=340.725, player_2/loss=135.293, rew=952.67]


Epoch #4735: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4736: 1025it [00:02, 356.14it/s, env_step=4849664, len=14, n/ep=4, n/st=64, player_1/loss=594.023, player_2/loss=88.197, rew=350.00]


Epoch #4736: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4737: 1025it [00:02, 351.74it/s, env_step=4850688, len=20, n/ep=3, n/st=64, player_1/loss=588.140, player_2/loss=867.098, rew=622.00]


Epoch #4737: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4738: 1025it [00:02, 357.13it/s, env_step=4851712, len=31, n/ep=1, n/st=64, player_1/loss=488.865, player_2/loss=1053.027, rew=990.00]


Epoch #4738: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4739: 1025it [00:02, 349.94it/s, env_step=4852736, len=33, n/ep=2, n/st=64, player_1/loss=423.850, player_2/loss=292.106, rew=1174.00]


Epoch #4739: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4740: 1025it [00:02, 356.27it/s, env_step=4853760, len=37, n/ep=2, n/st=64, player_1/loss=155.928, player_2/loss=110.360, rew=1405.00]


Epoch #4740: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4741: 1025it [00:02, 355.40it/s, env_step=4854784, len=33, n/ep=2, n/st=64, player_1/loss=61.866, player_2/loss=70.488, rew=1129.00]


Epoch #4741: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4742: 1025it [00:02, 357.63it/s, env_step=4855808, len=25, n/ep=2, n/st=64, player_1/loss=39.681, player_2/loss=83.052, rew=746.00]


Epoch #4742: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4743: 1025it [00:03, 340.19it/s, env_step=4856832, len=27, n/ep=3, n/st=64, player_1/loss=212.715, player_2/loss=495.233, rew=842.67]


Epoch #4743: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4744: 1025it [00:02, 353.08it/s, env_step=4857856, len=36, n/ep=2, n/st=64, player_1/loss=233.117, player_2/loss=477.745, rew=1334.00]


Epoch #4744: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4745: 1025it [00:02, 355.65it/s, env_step=4858880, len=25, n/ep=3, n/st=64, player_1/loss=50.237, player_2/loss=509.087, rew=862.67]


Epoch #4745: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4746: 1025it [00:02, 355.03it/s, env_step=4859904, len=33, n/ep=2, n/st=64, player_1/loss=184.474, player_2/loss=666.646, rew=1145.00]


Epoch #4746: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4747: 1025it [00:02, 354.42it/s, env_step=4860928, len=36, n/ep=2, n/st=64, player_1/loss=257.357, player_2/loss=240.664, rew=1367.00]


Epoch #4747: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4748: 1025it [00:02, 351.38it/s, env_step=4861952, len=30, n/ep=2, n/st=64, player_1/loss=885.115, player_2/loss=87.697, rew=1049.00]


Epoch #4748: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4749: 1025it [00:02, 355.52it/s, env_step=4862976, len=31, n/ep=2, n/st=64, player_1/loss=1062.441, player_2/loss=605.766, rew=991.00]


Epoch #4749: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4750: 1025it [00:02, 355.77it/s, env_step=4864000, len=30, n/ep=2, n/st=64, player_1/loss=628.489, player_2/loss=891.856, rew=929.00]


Epoch #4750: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4751: 1025it [00:02, 357.13it/s, env_step=4865024, len=31, n/ep=2, n/st=64, player_1/loss=286.682, player_2/loss=620.346, rew=1054.00]


Epoch #4751: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4752: 1025it [00:02, 356.14it/s, env_step=4866048, len=33, n/ep=2, n/st=64, player_1/loss=173.690, player_2/loss=1153.352, rew=1145.00]


Epoch #4752: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4753: 1025it [00:02, 350.54it/s, env_step=4867072, len=30, n/ep=2, n/st=64, player_1/loss=211.234, player_2/loss=1418.268, rew=932.00]


Epoch #4753: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4754: 1025it [00:02, 354.91it/s, env_step=4868096, len=34, n/ep=2, n/st=64, player_1/loss=409.431, player_2/loss=1409.526, rew=1279.00]


Epoch #4754: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4755: 1025it [00:02, 358.38it/s, env_step=4869120, len=36, n/ep=2, n/st=64, player_1/loss=547.570, player_2/loss=556.726, rew=1412.00]


Epoch #4755: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4756: 1025it [00:02, 356.64it/s, env_step=4870144, len=35, n/ep=2, n/st=64, player_1/loss=383.476, player_2/loss=546.813, rew=1267.00]


Epoch #4756: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4757: 1025it [00:02, 349.70it/s, env_step=4871168, len=32, n/ep=2, n/st=64, player_1/loss=248.742, player_2/loss=197.655, rew=1090.00]


Epoch #4757: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4758: 1025it [00:02, 354.79it/s, env_step=4872192, len=30, n/ep=3, n/st=64, player_1/loss=340.507, player_2/loss=374.719, rew=1160.00]


Epoch #4758: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4759: 1025it [00:02, 356.88it/s, env_step=4873216, len=28, n/ep=2, n/st=64, player_1/loss=446.121, player_2/loss=382.706, rew=881.00]


Epoch #4759: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4760: 1025it [00:02, 355.40it/s, env_step=4874240, len=27, n/ep=3, n/st=64, player_1/loss=610.690, player_2/loss=184.617, rew=792.67]


Epoch #4760: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4761: 1025it [00:02, 356.27it/s, env_step=4875264, len=23, n/ep=2, n/st=64, player_1/loss=434.933, player_2/loss=679.706, rew=559.00]


Epoch #4761: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4762: 1025it [00:02, 352.95it/s, env_step=4876288, len=26, n/ep=2, n/st=64, player_1/loss=217.020, player_2/loss=813.725, rew=736.00]


Epoch #4762: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4763: 1025it [00:02, 354.91it/s, env_step=4877312, len=39, n/ep=1, n/st=64, player_1/loss=156.511, player_2/loss=522.661, rew=1558.00]


Epoch #4763: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4764: 1025it [00:02, 357.38it/s, env_step=4878336, len=29, n/ep=2, n/st=64, player_1/loss=258.710, player_2/loss=272.322, rew=917.00]


Epoch #4764: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4765: 1025it [00:02, 355.03it/s, env_step=4879360, len=31, n/ep=2, n/st=64, player_1/loss=338.064, player_2/loss=397.168, rew=1054.00]


Epoch #4765: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4766: 1025it [00:02, 355.28it/s, env_step=4880384, len=27, n/ep=2, n/st=64, player_1/loss=441.889, player_2/loss=233.448, rew=782.00]


Epoch #4766: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4767: 1025it [00:02, 351.74it/s, env_step=4881408, len=30, n/ep=2, n/st=64, player_1/loss=340.915, player_2/loss=559.584, rew=932.00]


Epoch #4767: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4768: 1025it [00:02, 356.14it/s, env_step=4882432, len=21, n/ep=3, n/st=64, player_1/loss=259.643, player_2/loss=588.404, rew=518.67]


Epoch #4768: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4769: 1025it [00:02, 359.01it/s, env_step=4883456, len=28, n/ep=2, n/st=64, player_1/loss=473.914, player_2/loss=335.545, rew=851.00]


Epoch #4769: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4770: 1025it [00:02, 354.79it/s, env_step=4884480, len=24, n/ep=2, n/st=64, player_1/loss=646.487, player_2/loss=322.497, rew=598.00]


Epoch #4770: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4771: 1025it [00:02, 355.65it/s, env_step=4885504, len=39, n/ep=2, n/st=64, player_1/loss=1030.503, player_2/loss=82.500, rew=1558.00]


Epoch #4771: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4772: 1025it [00:02, 349.58it/s, env_step=4886528, len=28, n/ep=2, n/st=64, player_1/loss=686.691, player_2/loss=102.548, rew=851.00]


Epoch #4772: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4773: 1025it [00:02, 358.13it/s, env_step=4887552, len=21, n/ep=3, n/st=64, player_1/loss=297.246, player_2/loss=220.594, rew=490.00]


Epoch #4773: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4774: 1025it [00:02, 353.32it/s, env_step=4888576, len=27, n/ep=3, n/st=64, player_1/loss=254.085, player_2/loss=188.542, rew=800.67]


Epoch #4774: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4775: 1025it [00:02, 357.13it/s, env_step=4889600, len=38, n/ep=1, n/st=64, player_1/loss=168.217, player_2/loss=970.383, rew=1480.00]


Epoch #4775: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4776: 1025it [00:02, 356.14it/s, env_step=4890624, len=28, n/ep=3, n/st=64, player_1/loss=50.317, player_2/loss=1100.840, rew=854.67]


Epoch #4776: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4777: 1025it [00:02, 351.50it/s, env_step=4891648, len=26, n/ep=2, n/st=64, player_1/loss=147.044, player_2/loss=318.899, rew=799.00]


Epoch #4777: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4778: 1025it [00:02, 354.30it/s, env_step=4892672, len=42, n/ep=1, n/st=64, player_1/loss=245.959, player_2/loss=60.409, rew=1834.00]


Epoch #4778: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4779: 1025it [00:02, 356.64it/s, env_step=4893696, len=21, n/ep=3, n/st=64, player_1/loss=358.012, player_2/loss=65.309, rew=490.67]


Epoch #4779: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4780: 1025it [00:02, 357.51it/s, env_step=4894720, len=23, n/ep=3, n/st=64, player_1/loss=267.974, player_2/loss=667.167, rew=580.67]


Epoch #4780: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4781: 1025it [00:02, 350.90it/s, env_step=4895744, len=28, n/ep=2, n/st=64, player_1/loss=289.362, player_2/loss=829.987, rew=810.00]


Epoch #4781: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4782: 1025it [00:02, 355.77it/s, env_step=4896768, len=32, n/ep=2, n/st=64, player_1/loss=454.891, player_2/loss=189.070, rew=1090.00]


Epoch #4782: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4783: 1025it [00:02, 360.52it/s, env_step=4897792, len=32, n/ep=2, n/st=64, player_1/loss=674.852, player_2/loss=129.931, rew=1129.00]


Epoch #4783: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4784: 1025it [00:02, 354.05it/s, env_step=4898816, len=19, n/ep=2, n/st=64, player_1/loss=804.004, player_2/loss=123.065, rew=400.00]


Epoch #4784: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4785: 1025it [00:02, 354.42it/s, env_step=4899840, len=34, n/ep=2, n/st=64, player_1/loss=566.284, player_2/loss=141.258, rew=1243.00]


Epoch #4785: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4786: 1025it [00:02, 350.90it/s, env_step=4900864, len=22, n/ep=4, n/st=64, player_1/loss=381.453, player_2/loss=392.240, rew=531.00]


Epoch #4786: test_reward: 868.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4787: 1025it [00:02, 355.65it/s, env_step=4901888, len=35, n/ep=2, n/st=64, player_1/loss=206.799, player_2/loss=839.651, rew=1294.00]


Epoch #4787: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4788: 1025it [00:02, 355.77it/s, env_step=4902912, len=32, n/ep=2, n/st=64, player_1/loss=329.748, player_2/loss=707.919, rew=1107.00]


Epoch #4788: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4789: 1025it [00:02, 356.51it/s, env_step=4903936, len=35, n/ep=2, n/st=64, player_1/loss=310.279, player_2/loss=287.455, rew=1267.00]


Epoch #4789: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4790: 1025it [00:02, 358.13it/s, env_step=4904960, len=36, n/ep=2, n/st=64, player_1/loss=214.122, player_2/loss=97.444, rew=1367.00]


Epoch #4790: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4791: 1025it [00:02, 347.81it/s, env_step=4905984, len=26, n/ep=2, n/st=64, player_1/loss=193.675, player_2/loss=78.373, rew=764.00]


Epoch #4791: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4792: 1025it [00:02, 356.14it/s, env_step=4907008, len=39, n/ep=2, n/st=64, player_1/loss=300.928, player_2/loss=105.165, rew=1559.00]


Epoch #4792: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4793: 1025it [00:02, 355.77it/s, env_step=4908032, len=32, n/ep=2, n/st=64, player_1/loss=315.648, player_2/loss=139.097, rew=1090.00]


Epoch #4793: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4794: 1025it [00:02, 355.16it/s, env_step=4909056, len=25, n/ep=2, n/st=64, player_1/loss=52.197, player_2/loss=100.449, rew=716.00]


Epoch #4794: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4795: 1025it [00:02, 351.99it/s, env_step=4910080, len=40, n/ep=2, n/st=64, player_1/loss=128.843, player_2/loss=477.538, rew=1657.00]


Epoch #4795: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4796: 1025it [00:02, 355.89it/s, env_step=4911104, len=30, n/ep=3, n/st=64, player_1/loss=236.739, player_2/loss=509.273, rew=1067.33]


Epoch #4796: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4797: 1025it [00:02, 356.51it/s, env_step=4912128, len=42, n/ep=1, n/st=64, player_1/loss=151.219, player_2/loss=98.976, rew=1834.00]


Epoch #4797: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4798: 1025it [00:02, 357.76it/s, env_step=4913152, len=27, n/ep=3, n/st=64, player_1/loss=129.800, player_2/loss=189.061, rew=754.67]


Epoch #4798: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4799: 1025it [00:02, 356.51it/s, env_step=4914176, len=25, n/ep=3, n/st=64, player_1/loss=208.582, player_2/loss=280.079, rew=867.33]


Epoch #4799: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4800: 1025it [00:02, 351.74it/s, env_step=4915200, len=24, n/ep=2, n/st=64, player_1/loss=217.741, player_2/loss=210.567, rew=662.00]


Epoch #4800: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4801: 1025it [00:02, 356.02it/s, env_step=4916224, len=27, n/ep=2, n/st=64, player_1/loss=64.649, player_2/loss=269.170, rew=758.00]


Epoch #4801: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4802: 1025it [00:02, 356.51it/s, env_step=4917248, len=38, n/ep=1, n/st=64, player_1/loss=627.603, player_2/loss=262.980, rew=1480.00]


Epoch #4802: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4803: 1025it [00:02, 356.27it/s, env_step=4918272, len=42, n/ep=1, n/st=64, player_1/loss=750.956, player_2/loss=117.327, rew=1804.00]


Epoch #4803: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4804: 1025it [00:02, 356.89it/s, env_step=4919296, len=31, n/ep=2, n/st=64, player_1/loss=281.966, player_2/loss=107.210, rew=1078.00]


Epoch #4804: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4805: 1025it [00:02, 351.86it/s, env_step=4920320, len=27, n/ep=2, n/st=64, player_1/loss=248.562, player_2/loss=286.596, rew=803.00]


Epoch #4805: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4806: 1025it [00:02, 352.83it/s, env_step=4921344, len=37, n/ep=1, n/st=64, player_1/loss=333.594, player_2/loss=316.783, rew=1404.00]


Epoch #4806: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4807: 1025it [00:02, 356.76it/s, env_step=4922368, len=28, n/ep=2, n/st=64, player_1/loss=315.553, player_2/loss=149.919, rew=841.00]


Epoch #4807: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4808: 1025it [00:02, 355.89it/s, env_step=4923392, len=30, n/ep=3, n/st=64, player_1/loss=276.575, player_2/loss=282.316, rew=1064.00]


Epoch #4808: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4809: 1025it [00:02, 356.64it/s, env_step=4924416, len=29, n/ep=3, n/st=64, player_1/loss=183.839, player_2/loss=239.903, rew=941.33]


Epoch #4809: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4810: 1025it [00:02, 355.16it/s, env_step=4925440, len=35, n/ep=2, n/st=64, player_1/loss=216.360, player_2/loss=79.647, rew=1300.00]


Epoch #4810: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4811: 1025it [00:02, 353.08it/s, env_step=4926464, len=31, n/ep=2, n/st=64, player_1/loss=407.726, player_2/loss=103.243, rew=991.00]


Epoch #4811: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4812: 1025it [00:02, 355.15it/s, env_step=4927488, len=31, n/ep=2, n/st=64, player_1/loss=277.773, player_2/loss=463.188, rew=1028.00]


Epoch #4812: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4813: 1025it [00:02, 353.93it/s, env_step=4928512, len=32, n/ep=2, n/st=64, player_1/loss=371.536, player_2/loss=822.898, rew=1093.00]


Epoch #4813: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4814: 1025it [00:02, 351.50it/s, env_step=4929536, len=38, n/ep=2, n/st=64, player_1/loss=436.359, player_2/loss=484.583, rew=1519.00]


Epoch #4814: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4815: 1025it [00:02, 355.03it/s, env_step=4930560, len=18, n/ep=3, n/st=64, player_1/loss=195.737, player_2/loss=83.595, rew=389.33]


Epoch #4815: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4816: 1025it [00:02, 352.47it/s, env_step=4931584, len=38, n/ep=2, n/st=64, player_1/loss=190.895, player_2/loss=294.492, rew=1481.00]


Epoch #4816: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4817: 1025it [00:02, 355.52it/s, env_step=4932608, len=30, n/ep=2, n/st=64, player_1/loss=96.380, player_2/loss=886.342, rew=989.00]


Epoch #4817: test_reward: 754.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4818: 1025it [00:02, 354.91it/s, env_step=4933632, len=26, n/ep=3, n/st=64, player_1/loss=70.132, player_2/loss=968.843, rew=780.67]


Epoch #4818: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4819: 1025it [00:02, 352.47it/s, env_step=4934656, len=23, n/ep=4, n/st=64, player_1/loss=68.920, rew=675.00]


Epoch #4819: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4820: 1025it [00:02, 355.65it/s, env_step=4935680, len=22, n/ep=3, n/st=64, player_1/loss=333.619, player_2/loss=385.838, rew=532.67]


Epoch #4820: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4821: 1025it [00:02, 353.08it/s, env_step=4936704, len=19, n/ep=3, n/st=64, player_1/loss=702.037, player_2/loss=364.369, rew=414.00]


Epoch #4821: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4822: 1025it [00:02, 351.26it/s, env_step=4937728, len=28, n/ep=2, n/st=64, player_1/loss=539.992, player_2/loss=1428.351, rew=859.00]


Epoch #4822: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4823: 1025it [00:02, 353.08it/s, env_step=4938752, len=28, n/ep=3, n/st=64, player_1/loss=189.235, player_2/loss=1345.460, rew=833.33]


Epoch #4823: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4824: 1025it [00:02, 349.58it/s, env_step=4939776, len=31, n/ep=2, n/st=64, player_1/loss=461.747, player_2/loss=586.409, rew=1039.00]


Epoch #4824: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4825: 1025it [00:02, 353.08it/s, env_step=4940800, len=22, n/ep=3, n/st=64, player_1/loss=244.162, player_2/loss=443.063, rew=519.33]


Epoch #4825: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4826: 1025it [00:02, 354.30it/s, env_step=4941824, len=36, n/ep=2, n/st=64, player_1/loss=319.646, player_2/loss=374.394, rew=1367.00]


Epoch #4826: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4827: 1025it [00:02, 352.83it/s, env_step=4942848, len=30, n/ep=2, n/st=64, player_1/loss=225.026, player_2/loss=163.457, rew=961.00]


Epoch #4827: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4828: 1025it [00:02, 351.14it/s, env_step=4943872, len=25, n/ep=3, n/st=64, player_1/loss=174.033, player_2/loss=81.673, rew=720.00]


Epoch #4828: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4829: 1025it [00:02, 355.28it/s, env_step=4944896, len=29, n/ep=2, n/st=64, player_1/loss=266.454, player_2/loss=246.543, rew=904.00]


Epoch #4829: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4830: 1025it [00:02, 357.38it/s, env_step=4945920, len=32, n/ep=2, n/st=64, player_1/loss=260.661, player_2/loss=368.596, rew=1090.00]


Epoch #4830: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4831: 1025it [00:02, 355.77it/s, env_step=4946944, len=28, n/ep=2, n/st=64, player_1/loss=318.456, player_2/loss=381.124, rew=819.00]


Epoch #4831: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4832: 1025it [00:02, 354.17it/s, env_step=4947968, len=24, n/ep=3, n/st=64, player_1/loss=369.935, player_2/loss=312.588, rew=667.33]


Epoch #4832: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4833: 1025it [00:02, 352.95it/s, env_step=4948992, len=19, n/ep=3, n/st=64, player_1/loss=763.192, player_2/loss=219.224, rew=414.00]


Epoch #4833: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4834: 1025it [00:02, 357.38it/s, env_step=4950016, len=25, n/ep=3, n/st=64, player_1/loss=642.952, player_2/loss=495.965, rew=650.67]


Epoch #4834: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4835: 1025it [00:02, 353.68it/s, env_step=4951040, len=29, n/ep=2, n/st=64, player_1/loss=449.076, player_2/loss=579.113, rew=970.00]


Epoch #4835: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4836: 1025it [00:02, 355.16it/s, env_step=4952064, len=16, n/ep=3, n/st=64, player_1/loss=585.829, player_2/loss=366.308, rew=270.67]


Epoch #4836: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4837: 1025it [00:02, 354.54it/s, env_step=4953088, len=17, n/ep=3, n/st=64, player_1/loss=827.115, player_2/loss=580.131, rew=342.00]


Epoch #4837: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4838: 1025it [00:02, 350.42it/s, env_step=4954112, len=34, n/ep=2, n/st=64, player_1/loss=861.100, player_2/loss=221.252, rew=1204.00]


Epoch #4838: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4839: 1025it [00:02, 355.28it/s, env_step=4955136, len=18, n/ep=2, n/st=64, player_1/loss=456.617, player_2/loss=392.833, rew=371.00]


Epoch #4839: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4840: 1025it [00:02, 360.52it/s, env_step=4956160, len=19, n/ep=3, n/st=64, player_1/loss=225.163, player_2/loss=521.813, rew=474.00]


Epoch #4840: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4841: 1025it [00:02, 353.32it/s, env_step=4957184, len=34, n/ep=2, n/st=64, player_1/loss=209.536, player_2/loss=535.615, rew=1213.00]


Epoch #4841: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4842: 1025it [00:02, 356.89it/s, env_step=4958208, len=25, n/ep=2, n/st=64, player_1/loss=618.017, player_2/loss=389.388, rew=657.00]


Epoch #4842: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4843: 1025it [00:02, 351.50it/s, env_step=4959232, len=25, n/ep=2, n/st=64, player_1/loss=505.657, player_2/loss=280.093, rew=676.00]


Epoch #4843: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4844: 1025it [00:02, 352.47it/s, env_step=4960256, len=34, n/ep=2, n/st=64, player_1/loss=225.814, player_2/loss=231.051, rew=1213.00]


Epoch #4844: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4845: 1025it [00:02, 355.65it/s, env_step=4961280, len=32, n/ep=2, n/st=64, player_1/loss=214.400, player_2/loss=248.251, rew=1107.00]


Epoch #4845: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4846: 1025it [00:02, 350.90it/s, env_step=4962304, len=28, n/ep=3, n/st=64, player_1/loss=200.352, player_2/loss=335.194, rew=862.67]


Epoch #4846: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4847: 1025it [00:02, 351.74it/s, env_step=4963328, len=18, n/ep=3, n/st=64, player_1/loss=425.326, player_2/loss=642.963, rew=344.67]


Epoch #4847: test_reward: 598.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4848: 1025it [00:02, 355.28it/s, env_step=4964352, len=33, n/ep=2, n/st=64, player_1/loss=492.323, player_2/loss=945.021, rew=1145.00]


Epoch #4848: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4849: 1025it [00:02, 354.30it/s, env_step=4965376, len=38, n/ep=2, n/st=64, player_1/loss=495.060, player_2/loss=1120.660, rew=1481.00]


Epoch #4849: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4850: 1025it [00:02, 357.01it/s, env_step=4966400, len=33, n/ep=2, n/st=64, player_1/loss=597.414, player_2/loss=639.727, rew=1129.00]


Epoch #4850: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4851: 1025it [00:02, 354.17it/s, env_step=4967424, len=24, n/ep=3, n/st=64, player_1/loss=988.963, player_2/loss=302.961, rew=674.00]


Epoch #4851: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4852: 1025it [00:02, 349.23it/s, env_step=4968448, len=37, n/ep=1, n/st=64, player_1/loss=1266.860, player_2/loss=516.622, rew=1404.00]


Epoch #4852: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4853: 1025it [00:02, 354.54it/s, env_step=4969472, len=28, n/ep=2, n/st=64, player_1/loss=637.682, player_2/loss=641.570, rew=826.00]


Epoch #4853: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4854: 1025it [00:02, 355.89it/s, env_step=4970496, len=35, n/ep=2, n/st=64, player_1/loss=845.873, player_2/loss=301.647, rew=1351.00]


Epoch #4854: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4855: 1025it [00:02, 357.88it/s, env_step=4971520, len=32, n/ep=2, n/st=64, player_1/loss=964.651, player_2/loss=803.887, rew=1093.00]


Epoch #4855: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4856: 1025it [00:02, 351.14it/s, env_step=4972544, len=25, n/ep=3, n/st=64, player_1/loss=1083.675, player_2/loss=1404.986, rew=722.67]


Epoch #4856: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4857: 1025it [00:02, 354.79it/s, env_step=4973568, len=37, n/ep=2, n/st=64, player_1/loss=680.373, player_2/loss=1514.834, rew=1404.00]


Epoch #4857: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4858: 1025it [00:02, 356.76it/s, env_step=4974592, len=33, n/ep=2, n/st=64, player_1/loss=144.315, player_2/loss=461.538, rew=1174.00]


Epoch #4858: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4859: 1025it [00:02, 357.63it/s, env_step=4975616, len=38, n/ep=1, n/st=64, player_1/loss=487.685, player_2/loss=452.396, rew=1480.00]


Epoch #4859: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4860: 1025it [00:02, 357.01it/s, env_step=4976640, len=28, n/ep=3, n/st=64, player_1/loss=500.559, player_2/loss=294.959, rew=944.67]


Epoch #4860: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4861: 1025it [00:02, 349.70it/s, env_step=4977664, len=37, n/ep=2, n/st=64, player_1/loss=329.134, player_2/loss=56.154, rew=1404.00]


Epoch #4861: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4862: 1025it [00:02, 355.16it/s, env_step=4978688, len=35, n/ep=2, n/st=64, player_1/loss=588.383, player_2/loss=97.736, rew=1300.00]


Epoch #4862: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4863: 1025it [00:02, 354.54it/s, env_step=4979712, len=26, n/ep=3, n/st=64, player_1/loss=547.643, player_2/loss=91.035, rew=802.67]


Epoch #4863: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4864: 1025it [00:02, 357.88it/s, env_step=4980736, len=35, n/ep=1, n/st=64, player_2/loss=722.930, rew=1258.00]


Epoch #4864: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4865: 1025it [00:02, 349.94it/s, env_step=4981760, len=39, n/ep=2, n/st=64, player_1/loss=416.341, player_2/loss=1090.047, rew=1619.00]


Epoch #4865: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4866: 1025it [00:02, 355.03it/s, env_step=4982784, len=35, n/ep=1, n/st=64, player_1/loss=148.224, player_2/loss=406.080, rew=1258.00]


Epoch #4866: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4867: 1025it [00:02, 355.89it/s, env_step=4983808, len=30, n/ep=2, n/st=64, player_1/loss=274.794, player_2/loss=77.642, rew=992.00]


Epoch #4867: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4868: 1025it [00:02, 355.77it/s, env_step=4984832, len=20, n/ep=4, n/st=64, player_2/loss=84.451, rew=567.00]


Epoch #4868: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4869: 1025it [00:02, 355.65it/s, env_step=4985856, len=10, n/ep=6, n/st=64, player_1/loss=1457.188, player_2/loss=655.639, rew=109.00]


Epoch #4869: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4870: 1025it [00:02, 351.86it/s, env_step=4986880, len=18, n/ep=3, n/st=64, player_1/loss=1422.626, rew=340.67]


Epoch #4870: test_reward: 270.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4871: 1025it [00:02, 354.79it/s, env_step=4987904, len=14, n/ep=5, n/st=64, player_1/loss=548.215, player_2/loss=649.103, rew=232.40]


Epoch #4871: test_reward: 208.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4872: 1025it [00:02, 356.02it/s, env_step=4988928, len=23, n/ep=4, n/st=64, player_1/loss=356.993, player_2/loss=688.481, rew=721.50]


Epoch #4872: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4873: 1025it [00:02, 355.77it/s, env_step=4989952, len=37, n/ep=2, n/st=64, player_1/loss=832.254, player_2/loss=472.038, rew=1442.00]


Epoch #4873: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4874: 1025it [00:02, 357.63it/s, env_step=4990976, len=40, n/ep=1, n/st=64, player_1/loss=1271.867, player_2/loss=108.612, rew=1638.00]


Epoch #4874: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4875: 1025it [00:02, 351.38it/s, env_step=4992000, len=29, n/ep=3, n/st=64, player_1/loss=931.202, player_2/loss=414.408, rew=1025.33]


Epoch #4875: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4876: 1025it [00:02, 350.78it/s, env_step=4993024, len=34, n/ep=2, n/st=64, player_1/loss=422.784, player_2/loss=989.792, rew=1279.00]


Epoch #4876: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4877: 1025it [00:02, 356.76it/s, env_step=4994048, len=12, n/ep=6, n/st=64, player_1/loss=412.134, player_2/loss=768.335, rew=269.33]


Epoch #4877: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4878: 1025it [00:02, 354.66it/s, env_step=4995072, len=29, n/ep=3, n/st=64, player_1/loss=283.961, player_2/loss=770.232, rew=1148.67]


Epoch #4878: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4879: 1025it [00:02, 356.76it/s, env_step=4996096, len=23, n/ep=2, n/st=64, player_1/loss=554.609, player_2/loss=718.454, rew=775.00]


Epoch #4879: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4880: 1025it [00:02, 351.62it/s, env_step=4997120, len=14, n/ep=4, n/st=64, player_1/loss=482.258, player_2/loss=1315.707, rew=289.50]


Epoch #4880: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4881: 1025it [00:02, 357.51it/s, env_step=4998144, len=27, n/ep=3, n/st=64, player_1/loss=227.114, player_2/loss=865.648, rew=970.67]


Epoch #4881: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4882: 1025it [00:02, 355.15it/s, env_step=4999168, len=31, n/ep=2, n/st=64, player_1/loss=176.708, player_2/loss=420.596, rew=1034.00]


Epoch #4882: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4883: 1025it [00:02, 356.76it/s, env_step=5000192, len=20, n/ep=3, n/st=64, player_1/loss=356.321, player_2/loss=524.436, rew=602.00]


Epoch #4883: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4884: 1025it [00:02, 354.66it/s, env_step=5001216, len=14, n/ep=4, n/st=64, player_1/loss=495.472, player_2/loss=519.056, rew=241.50]


Epoch #4884: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4885: 1025it [00:02, 351.62it/s, env_step=5002240, len=21, n/ep=3, n/st=64, player_1/loss=723.408, player_2/loss=90.390, rew=477.33]


Epoch #4885: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4886: 1025it [00:02, 355.77it/s, env_step=5003264, len=29, n/ep=3, n/st=64, player_1/loss=632.446, player_2/loss=335.664, rew=902.67]


Epoch #4886: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4887: 1025it [00:02, 352.11it/s, env_step=5004288, len=32, n/ep=2, n/st=64, player_1/loss=590.036, player_2/loss=783.607, rew=1058.00]


Epoch #4887: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4888: 1025it [00:02, 358.13it/s, env_step=5005312, len=24, n/ep=2, n/st=64, player_1/loss=538.944, player_2/loss=872.027, rew=779.00]


Epoch #4888: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4889: 1025it [00:02, 355.77it/s, env_step=5006336, len=10, n/ep=6, n/st=64, player_1/loss=541.553, player_2/loss=561.513, rew=123.33]


Epoch #4889: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4890: 1025it [00:02, 352.35it/s, env_step=5007360, len=10, n/ep=6, n/st=64, player_1/loss=499.619, player_2/loss=534.764, rew=122.00]


Epoch #4890: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4891: 1025it [00:02, 355.28it/s, env_step=5008384, len=21, n/ep=3, n/st=64, player_1/loss=206.793, player_2/loss=434.611, rew=477.33]


Epoch #4891: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4892: 1025it [00:02, 355.77it/s, env_step=5009408, len=37, n/ep=1, n/st=64, player_1/loss=695.595, player_2/loss=289.709, rew=1404.00]


Epoch #4892: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4893: 1025it [00:02, 354.30it/s, env_step=5010432, len=32, n/ep=2, n/st=64, player_1/loss=543.529, player_2/loss=632.915, rew=1129.00]


Epoch #4893: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4894: 1025it [00:02, 350.18it/s, env_step=5011456, len=19, n/ep=4, n/st=64, player_1/loss=111.553, player_2/loss=535.362, rew=458.00]


Epoch #4894: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4895: 1025it [00:02, 357.01it/s, env_step=5012480, len=38, n/ep=1, n/st=64, player_2/loss=380.471, rew=1480.00]


Epoch #4895: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4896: 1025it [00:02, 354.30it/s, env_step=5013504, len=20, n/ep=3, n/st=64, player_1/loss=842.323, player_2/loss=481.347, rew=456.67]


Epoch #4896: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4897: 1025it [00:02, 356.64it/s, env_step=5014528, len=26, n/ep=2, n/st=64, player_1/loss=597.289, player_2/loss=456.584, rew=799.00]


Epoch #4897: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4898: 1025it [00:02, 352.23it/s, env_step=5015552, len=32, n/ep=2, n/st=64, player_1/loss=769.613, player_2/loss=57.943, rew=1055.00]


Epoch #4898: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4899: 1025it [00:02, 351.14it/s, env_step=5016576, len=36, n/ep=2, n/st=64, player_1/loss=642.076, player_2/loss=231.348, rew=1334.00]


Epoch #4899: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4900: 1025it [00:02, 354.05it/s, env_step=5017600, len=34, n/ep=2, n/st=64, player_1/loss=917.946, player_2/loss=226.470, rew=1229.00]


Epoch #4900: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4901: 1025it [00:02, 357.01it/s, env_step=5018624, len=37, n/ep=1, n/st=64, player_1/loss=580.991, player_2/loss=254.445, rew=1404.00]


Epoch #4901: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4902: 1025it [00:02, 354.17it/s, env_step=5019648, len=14, n/ep=5, n/st=64, player_1/loss=875.837, player_2/loss=564.073, rew=250.80]


Epoch #4902: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4903: 1025it [00:02, 355.77it/s, env_step=5020672, len=19, n/ep=3, n/st=64, player_1/loss=666.830, player_2/loss=471.410, rew=578.00]


Epoch #4903: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4904: 1025it [00:02, 349.23it/s, env_step=5021696, len=20, n/ep=3, n/st=64, player_1/loss=230.822, player_2/loss=500.455, rew=426.67]


Epoch #4904: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4905: 1025it [00:02, 359.01it/s, env_step=5022720, len=21, n/ep=4, n/st=64, player_1/loss=280.782, player_2/loss=596.618, rew=608.00]


Epoch #4905: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4906: 1025it [00:02, 348.04it/s, env_step=5023744, len=15, n/ep=5, n/st=64, player_1/loss=271.069, player_2/loss=480.387, rew=255.60]


Epoch #4906: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4907: 1025it [00:02, 352.83it/s, env_step=5024768, len=28, n/ep=3, n/st=64, player_1/loss=157.177, player_2/loss=347.181, rew=903.33]


Epoch #4907: test_reward: 418.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4908: 1025it [00:02, 355.65it/s, env_step=5025792, len=27, n/ep=2, n/st=64, player_1/loss=229.474, player_2/loss=160.092, rew=875.00]


Epoch #4908: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4909: 1025it [00:02, 350.90it/s, env_step=5026816, len=11, n/ep=6, n/st=64, player_1/loss=553.446, player_2/loss=158.028, rew=138.00]


Epoch #4909: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4910: 1025it [00:02, 354.30it/s, env_step=5027840, len=22, n/ep=3, n/st=64, player_1/loss=415.954, player_2/loss=498.962, rew=589.33]


Epoch #4910: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4911: 1025it [00:02, 355.89it/s, env_step=5028864, len=25, n/ep=3, n/st=64, player_1/loss=244.325, player_2/loss=495.295, rew=745.33]


Epoch #4911: test_reward: 1258.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4912: 1025it [00:02, 355.40it/s, env_step=5029888, len=10, n/ep=6, n/st=64, player_1/loss=283.135, player_2/loss=180.615, rew=116.33]


Epoch #4912: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4913: 1025it [00:02, 355.89it/s, env_step=5030912, len=16, n/ep=4, n/st=64, player_1/loss=610.927, player_2/loss=423.539, rew=287.00]


Epoch #4913: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4914: 1025it [00:02, 350.90it/s, env_step=5031936, len=38, n/ep=2, n/st=64, player_1/loss=753.436, player_2/loss=375.469, rew=1480.00]


Epoch #4914: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4915: 1025it [00:02, 355.52it/s, env_step=5032960, len=28, n/ep=3, n/st=64, player_1/loss=614.106, player_2/loss=318.405, rew=858.67]


Epoch #4915: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4916: 1025it [00:02, 356.27it/s, env_step=5033984, len=24, n/ep=2, n/st=64, player_1/loss=256.797, player_2/loss=263.965, rew=805.00]


Epoch #4916: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4917: 1025it [00:02, 356.02it/s, env_step=5035008, len=19, n/ep=4, n/st=64, player_1/loss=234.338, player_2/loss=424.303, rew=521.50]


Epoch #4917: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4918: 1025it [00:02, 355.77it/s, env_step=5036032, len=31, n/ep=2, n/st=64, player_1/loss=266.201, player_2/loss=506.773, rew=1054.00]


Epoch #4918: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4919: 1025it [00:02, 355.03it/s, env_step=5037056, len=25, n/ep=2, n/st=64, player_1/loss=261.107, player_2/loss=525.156, rew=694.00]


Epoch #4919: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4920: 1025it [00:02, 355.40it/s, env_step=5038080, len=10, n/ep=5, n/st=64, player_1/loss=250.194, player_2/loss=496.280, rew=116.00]


Epoch #4920: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4921: 1025it [00:02, 352.11it/s, env_step=5039104, len=32, n/ep=2, n/st=64, player_1/loss=280.300, player_2/loss=546.819, rew=1063.00]


Epoch #4921: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4922: 1025it [00:02, 355.52it/s, env_step=5040128, len=37, n/ep=2, n/st=64, player_1/loss=374.575, player_2/loss=137.075, rew=1448.00]


Epoch #4922: test_reward: 460.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4923: 1025it [00:02, 356.27it/s, env_step=5041152, len=21, n/ep=3, n/st=64, player_1/loss=279.716, player_2/loss=188.063, rew=492.67]


Epoch #4923: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4924: 1025it [00:02, 348.16it/s, env_step=5042176, len=21, n/ep=3, n/st=64, player_1/loss=157.067, player_2/loss=616.040, rew=478.00]


Epoch #4924: test_reward: 648.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4925: 1025it [00:02, 355.65it/s, env_step=5043200, len=23, n/ep=3, n/st=64, player_1/loss=722.144, rew=604.00]  


Epoch #4925: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4926: 1025it [00:02, 354.17it/s, env_step=5044224, len=29, n/ep=3, n/st=64, player_1/loss=951.179, player_2/loss=190.959, rew=1057.33]


Epoch #4926: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4927: 1025it [00:02, 353.08it/s, env_step=5045248, len=31, n/ep=2, n/st=64, player_1/loss=520.992, player_2/loss=479.866, rew=991.00]


Epoch #4927: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4928: 1025it [00:02, 354.17it/s, env_step=5046272, len=25, n/ep=2, n/st=64, player_1/loss=252.460, player_2/loss=530.379, rew=748.00]


Epoch #4928: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4929: 1025it [00:02, 349.35it/s, env_step=5047296, len=29, n/ep=3, n/st=64, player_1/loss=169.612, player_2/loss=506.137, rew=958.67]


Epoch #4929: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4930: 1025it [00:02, 354.66it/s, env_step=5048320, len=20, n/ep=3, n/st=64, player_1/loss=452.243, player_2/loss=433.341, rew=446.67]


Epoch #4930: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4931: 1025it [00:02, 355.28it/s, env_step=5049344, len=30, n/ep=2, n/st=64, player_1/loss=1059.682, player_2/loss=249.839, rew=1106.00]


Epoch #4931: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4932: 1025it [00:02, 354.42it/s, env_step=5050368, len=42, n/ep=1, n/st=64, player_1/loss=1281.794, player_2/loss=121.328, rew=1834.00]


Epoch #4932: test_reward: 1804.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4933: 1025it [00:02, 351.26it/s, env_step=5051392, len=16, n/ep=4, n/st=64, player_1/loss=664.743, player_2/loss=293.988, rew=432.00]


Epoch #4933: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4934: 1025it [00:02, 354.91it/s, env_step=5052416, len=20, n/ep=3, n/st=64, player_1/loss=356.598, player_2/loss=588.546, rew=493.33]


Epoch #4934: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4935: 1025it [00:02, 354.79it/s, env_step=5053440, len=23, n/ep=3, n/st=64, player_1/loss=239.175, player_2/loss=510.960, rew=700.00]


Epoch #4935: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4936: 1025it [00:02, 355.77it/s, env_step=5054464, len=27, n/ep=2, n/st=64, player_1/loss=75.262, player_2/loss=856.549, rew=754.00]


Epoch #4936: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4937: 1025it [00:02, 355.03it/s, env_step=5055488, len=35, n/ep=1, n/st=64, player_1/loss=162.243, player_2/loss=894.370, rew=1258.00]


Epoch #4937: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4938: 1025it [00:02, 350.66it/s, env_step=5056512, len=25, n/ep=3, n/st=64, player_1/loss=168.962, player_2/loss=619.118, rew=650.67]


Epoch #4938: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4939: 1025it [00:02, 354.91it/s, env_step=5057536, len=33, n/ep=1, n/st=64, player_1/loss=162.079, player_2/loss=493.169, rew=1120.00]


Epoch #4939: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4940: 1025it [00:02, 355.16it/s, env_step=5058560, len=38, n/ep=2, n/st=64, player_1/loss=140.732, player_2/loss=119.649, rew=1521.00]


Epoch #4940: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4941: 1025it [00:02, 356.02it/s, env_step=5059584, len=26, n/ep=3, n/st=64, player_1/loss=54.701, player_2/loss=246.734, rew=824.00]


Epoch #4941: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4942: 1025it [00:02, 355.52it/s, env_step=5060608, len=23, n/ep=2, n/st=64, player_1/loss=371.862, player_2/loss=710.532, rew=576.00]


Epoch #4942: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4943: 1025it [00:02, 352.23it/s, env_step=5061632, len=21, n/ep=3, n/st=64, player_2/loss=512.927, rew=520.67]  


Epoch #4943: test_reward: 340.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4944: 1025it [00:02, 356.51it/s, env_step=5062656, len=23, n/ep=3, n/st=64, player_1/loss=671.179, player_2/loss=483.884, rew=715.33]


Epoch #4944: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4945: 1025it [00:02, 358.01it/s, env_step=5063680, len=20, n/ep=4, n/st=64, player_1/loss=721.496, player_2/loss=402.875, rew=599.50]


Epoch #4945: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4946: 1025it [00:02, 344.07it/s, env_step=5064704, len=28, n/ep=1, n/st=64, player_1/loss=332.771, player_2/loss=584.259, rew=810.00]


Epoch #4946: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4947: 1025it [00:02, 353.08it/s, env_step=5065728, len=25, n/ep=2, n/st=64, player_1/loss=34.254, player_2/loss=880.914, rew=657.00]


Epoch #4947: test_reward: 1120.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4948: 1025it [00:02, 349.11it/s, env_step=5066752, len=30, n/ep=3, n/st=64, player_1/loss=57.610, player_2/loss=560.084, rew=1028.67]


Epoch #4948: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4949: 1025it [00:02, 354.17it/s, env_step=5067776, len=24, n/ep=2, n/st=64, player_1/loss=114.207, player_2/loss=576.406, rew=733.00]


Epoch #4949: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4950: 1025it [00:02, 354.66it/s, env_step=5068800, len=34, n/ep=2, n/st=64, player_1/loss=103.676, player_2/loss=517.794, rew=1197.00]


Epoch #4950: test_reward: 1638.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4951: 1025it [00:03, 339.85it/s, env_step=5069824, len=31, n/ep=3, n/st=64, player_1/loss=714.141, player_2/loss=780.343, rew=1090.00]


Epoch #4951: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4952: 1025it [00:02, 351.86it/s, env_step=5070848, len=18, n/ep=3, n/st=64, player_1/loss=1279.976, player_2/loss=985.028, rew=412.67]


Epoch #4952: test_reward: 154.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4953: 1025it [00:02, 354.30it/s, env_step=5071872, len=27, n/ep=2, n/st=64, player_1/loss=887.683, player_2/loss=325.561, rew=892.00]


Epoch #4953: test_reward: 304.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4954: 1025it [00:02, 354.91it/s, env_step=5072896, len=34, n/ep=2, n/st=64, player_1/loss=443.440, player_2/loss=1111.631, rew=1243.00]


Epoch #4954: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4955: 1025it [00:02, 353.68it/s, env_step=5073920, len=13, n/ep=3, n/st=64, player_1/loss=200.902, player_2/loss=1343.618, rew=184.67]


Epoch #4955: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4956: 1025it [00:02, 354.17it/s, env_step=5074944, len=28, n/ep=2, n/st=64, player_1/loss=391.328, player_2/loss=489.665, rew=931.00]


Epoch #4956: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4957: 1025it [00:02, 349.23it/s, env_step=5075968, len=34, n/ep=1, n/st=64, player_1/loss=541.813, player_2/loss=256.583, rew=1188.00]


Epoch #4957: test_reward: 550.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4958: 1025it [00:03, 339.63it/s, env_step=5076992, len=38, n/ep=2, n/st=64, player_1/loss=293.585, player_2/loss=146.107, rew=1481.00]


Epoch #4958: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4959: 1025it [00:02, 356.89it/s, env_step=5078016, len=26, n/ep=2, n/st=64, player_1/loss=152.543, player_2/loss=236.087, rew=909.00]


Epoch #4959: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4960: 1025it [00:02, 355.15it/s, env_step=5079040, len=20, n/ep=3, n/st=64, player_1/loss=322.353, player_2/loss=782.272, rew=638.67]


Epoch #4960: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4961: 1025it [00:02, 352.47it/s, env_step=5080064, len=29, n/ep=3, n/st=64, player_1/loss=216.084, player_2/loss=898.462, rew=1074.00]


Epoch #4961: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4962: 1025it [00:02, 349.59it/s, env_step=5081088, len=27, n/ep=3, n/st=64, player_1/loss=125.656, player_2/loss=507.403, rew=852.67]


Epoch #4962: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4963: 1025it [00:03, 336.95it/s, env_step=5082112, len=31, n/ep=2, n/st=64, player_1/loss=403.369, player_2/loss=358.910, rew=1147.00]


Epoch #4963: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4964: 1025it [00:02, 353.81it/s, env_step=5083136, len=31, n/ep=2, n/st=64, player_1/loss=442.085, player_2/loss=383.390, rew=1054.00]


Epoch #4964: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4965: 1025it [00:02, 353.20it/s, env_step=5084160, len=35, n/ep=2, n/st=64, player_1/loss=238.851, player_2/loss=707.367, rew=1300.00]


Epoch #4965: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4966: 1025it [00:02, 357.63it/s, env_step=5085184, len=27, n/ep=2, n/st=64, player_1/loss=458.454, player_2/loss=638.818, rew=779.00]


Epoch #4966: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4967: 1025it [00:02, 350.42it/s, env_step=5086208, len=32, n/ep=2, n/st=64, player_1/loss=373.813, player_2/loss=538.948, rew=1089.00]


Epoch #4967: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4968: 1025it [00:02, 355.52it/s, env_step=5087232, len=33, n/ep=2, n/st=64, player_1/loss=368.997, player_2/loss=514.704, rew=1166.00]


Epoch #4968: test_reward: 928.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4969: 1025it [00:02, 354.54it/s, env_step=5088256, len=19, n/ep=4, n/st=64, player_1/loss=363.222, player_2/loss=104.307, rew=515.00]


Epoch #4969: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4970: 1025it [00:02, 357.26it/s, env_step=5089280, len=11, n/ep=6, n/st=64, player_1/loss=424.364, player_2/loss=197.329, rew=145.67]


Epoch #4970: test_reward: 88.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4971: 1025it [00:02, 351.26it/s, env_step=5090304, len=33, n/ep=2, n/st=64, player_1/loss=529.408, player_2/loss=330.939, rew=1124.00]


Epoch #4971: test_reward: 1330.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4972: 1025it [00:02, 355.28it/s, env_step=5091328, len=19, n/ep=4, n/st=64, player_1/loss=472.310, player_2/loss=281.653, rew=381.50]


Epoch #4972: test_reward: 504.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4973: 1025it [00:02, 353.93it/s, env_step=5092352, len=40, n/ep=1, n/st=64, player_1/loss=396.771, player_2/loss=1126.439, rew=1638.00]


Epoch #4973: test_reward: 238.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4974: 1025it [00:02, 356.51it/s, env_step=5093376, len=30, n/ep=2, n/st=64, player_1/loss=307.826, player_2/loss=1378.598, rew=953.00]


Epoch #4974: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4975: 1025it [00:02, 352.83it/s, env_step=5094400, len=25, n/ep=2, n/st=64, player_1/loss=111.809, player_2/loss=1200.149, rew=844.00]


Epoch #4975: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4976: 1025it [00:02, 348.40it/s, env_step=5095424, len=23, n/ep=4, n/st=64, player_1/loss=237.055, player_2/loss=710.570, rew=776.00]


Epoch #4976: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4977: 1025it [00:02, 351.14it/s, env_step=5096448, len=22, n/ep=3, n/st=64, player_1/loss=213.691, player_2/loss=350.999, rew=586.00]


Epoch #4977: test_reward: 54.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4978: 1025it [00:02, 354.17it/s, env_step=5097472, len=19, n/ep=4, n/st=64, player_1/loss=70.648, player_2/loss=523.605, rew=463.00]


Epoch #4978: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4979: 1025it [00:02, 354.91it/s, env_step=5098496, len=11, n/ep=6, n/st=64, player_1/loss=45.741, player_2/loss=607.138, rew=139.67]


Epoch #4979: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4980: 1025it [00:02, 355.52it/s, env_step=5099520, len=34, n/ep=2, n/st=64, player_1/loss=66.228, player_2/loss=561.411, rew=1267.00]


Epoch #4980: test_reward: 990.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4981: 1025it [00:02, 351.62it/s, env_step=5100544, len=28, n/ep=3, n/st=64, player_1/loss=449.161, player_2/loss=198.812, rew=837.33]


Epoch #4981: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4982: 1025it [00:02, 352.95it/s, env_step=5101568, len=31, n/ep=3, n/st=64, player_1/loss=731.789, player_2/loss=122.313, rew=1118.00]


Epoch #4982: test_reward: 130.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4983: 1025it [00:02, 354.66it/s, env_step=5102592, len=31, n/ep=2, n/st=64, player_1/loss=1053.284, player_2/loss=108.168, rew=1026.00]


Epoch #4983: test_reward: 700.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4984: 1025it [00:02, 354.54it/s, env_step=5103616, len=26, n/ep=2, n/st=64, player_1/loss=518.839, player_2/loss=90.339, rew=729.00]


Epoch #4984: test_reward: 810.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4985: 1025it [00:02, 356.14it/s, env_step=5104640, len=34, n/ep=2, n/st=64, player_1/loss=174.896, player_2/loss=227.050, rew=1229.00]


Epoch #4985: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4986: 1025it [00:02, 354.91it/s, env_step=5105664, len=21, n/ep=2, n/st=64, player_1/loss=30.916, player_2/loss=205.700, rew=524.00]


Epoch #4986: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4987: 1025it [00:02, 354.66it/s, env_step=5106688, len=37, n/ep=1, n/st=64, player_1/loss=444.816, player_2/loss=300.693, rew=1404.00]


Epoch #4987: test_reward: 1834.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4988: 1025it [00:02, 353.68it/s, env_step=5107712, len=32, n/ep=2, n/st=64, player_1/loss=450.070, player_2/loss=269.169, rew=1117.00]


Epoch #4988: test_reward: 1404.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4989: 1025it [00:02, 355.89it/s, env_step=5108736, len=15, n/ep=4, n/st=64, player_1/loss=172.880, player_2/loss=289.718, rew=248.00]


Epoch #4989: test_reward: 180.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4990: 1025it [00:02, 354.91it/s, env_step=5109760, len=39, n/ep=1, n/st=64, player_1/loss=252.467, player_2/loss=633.790, rew=1558.00]


Epoch #4990: test_reward: 1558.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4991: 1025it [00:02, 350.30it/s, env_step=5110784, len=28, n/ep=2, n/st=64, player_1/loss=276.412, player_2/loss=893.577, rew=869.00]


Epoch #4991: test_reward: 1188.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4992: 1025it [00:02, 358.26it/s, env_step=5111808, len=35, n/ep=2, n/st=64, player_1/loss=813.569, player_2/loss=722.672, rew=1274.00]


Epoch #4992: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4993: 1025it [00:02, 355.77it/s, env_step=5112832, len=26, n/ep=3, n/st=64, player_1/loss=870.569, player_2/loss=529.779, rew=784.67]


Epoch #4993: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4994: 1025it [00:02, 355.03it/s, env_step=5113856, len=30, n/ep=2, n/st=64, player_1/loss=248.228, player_2/loss=655.167, rew=932.00]


Epoch #4994: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4995: 1025it [00:02, 350.66it/s, env_step=5114880, len=19, n/ep=3, n/st=64, player_1/loss=75.069, player_2/loss=707.422, rew=439.33]


Epoch #4995: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4996: 1025it [00:02, 354.91it/s, env_step=5115904, len=19, n/ep=4, n/st=64, player_1/loss=337.996, player_2/loss=464.387, rew=379.50]


Epoch #4996: test_reward: 378.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4997: 1025it [00:02, 355.65it/s, env_step=5116928, len=25, n/ep=2, n/st=64, player_1/loss=427.734, player_2/loss=422.389, rew=704.00]


Epoch #4997: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4998: 1025it [00:02, 358.01it/s, env_step=5117952, len=22, n/ep=2, n/st=64, player_1/loss=309.126, player_2/loss=465.796, rew=505.00]


Epoch #4998: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770


Epoch #4999: 1025it [00:02, 354.05it/s, env_step=5118976, len=27, n/ep=2, n/st=64, player_1/loss=502.188, player_2/loss=240.465, rew=824.00]

Epoch #4999: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770





In [16]:
####################################################
# EXPERIMENT: VIEWING THE BEST LEARNED POLICY
####################################################

# Get the environment settings
env = get_env()
observation_space = env.observation_space['observation'] if isinstance(env.observation_space, gym.spaces.Dict) else env.observation_space
state_shape = observation_space.shape or observation_space.n
action_shape = env.action_space.shape or env.action_space.n

# Configure the best agent
best_agent1 = cf_dqn_policy(state_shape= state_shape,
                            action_shape= action_shape)
best_agent1.load_state_dict(torch.load("./saved_variables/paper_notebooks/6/dqn_vs_dqn_cnn_based/best_policy_agent1.pth"))
best_agent1.set_eps(0)


best_agent2 = cf_dqn_policy(state_shape= state_shape,
                            action_shape= action_shape)
best_agent2.load_state_dict(torch.load("./saved_variables/paper_notebooks/6/dqn_vs_dqn_cnn_based/best_policy_agent2.pth"))
best_agent2.set_eps(0)

# Watch the best agents play against each other
watch(numer_of_games= 3,
      render_speed= 0.3,
      agent_player1= best_agent1,
      agent_player2= best_agent2)



Average steps of game:  24.0
Final mean reward agent 1: 274.0, std: 0.0
Final mean reward agent 2: 324.0, std: 0.0


In [17]:
####################################################
# EXPERIMENT: VIEWING THE LAST LEARNED POLICY
####################################################

# Configure the final agents
final_agent_player1 = cf_dqn_policy(state_shape= state_shape,
                                    action_shape= action_shape)
final_agent_player1.load_state_dict(torch.load("./saved_variables/paper_notebooks/6/dqn_vs_dqn_cnn_based/final_policy_agent1.pth"))
# Disable exploration on the final agent itself (not on best_agent1)
final_agent_player1.set_eps(0)

final_agent_player2 = cf_dqn_policy(state_shape= state_shape,
                                    action_shape= action_shape)
final_agent_player2.load_state_dict(torch.load("./saved_variables/paper_notebooks/6/dqn_vs_dqn_cnn_based/final_policy_agent2.pth"))
# Disable exploration on the final agent itself (not on best_agent2)
final_agent_player2.set_eps(0)

# Watch the final agents at work
watch(numer_of_games= 3,
      render_speed= 0.3,
      agent_player1= final_agent_player1,
      agent_player2= final_agent_player2)



Average steps of game:  32.333333333333336
Final mean reward agent 1: 573.0, std: 256.8436619164792
Final mean reward agent 2: 573.0, std: 272.70619110439475


<hr><hr>

## Discussion

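The CNN-based model performs similarly to the model from the previous notebook.
The test reward still fluctuates heavily late in training (between 54 and 1834 over epochs 4902 to 4999), and the best test reward of 1834 was reached in epoch 770.
Before addressing the issue of training both agents simultaneously, we will look into using the Rainbow algorithm.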
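
To put a number on how much the test reward fluctuates in the trainer log printed above, the log can be summarised with a few lines of Python. The following is a minimal sketch, assuming the log has been captured as plain text. The names `summarise_test_rewards` and `sample_log` are hypothetical and not part of the notebook, and the regular expression only targets the `Epoch #N: test_reward: ...` line format shown in the output above.

```python
import re
from statistics import mean

# Matches lines such as:
# "Epoch #4999: test_reward: 1054.000000 ± 0.000000, best_reward: ... in #770"
# (hypothetical helper pattern, based only on the log format printed above)
TEST_LINE = re.compile(
    r"Epoch #(?P<epoch>\d+): test_reward: (?P<reward>-?\d+(?:\.\d+)?) ± "
)


def summarise_test_rewards(log_text):
    """Return simple statistics of the test rewards found in a trainer log."""
    rewards = [float(m.group("reward")) for m in TEST_LINE.finditer(log_text)]
    if not rewards:
        raise ValueError("no test_reward lines found in the log text")
    return {
        "epochs": len(rewards),
        "min": min(rewards),
        "max": max(rewards),
        "mean": mean(rewards),
    }


# Three lines copied from the end of the training output above
sample_log = """\
Epoch #4997: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770
Epoch #4998: test_reward: 1480.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770
Epoch #4999: test_reward: 1054.000000 ± 0.000000, best_reward: 1834.000000 ± 0.000000 in #770
"""

summary = summarise_test_rewards(sample_log)
print(summary)
```

Running the same function over the full log would give the spread of the test reward across all epochs, which may be a more robust basis for comparing notebooks than the single best epoch.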

In [None]:
####################################################
# CLEAN VARIABLES
####################################################

del action_shape
del agent1
del agent2
del best_agent1
del best_agent2
del env
del final_agent_player1
del final_agent_player2
del observation_space
del off_policy_traininer_results
del state_shape
