# PyRat Deep Q-Learning Processing

## Setup Environment

Required libraries for PyRat Q-Learning

In [1]:
import json
import numpy as np
import time
import random
import pickle
from tqdm import tqdm
from AIs import manh, numpy_rl_reload
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim

### The game.py file describes the simulation environment, including the generation of rewards and the observation that is fed to the agent.
import game

### The rl.py file describes the reinforcement learning procedure, including Q-learning, experience replay, and a PyTorch model to learn the Q-function.
### Stochastic gradient-based optimization is used to fit the approximation of the Q-function.
import rl

Libraries for training the Convolutional Neural Network

In [2]:
# Import libraries

import torch.nn.functional as F
import inspect

# Data-splitting and data-loading utilities

from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, TensorDataset

This is a little-known but handy library for building neural networks.

It **calculates the shape of the tensor outputs** of network operations.

[Tensorshape Library Documentation](https://pypi.org/project/torchshape/0.0.8/#description)

In [3]:
import subprocess  # Used to install missing libraries at runtime
import sys  # Provides sys.executable, the interpreter running this notebook

try:
    from torchshape import tensorshape
except ImportError:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'torchshape'])
    from torchshape import tensorshape

Define our **device** as the first visible CUDA device if we have CUDA available:

In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device '+ str(device))

Using device cuda
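Note that the notebook selects a device but keeps the model on the CPU (the `model.to(device=device)` line further down is commented out, as are the matching lines that move each training batch). If you do want to train on the GPU, the model and every input tensor must live on the same device. A minimal sketch, using a toy `nn.Linear` as a hypothetical stand-in for the Q-network:

```python
import torch
import torch.nn as nn

# First visible CUDA device if available, otherwise the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Toy stand-in for the Q-network: 29 x 41 x 2 = 2378 flattened inputs -> 4 action values
toy_model = nn.Linear(29 * 41 * 2, 4).to(device)

# Every input must be moved to the same device as the model's parameters
toy_observation = torch.zeros(1, 29 * 41 * 2).to(device)
toy_q_values = toy_model(toy_observation)

# Move results back to the CPU before converting them to Python numbers or NumPy arrays
toy_action = int(torch.argmax(toy_q_values).cpu())
print(toy_q_values.shape, toy_action)
```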


## PyRat Game Specifications

Details of the game:

1️⃣ The **opponent** plays a deterministic strategy: a **greedy algorithm** that always targets the closest piece of cheese. The distance to each piece of cheese is the Manhattan distance (L1 distance). The code is in AIs/manh.py.

2️⃣ The maze does **not** include **walls** (option -d 0)

3️⃣ The maze does **not** include **mud** (option -md 0)

4️⃣ The **dimension** of the maze is **21 x 15** (default parameter)

5️⃣ The number of **pieces of cheese** is **40** (option -p 40)

6️⃣ The **maze** is **non-symmetric** (option --nonsymmetric)
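The opponent's greedy rule can be illustrated with a short sketch. This is not the code of `AIs/manh.py`: the function names, the `(x, y)` tuple representation of positions and the move names are assumptions made for illustration only.

```python
def manhattan(a, b):
    """L1 (Manhattan) distance between two (x, y) grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def closest_cheese(position, cheese_positions):
    """Piece of cheese with the smallest L1 distance; ties go to the first one listed."""
    return min(cheese_positions, key=lambda c: manhattan(position, c))

def greedy_move(position, target):
    """One step that reduces the L1 distance to the target (no walls or mud in this setup).
    Move names and the 'UP means +y' convention are assumptions for illustration."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    if dx != 0:
        return 'RIGHT' if dx > 0 else 'LEFT'
    if dy != 0:
        return 'UP' if dy > 0 else 'DOWN'
    return None  # already standing on the target

rat_position = (0, 0)
cheese_positions = [(5, 5), (2, 1), (10, 0)]
target = closest_cheese(rat_position, cheese_positions)
print(target, greedy_move(rat_position, target))  # (2, 1) RIGHT
```

Because there are no walls or mud, the L1 distance equals the true number of moves needed, which is why such a simple greedy rule is already a reasonable opponent here.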

You can therefore run a 1000-game test simulation with the following command:

<pre>python pyrat.py -d 0 -md 0 -p 40 -x 21 -y 15 --rat AIs/manh.py --python AIs/YOURAIHERE --nonsymmetric --nodrawing --tests 1000 --synchronous</pre>

Furthermore, you can run a visual game simulation with a command of the following form:

<pre>python pyrat.py -p 40 -x 21 -y 15 -d 0 -md 0 --rat AIs/manh.py --python AIs/YOURAIHERE --nonsymmetric</pre>

## Train a model to approximate the Q-function

Definitions:
- An iteration of training is called an **epoch**. It corresponds to one full PyRat game.
- An **experience** is a tuple < s, a, r, s’ > describing the consequence of being in state s, taking action a, receiving reward r, and ending up in state s'.
- See the file rl.py for how the **experience replay buffer** is implemented.
- A **batch** is a set of experiences drawn from the experience replay buffer and used for one training step; several batches are used per epoch.
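Since `rl.py` is not reproduced here, the following is only a minimal sketch of an experience replay buffer, assuming a fixed-size FIFO memory and uniform random sampling. The class `ReplayBufferSketch` and its methods are hypothetical and may differ from `rl.ExperienceReplay`.

```python
import random
from collections import deque

class ReplayBufferSketch:
    """Hypothetical fixed-size buffer storing (<s, a, r, s'>, game_over) pairs."""

    def __init__(self, max_memory=1000):
        # A deque with maxlen drops the oldest experience once it is full
        self.memory = deque(maxlen=max_memory)

    def remember(self, experience, game_over):
        self.memory.append((experience, game_over))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation between consecutive moves
        return random.sample(list(self.memory), min(batch_size, len(self.memory)))

buffer = ReplayBufferSketch(max_memory=3)
for t in range(5):
    buffer.remember([f"s{t}", t % 4, float(t), f"s{t + 1}"], game_over=(t == 4))

print(len(buffer.memory))      # 3: only the three most recent experiences survive
print(buffer.memory[0][0][0])  # s2
print(len(buffer.sample(2)))   # 2
```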

### Create the simulated game environment

Set the parameters for the simulated PyRat game environment.

In [5]:
width = 21  # Size of the playing field
height = 15  # Size of the playing field
cheeses = 40  # Number of cheeses in the game
opponent = manh  # AI used for the opponent

Create the simulated PyRat environment.

In [6]:
env = game.PyRat(width=width, height=height, opponent=opponent, cheeses=cheeses)

Show the **shape of an observation**.

In [7]:
test_observation = torch.FloatTensor(env.observe())
test_observation.shape

torch.Size([1, 29, 41, 2])

We have to be careful with this default shape, since convolutional layers expect input tensors of the form:

<pre>(batch size, number of channels, height, width)</pre>

The environment, however, returns the tensor in the following shape, which is **WRONG** for `nn.Conv2d`:

<pre>(batch size, height, width, number of channels)</pre>

We can transform the tensor in the following way:

In [8]:
test_observation = test_observation.permute(0, 3, 1, 2)
test_observation.shape

torch.Size([1, 2, 29, 41])
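It matters that `permute` is used rather than `reshape`: both produce a tensor of shape `(1, 2, 29, 41)`, but only `permute` keeps each value in its correct channel. The check below uses a made-up toy tensor whose channel 0 is all zeros and channel 1 is all ones.

```python
import torch

# Toy observation in the environment's layout: (batch, height, width, channels)
obs = torch.zeros(1, 29, 41, 2)
obs[..., 1] = 1.0  # channel 1 is filled with ones, channel 0 stays zero

permuted = obs.permute(0, 3, 1, 2)    # reorders the axes to (batch, channels, height, width)
reshaped = obs.reshape(1, 2, 29, 41)  # same shape, but values are re-read in memory order

print(permuted.shape == reshaped.shape)  # True
print(permuted[0, 1].min().item())       # 1.0: channel 1 contains only ones
print(reshaped[0, 1].min().item())       # 0.0: the two channels are interleaved, data is scrambled
```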

### Create the experience replay buffer

Set the parameters for the experience replay buffer.

In [9]:
max_memory = 1000  # Maximum number of experiences we are storing
discount_factor = 0.97  # Discount factor for future rewards

Create the experience replay buffer.

In [10]:
exp_replay = rl.ExperienceReplay(max_memory=max_memory, discount=discount_factor)
exp_replay.discount

0.97
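The discount factor enters the Q-learning target. A standard formulation, which `rl.ExperienceReplay` may implement differently, is $Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$ with $\gamma$ = `discount_factor`, where the future term is dropped when the game ends. A minimal sketch with made-up Q-values; `q_target` is a hypothetical helper:

```python
def q_target(q_current, action, reward, q_next, game_over, discount=0.97):
    """Copy of q_current where only the taken action's value is replaced by the
    Bellman target r + discount * max(q_next), or just r at the end of a game."""
    target = list(q_current)
    if game_over:
        target[action] = reward
    else:
        target[action] = reward + discount * max(q_next)
    return target

# Hypothetical predictions for the 4 actions in s and s'
q_s = [0.5, 1.2, -0.3, 0.0]
q_s_next = [2.0, 0.1, 1.0, -1.0]

t = q_target(q_s, action=1, reward=1.0, q_next=q_s_next, game_over=False)
print([round(v, 4) for v in t])  # [0.5, 2.94, -0.3, 0.0]  (1.0 + 0.97 * 2.0)
t_end = q_target(q_s, action=1, reward=1.0, q_next=q_s_next, game_over=True)
print(t_end)                     # [0.5, 1.0, -0.3, 0.0]
```

Only the entry of the action actually taken changes; the other three targets equal the current predictions, so they contribute zero error to the MSE loss.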

### Q-Function Approximation Model Topologies

#### Model 1

**Simple regressor** (a single linear layer) to predict the Q-values. This is the base topology used and tested in the course laboratory.

In [11]:
class MultiRegressor1FC(nn.Module):
    def __init__(self, x_example, number_of_channels=1, number_of_regressors=4):
        super(MultiRegressor1FC, self).__init__()
        in_features = x_example.reshape(-1).shape[0]
        self.nb_channels = number_of_channels
        self.linear = nn.Linear(in_features, number_of_regressors)
    
    def forward(self, x):
        x = x.reshape(x.shape[0], -1)
        return self.linear(x)

    def load(self):
        if self.nb_channels == 1:
            self.load_state_dict(torch.load('save_rl/weights_ANN1FC_1channel.pt'))
        else:
            self.load_state_dict(torch.load('save_rl/weights_ANN1FC_2channel.pt'))

    def save(self):
        if self.nb_channels == 1:
            torch.save(self.state_dict(), 'save_rl/weights_ANN1FC_1channel.pt')
        else:
            torch.save(self.state_dict(), 'save_rl/weights_ANN1FC_2channel.pt')

#### Model 2

**1 hidden layer regressor network** to predict the Q-values.

In [12]:
class MultiRegressor2FC(nn.Module):
    def __init__(self, x_example, number_of_channels=1, number_of_regressors=4):
        super(MultiRegressor2FC, self).__init__()
        in_features = x_example.reshape(-1).shape[0]
        self.nb_channels = number_of_channels
        self.fc1 = nn.Linear(in_features, 16)
        self.selu = nn.SELU()
        self.linear = nn.Linear(16, number_of_regressors)
    
    def forward(self, x):
        x = x.reshape(x.shape[0], -1)
        x = self.fc1(x)
        x = self.selu(x)
        return self.linear(x)

    def load(self):
        if self.nb_channels == 1:
            self.load_state_dict(torch.load('save_rl/weights_ANN2FC_1channel.pt'))
        else:
            self.load_state_dict(torch.load('save_rl/weights_ANN2FC_2channel.pt'))

    def save(self):
        if self.nb_channels == 1:
            torch.save(self.state_dict(), 'save_rl/weights_ANN2FC_1channel.pt')
        else:
            torch.save(self.state_dict(), 'save_rl/weights_ANN2FC_2channel.pt')
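Model 2 is much larger than Model 1 despite its small hidden layer, because its first layer connects all 29 × 41 × 2 = 2378 input values to 16 units. The count below uses the standard formula for a fully connected layer with bias, `in_features * out_features + out_features`; the helper `linear_params` is hypothetical, and SELU adds no parameters.

```python
def linear_params(in_features, out_features):
    """Number of trainable parameters of a fully connected layer with bias."""
    return in_features * out_features + out_features

in_features = 29 * 41 * 2  # flattened observation: height x width x channels

params_model1 = linear_params(in_features, 4)                          # Model 1: 2378 -> 4
params_model2 = linear_params(in_features, 16) + linear_params(16, 4)  # Model 2: 2378 -> 16 -> 4

print(in_features, params_model1, params_model2)  # 2378 9516 38132
```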

#### Model 3

A **CNN multi-regressor** integrating **1 fully connected layer**. Expects **1 or 2 channels** as input.

In [13]:
class MultiRegressorCNN1FC(nn.Module):
    def __init__(self, number_of_channels=1):
        super().__init__()
        self.nb_channels = number_of_channels
        self.conv1 = nn.Conv2d(number_of_channels, 16, kernel_size=3) # output_shape = (1, 16, 27, 39)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2) # output_shape = (1, 16, 13, 19)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3) # output_shape = (1, 32, 11, 17)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2) # output_shape = (1, 32, 5, 8)
        self.fc = nn.Linear(32 * 5 * 8, 4) 
        
    def forward(self, x):
        x = x.permute(0, 3, 1, 2)
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)
        x = x.reshape(x.shape[0], -1) # output_shape = (batch, 1280)
        x = self.fc(x)
        return x
    
    def load(self):
        if self.nb_channels == 1:
            self.load_state_dict(torch.load('save_rl/weights_CNN1FC_1channel.pt'))
        else:
            self.load_state_dict(torch.load('save_rl/weights_CNN1FC_2channel.pt'))

    def save(self):
        if self.nb_channels == 1:
            torch.save(self.state_dict(), 'save_rl/weights_CNN1FC_1channel.pt')
        else:
            torch.save(self.state_dict(), 'save_rl/weights_CNN1FC_2channel.pt')

Sanity check to confirm the dimensions of the tensors after each convolution and pooling operation.

In [14]:
# Input shape which is the size of the canvas

x_shape = (1, 1, 29, 41)

# Conv2d defaults for the parameters not passed explicitly:
# stride=(1,1), padding=(0,0), dilation=(1,1), groups=1
# MaxPool2d defaults to stride=kernel_size

# First convolution operation
op = nn.Conv2d(1, 16, kernel_size=3)
x_shape = tensorshape(op, x_shape)
print(f'Shape after first Conv2d: {x_shape}')

# First maxpool operation
op = nn.MaxPool2d(kernel_size=2)
x_shape = tensorshape(op, x_shape)
print(f'Shape after first MaxPool2d: {x_shape}')

# Second convolution operation
op = nn.Conv2d(16, 32, kernel_size=3)
x_shape = tensorshape(op, x_shape)
print(f'Shape after second Conv2d: {x_shape}')

# Second maxpool operation
op = nn.MaxPool2d(kernel_size=2)
x_shape = tensorshape(op, x_shape)
print(f'Shape after second MaxPool2d: {x_shape}')

Shape after first Conv2d: (1, 16, 27, 39)
Shape after first MaxPool2d: (1, 16, 13, 19)
Shape after second Conv2d: (1, 32, 11, 17)
Shape after second MaxPool2d: (1, 32, 5, 8)
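The same numbers can be derived by hand from the output-size formula in the PyTorch documentation for `nn.Conv2d` and `nn.MaxPool2d`: $H_{out} = \lfloor (H_{in} + 2p - d(k - 1) - 1)/s + 1 \rfloor$, with padding $p$, dilation $d$, kernel size $k$ and stride $s$. A stdlib-only sketch; the helper `out_size` is hypothetical:

```python
import math

def out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of Conv2d/MaxPool2d along one dimension (floor mode)."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

h, w = 29, 41
for name, kernel, stride in [("Conv2d", 3, 1), ("MaxPool2d", 2, 2),
                             ("Conv2d", 3, 1), ("MaxPool2d", 2, 2)]:
    h, w = out_size(h, kernel, stride), out_size(w, kernel, stride)
    print(f"after {name}: {h} x {w}")

# Flattened size fed to the first fully connected layer: 32 channels x 5 x 8
print(32 * h * w)  # 1280
```

This is where the hard-coded `32 * 5 * 8` in the CNN models comes from; changing the maze size would require recomputing it.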


Run a small sanity check to confirm that the model accepts a raw observation and returns one value per action.

In [15]:
# Create a test instance of the model
test_model = MultiRegressorCNN1FC(env.observe().shape[3])

# Get a sample observation of the game environment
test_input_tensor = torch.FloatTensor(env.observe())
print(f'Shape of the raw input tensor: {test_input_tensor.shape}')

# Run the untrained model on the sample input to validate the output size
test_output_tensor = test_model(test_input_tensor)
print(f'Shape of the output tensor: {test_output_tensor.shape}')

Shape of the raw input tensor: torch.Size([1, 29, 41, 2])
Shape of the output tensor: torch.Size([1, 4])


#### Model 4

A **CNN multi-regressor** integrating **2 fully connected layers**. Expects **1 or 2 channels** as input.

In [16]:
class MultiRegressorCNN2FC(nn.Module):
    def __init__(self, number_of_channels=1):
        super().__init__()
        self.nb_channels = number_of_channels
        self.conv1 = nn.Conv2d(number_of_channels, 16, kernel_size=3) # output_shape = (1, 16, 27, 39)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2) # output_shape = (1, 16, 13, 19)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3) # output_shape = (1, 32, 11, 17)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2) # output_shape = (1, 32, 5, 8)
        self.fc1 = nn.Linear(32 * 5 * 8, 16)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(16, 4)
        self.dropout = nn.Dropout(p=0.3)
        
    def forward(self, x):
        x = x.permute(0, 3, 1, 2)
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)
        #x = self.dropout(x)
        
        x = x.reshape(x.shape[0], -1) # output_shape = (batch, 1280)
        x = self.fc1(x)
        x = self.relu3(x)
        x = self.dropout(x)
        x = self.fc2(x)
        return x
    
    def load(self):
        if self.nb_channels == 1:
            self.load_state_dict(torch.load('save_rl/weights_CNN2FC_1channel.pt'))
        else:
            self.load_state_dict(torch.load('save_rl/weights_CNN2FC_2channel.pt'))

    def save(self):
        if self.nb_channels == 1:
            torch.save(self.state_dict(), 'save_rl/weights_CNN2FC_1channel.pt')
        else:
            torch.save(self.state_dict(), 'save_rl/weights_CNN2FC_2channel.pt')

Run a small sanity check to confirm that the model accepts a raw observation and returns one value per action.

In [17]:
# Create a test instance of the model
test_model = MultiRegressorCNN2FC(env.observe().shape[3])

# Get a sample observation of the game environment
test_input_tensor = torch.FloatTensor(env.observe())
print(f'Shape of the raw input tensor: {test_input_tensor.shape}')

# Run the untrained model on the sample input to validate the output size
test_output_tensor = test_model(test_input_tensor)
print(f'Shape of the output tensor: {test_output_tensor.shape}')

Shape of the raw input tensor: torch.Size([1, 29, 41, 2])
Shape of the output tensor: torch.Size([1, 4])


#### Model 5

A **CNN multi-regressor** integrating **3 fully connected layers**. Expects **1 or 2 channels** as input.

In [18]:
class MultiRegressorCNN3FC(nn.Module):
    def __init__(self, number_of_channels=1):
        super().__init__()
        self.nb_channels = number_of_channels
        self.conv1 = nn.Conv2d(number_of_channels, 16, kernel_size=3) # output_shape = (1, 16, 27, 39)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2) # output_shape = (1, 16, 13, 19)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3) # output_shape = (1, 32, 11, 17)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2) # output_shape = (1, 32, 5, 8)
        self.fc1 = nn.Linear(32 * 5 * 8, 10)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(10, 10)
        self.relu4 = nn.ReLU()
        self.fc3 = nn.Linear(10, 4)
        self.dropout = nn.Dropout(p=0.2)
        
    def forward(self, x):
        x = x.permute(0, 3, 1, 2)
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)
        #x = self.dropout(x)
        
        x = x.reshape(x.shape[0], -1) # output_shape = (batch, 1280)
        x = self.fc1(x)
        x = self.relu3(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu4(x)
        x = self.dropout(x)
        x = self.fc3(x)
        return x
    
    def load(self):
        if self.nb_channels == 1:
            self.load_state_dict(torch.load('save_rl/weights_CNN3FC_1channel.pt'))
        else:
            self.load_state_dict(torch.load('save_rl/weights_CNN3FC_2channel.pt'))

    def save(self):
        if self.nb_channels == 1:
            torch.save(self.state_dict(), 'save_rl/weights_CNN3FC_1channel.pt')
        else:
            torch.save(self.state_dict(), 'save_rl/weights_CNN3FC_2channel.pt')

Run a small sanity check to confirm that the model accepts a raw observation and returns one value per action.

In [19]:
# Create a test instance of the model
test_model = MultiRegressorCNN3FC(env.observe().shape[3])

# Get a sample observation of the game environment
test_input_tensor = torch.FloatTensor(env.observe())
print(f'Shape of the raw input tensor: {test_input_tensor.shape}')

# Run the untrained model on the sample input to validate the output size
test_output_tensor = test_model(test_input_tensor)
print(f'Shape of the output tensor: {test_output_tensor.shape}')

Shape of the raw input tensor: torch.Size([1, 29, 41, 2])
Shape of the output tensor: torch.Size([1, 4])
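All five model classes repeat the same `load`/`save` logic and differ only in the file name. One way to factor it out is a small mixin that builds the path from a model tag and the channel count; for 2 channels it yields the same names as the existing code (e.g. `weights_CNN1FC_2channel.pt`). This is a sketch: `CheckpointMixin`, `weights_tag`, `weights_path`, the `directory` argument and the toy `TinyRegressor` class are hypothetical names, not part of the notebook.

```python
import os
import tempfile

import torch
import torch.nn as nn

class CheckpointMixin:
    """Hypothetical mixin: saves/loads weights to <directory>/weights_<tag>_<n>channel.pt."""
    weights_tag = "model"  # each subclass overrides this, e.g. "CNN1FC"

    def weights_path(self, directory="save_rl"):
        return os.path.join(directory, f"weights_{self.weights_tag}_{self.nb_channels}channel.pt")

    def save(self, directory="save_rl"):
        os.makedirs(directory, exist_ok=True)
        torch.save(self.state_dict(), self.weights_path(directory))

    def load(self, directory="save_rl"):
        # map_location lets weights saved from a GPU run be reloaded on a CPU-only machine
        self.load_state_dict(torch.load(self.weights_path(directory), map_location="cpu"))

class TinyRegressor(CheckpointMixin, nn.Module):
    """Toy model used only to exercise the mixin."""
    weights_tag = "TINY"

    def __init__(self, number_of_channels=1):
        super().__init__()
        self.nb_channels = number_of_channels
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)

# Round trip through a temporary directory instead of save_rl/
with tempfile.TemporaryDirectory() as tmp:
    original = TinyRegressor(number_of_channels=2)
    original.save(tmp)
    restored = TinyRegressor(number_of_channels=2)
    restored.load(tmp)
    file_name = os.path.basename(original.weights_path(tmp))
    same_weights = torch.equal(original.linear.weight, restored.linear.weight)

print(file_name, same_weights)  # weights_TINY_2channel.pt True
```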


### Initialize Q-Function Approximation Model

Let's **initialize** the neural network of your choice!

Uncomment the line of the network you want to use and comment out the others.

In [20]:
# Define model parameters
nb_channels = env.observe().shape[3]
print(f'Number of channels: {nb_channels}')

# Instantiate chosen model and move it to device

# Simple regressor with 1 fully connected layer
#model = MultiRegressor1FC(env.observe()[0], number_of_channels=nb_channels)

# Simple regressor with 2 fully connected layers
model = MultiRegressor2FC(env.observe()[0], number_of_channels=nb_channels)

# CNN regressor with 1 fully connected layer
#model = MultiRegressorCNN1FC(number_of_channels=nb_channels)

# CNN regressor with 2 fully connected layers
#model = MultiRegressorCNN2FC(number_of_channels=nb_channels)

# CNN regressor with 3 fully connected layers
#model = MultiRegressorCNN3FC(number_of_channels=nb_channels)

#model.to(device=device)

model

Number of channels: 2


MultiRegressor2FC(
  (fc1): Linear(in_features=2378, out_features=16, bias=True)
  (selu): SELU()
  (linear): Linear(in_features=16, out_features=4, bias=True)
)

### Loss Function and Optimizer

Define a **loss function** and **optimizer**.

In [21]:
# Define the loss function as mean squared error, since the Q-values are regression targets
criterion = nn.MSELoss()

# Set stochastic gradient descent as the optimizer
#optimizer = torch.optim.SGD(model.parameters(),lr = 0.01)

# Set Adam as the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.001)
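The notebook delegates each optimization step to `rl.train_on_batch`, whose code is not shown. A typical PyTorch step for regressing predicted Q-values onto their targets with this loss and optimizer looks like the sketch below; `train_on_batch_sketch` is a hypothetical name, and the toy linear model and random data (prefixed `toy_` so they do not overwrite `model`, `criterion` or `optimizer`) only stand in for the real network and replay batch.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def train_on_batch_sketch(model, states, targets, criterion, optimizer):
    """One gradient step on a batch; returns the batch loss as a Python float."""
    model.train()
    optimizer.zero_grad()                     # clear gradients from the previous step
    loss = criterion(model(states), targets)  # MSE between predicted and target Q-values
    loss.backward()                           # backpropagate
    optimizer.step()                          # update the parameters (here with Adam)
    return loss.item()

torch.manual_seed(0)
toy_model = nn.Linear(10, 4)                  # stand-in for a Q-network with 4 actions
toy_criterion = nn.MSELoss()
toy_optimizer = optim.Adam(toy_model.parameters(), lr=0.001)

toy_states = torch.randn(32, 10)              # batch_size = 32, as in the notebook
toy_targets = torch.randn(32, 4)              # made-up Q-value targets

toy_losses = [train_on_batch_sketch(toy_model, toy_states, toy_targets, toy_criterion, toy_optimizer)
              for _ in range(50)]
print(toy_losses[0] > toy_losses[-1])         # the loss decreases on this fixed batch
```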

### Train the model

Training parameters.

In [22]:
number_of_batches = 8  # Number of batches per epoch
batch_size = 32  # Number of experiences we use for training per batch

Reset the global maximum cheese and reward counters.

In [23]:
max_cheese = 0
max_reward = 0

In [24]:
max_cheese

0

In [35]:
max_reward

2104.0

Training routine definition.

In [26]:
def play(model, epochs, criterion, optimizer=None, train=True):

    win_cnt = 0
    lose_cnt = 0
    draw_cnt = 0
    win_hist = []
    cheeses = []
    
    # Addition to track rewards
    reward_cnt = 0
    rewards = []
    
    steps = 0.
    last_W = 0
    last_D = 0
    last_L = 0
    
    global max_cheese
    global max_reward
    
    for e in tqdm(range(epochs)):
        env.reset()
        game_over = False

        # Get the current state of the environment
        state = env.observe()
        
        # Play a full game until game is over
        while not game_over:
            # Convert the observation into a torch tensor before feeding it to the model
            state = torch.FloatTensor(state)

            # Predict the Q-values for the current state (no gradients are needed while acting)
            with torch.no_grad():
                q_values = model(state)

            # Pick the next action greedily: the one with the highest predicted Q-value
            action = torch.argmax(q_values)
            
            # Apply action, get rewards and new state
            previous_state = state.detach().clone()
            state, reward, game_over= env.act(action)
            
            # Statistics            
            reward_cnt += reward
            
            if game_over:
                steps += env.round
                if env.score > env.enemy_score:
                    win_cnt += 1
                elif env.score == env.enemy_score:
                    draw_cnt += 1
                else:
                    lose_cnt += 1
                cheese = env.score

            # Create the experience [previous state, action, reward, new state] (this order is required),
            # converting both states into torch tensors
            experience = [torch.FloatTensor(previous_state), action, reward, torch.FloatTensor(state)]

            # Store the experience in the experience replay buffer
            exp_replay.remember(experience, game_over)
            
        win_hist.append(win_cnt)  # Statistics
        cheeses.append(cheese)  # Statistics
        
        # Save the total reward of this episode and reset the episode reward counter
        rewards.append(reward_cnt)        
        reward_cnt = 0

        if train:

            # Train using experience replay: for each batch, sample a set of experiences
            # (state, action, reward, new state) stored in the buffer and use it to train the model.
            
            running_loss = 0
            for b in range(number_of_batches):
                # Get the batch data
                states, Q = exp_replay.get_batch(model, batch_size=batch_size)
                
                # Move the batch to the selected device (disabled because the model is kept on the CPU)
                #states = states.to(device=device)                
                #Q = Q.to(device=device)                
               
                # Train on the batch and get its loss
                loss = rl.train_on_batch(model, states, Q, criterion, optimizer)
                
                # statistics
                running_loss += loss
            #print('[%d] loss: %.3f' % (e + 1, running_loss / number_of_batches))
            running_loss = 0.0

            '''if e > 100 :  # Check to save
                cheese_np = np.array(cheeses)
                if cheese_np[-100:].sum() > max_cheese:
                    max_cheese = cheese_np[-100:].sum()
                    print(f"New maximum cheese: {max_cheese}.\nSaving model...")
                    model.save()'''
                    
            if e > 100:  # After 100 games, save whenever the last-100 reward sum reaches a new maximum
                rewards_np = np.array(rewards)
                if rewards_np[-100:].sum() > max_reward:
                    max_reward = rewards_np[-100:].sum()
                    print(f"New maximum rewards: {max_reward}.\nSaving model...")
                    model.save()   
        
        if (e+1) % 100 == 0:  # Statistics every 100 epochs
            cheese_np = np.array(cheeses)
            rewards_np = np.array(rewards)
            string = "Epoch {:03d}/{:03d} | Last 100 Reward {} | Last 100 Cheese {}| W/D/L {}/{}/{} | 100 W/D/L {}/{}/{} | 100 Steps {}".format(
                        e,epochs, rewards_np[-100:].sum(), 
                        cheese_np[-100:].sum(), win_cnt, draw_cnt, lose_cnt, 
                        win_cnt-last_W, draw_cnt-last_D, lose_cnt-last_L, steps/100)
            print(string)
        
            steps = 0.
            last_W = win_cnt
            last_D = draw_cnt
            last_L = lose_cnt  
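As written, `play` always takes `torch.argmax(q_values)`, so the agent never explores actions that its current network undervalues. A common remedy, not used in this notebook, is ε-greedy action selection with a decaying ε. The sketch below is stdlib-only; `epsilon_greedy`, `decayed_epsilon` and the schedule values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(epoch, start=1.0, end=0.05, decay_epochs=2000):
    """Linear decay from start to end over decay_epochs, then constant (hypothetical schedule)."""
    fraction = min(epoch / decay_epochs, 1.0)
    return start + fraction * (end - start)

rng = random.Random(0)
q = [0.1, 0.7, -0.2, 0.3]
print(epsilon_greedy(q, epsilon=0.0, rng=rng))  # 1: purely greedy
print(decayed_epsilon(0), decayed_epsilon(1000), decayed_epsilon(5000))
```

Inside `play`, the selection step could then become `action = epsilon_greedy(q_values[0].tolist(), decayed_epsilon(e))`, assuming `env.act` also accepts a plain integer action. Exploration is usually disabled (ε = 0) when evaluating with `train=False`.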

### Train the Q-learner with Experience replay

If `load` is `True`, the last saved weights are loaded and training continues from them. Otherwise, training starts from scratch with randomly initialized parameters.

In [33]:
load = False

if load:
    model.load()

Train the model.

In [34]:
epoch = 10000  # Total number of epochs (games) to play

print("Training")
play(model, epoch, criterion, optimizer, True)
print("Training done")

#model.save()

Training


  1%|▊                                                                             | 102/10000 [00:08<12:41, 13.00it/s]

Epoch 099/10000 | Last 100 Reward 1970.0 | Last 100 Cheese 1940.0| W/D/L 51/17/32 | 100 W/D/L 51/17/32 | 100 Steps 67.53


  2%|█▌                                                                            | 202/10000 [00:16<14:24, 11.34it/s]

Epoch 199/10000 | Last 100 Reward 2006.0 | Last 100 Cheese 1957.5| W/D/L 107/31/62 | 100 W/D/L 56/14/30 | 100 Steps 66.9


  3%|██▎                                                                           | 302/10000 [00:24<13:10, 12.27it/s]

Epoch 299/10000 | Last 100 Reward 2041.0 | Last 100 Cheese 1985.5| W/D/L 160/52/88 | 100 W/D/L 53/21/26 | 100 Steps 70.25


  4%|███▏                                                                          | 402/10000 [00:33<13:34, 11.78it/s]

Epoch 399/10000 | Last 100 Reward 2005.0 | Last 100 Cheese 1961.0| W/D/L 208/72/120 | 100 W/D/L 48/20/32 | 100 Steps 69.8


  5%|███▉                                                                          | 502/10000 [00:41<13:37, 11.62it/s]

Epoch 499/10000 | Last 100 Reward 2020.0 | Last 100 Cheese 1958.5| W/D/L 262/88/150 | 100 W/D/L 54/16/30 | 100 Steps 68.45


  6%|████▋                                                                         | 602/10000 [00:50<13:18, 11.76it/s]

Epoch 599/10000 | Last 100 Reward 1972.0 | Last 100 Cheese 1936.0| W/D/L 313/101/186 | 100 W/D/L 51/13/36 | 100 Steps 68.54


  7%|█████▍                                                                        | 702/10000 [00:58<12:33, 12.33it/s]

Epoch 699/10000 | Last 100 Reward 2052.0 | Last 100 Cheese 1997.0| W/D/L 374/116/210 | 100 W/D/L 61/15/24 | 100 Steps 67.87


  8%|██████▎                                                                       | 802/10000 [01:07<12:38, 12.13it/s]

Epoch 799/10000 | Last 100 Reward 2001.0 | Last 100 Cheese 1968.5| W/D/L 428/129/243 | 100 W/D/L 54/13/33 | 100 Steps 67.44


  9%|███████                                                                       | 902/10000 [01:15<12:04, 12.57it/s]

Epoch 899/10000 | Last 100 Reward 1999.0 | Last 100 Cheese 1952.5| W/D/L 481/141/278 | 100 W/D/L 53/12/35 | 100 Steps 68.69


 10%|███████▋                                                                     | 1000/10000 [01:23<12:29, 12.01it/s]

Epoch 999/10000 | Last 100 Reward 2040.0 | Last 100 Cheese 1984.0| W/D/L 546/149/305 | 100 W/D/L 65/8/27 | 100 Steps 64.67


 11%|████████▍                                                                    | 1102/10000 [01:32<13:07, 11.30it/s]

Epoch 1099/10000 | Last 100 Reward 2016.0 | Last 100 Cheese 1959.5| W/D/L 601/165/334 | 100 W/D/L 55/16/29 | 100 Steps 66.81


 12%|█████████▎                                                                   | 1202/10000 [01:40<12:03, 12.15it/s]

Epoch 1199/10000 | Last 100 Reward 2024.0 | Last 100 Cheese 1980.0| W/D/L 654/180/366 | 100 W/D/L 53/15/32 | 100 Steps 68.61


 13%|██████████                                                                   | 1302/10000 [01:49<11:52, 12.21it/s]

Epoch 1299/10000 | Last 100 Reward 2019.0 | Last 100 Cheese 1966.0| W/D/L 714/196/390 | 100 W/D/L 60/16/24 | 100 Steps 66.07


 14%|██████████▊                                                                  | 1402/10000 [01:57<11:39, 12.28it/s]

Epoch 1399/10000 | Last 100 Reward 1940.0 | Last 100 Cheese 1917.5| W/D/L 760/208/432 | 100 W/D/L 46/12/42 | 100 Steps 67.48


 15%|███████████▌                                                                 | 1502/10000 [02:05<12:19, 11.50it/s]

Epoch 1499/10000 | Last 100 Reward 1997.0 | Last 100 Cheese 1953.5| W/D/L 813/223/464 | 100 W/D/L 53/15/32 | 100 Steps 65.61


 16%|████████████▎                                                                | 1602/10000 [02:14<11:42, 11.95it/s]

Epoch 1599/10000 | Last 100 Reward 2006.0 | Last 100 Cheese 1950.5| W/D/L 867/240/493 | 100 W/D/L 54/17/29 | 100 Steps 69.13


 17%|█████████████                                                                | 1702/10000 [02:22<11:09, 12.39it/s]

Epoch 1699/10000 | Last 100 Reward 2022.0 | Last 100 Cheese 1964.0| W/D/L 917/258/525 | 100 W/D/L 50/18/32 | 100 Steps 70.73


 18%|█████████████▉                                                               | 1802/10000 [02:31<10:53, 12.55it/s]

Epoch 1799/10000 | Last 100 Reward 1992.0 | Last 100 Cheese 1959.5| W/D/L 971/271/558 | 100 W/D/L 54/13/33 | 100 Steps 66.98


 19%|██████████████▋                                                              | 1902/10000 [02:39<11:26, 11.80it/s]

Epoch 1899/10000 | Last 100 Reward 1952.0 | Last 100 Cheese 1927.0| W/D/L 1016/288/596 | 100 W/D/L 45/17/38 | 100 Steps 69.39


 20%|███████████████▍                                                             | 2002/10000 [02:48<11:22, 11.73it/s]

Epoch 1999/10000 | Last 100 Reward 2041.0 | Last 100 Cheese 1987.5| W/D/L 1074/303/623 | 100 W/D/L 58/15/27 | 100 Steps 69.7


 21%|████████████████▏                                                            | 2102/10000 [02:56<11:10, 11.78it/s]

Epoch 2099/10000 | Last 100 Reward 2012.0 | Last 100 Cheese 1972.0| W/D/L 1132/316/652 | 100 W/D/L 58/13/29 | 100 Steps 68.94


 22%|████████████████▉                                                            | 2202/10000 [03:04<10:39, 12.20it/s]

Epoch 2199/10000 | Last 100 Reward 2007.0 | Last 100 Cheese 1968.5| W/D/L 1184/335/681 | 100 W/D/L 52/19/29 | 100 Steps 68.14


 23%|█████████████████▋                                                           | 2302/10000 [03:13<10:45, 11.92it/s]

Epoch 2299/10000 | Last 100 Reward 2058.0 | Last 100 Cheese 2000.0| W/D/L 1244/353/703 | 100 W/D/L 60/18/22 | 100 Steps 68.32


 23%|█████████████████▊                                                           | 2308/10000 [03:13<10:59, 11.67it/s]

New maximum rewards: 2072.0.
Saving model...


 24%|██████████████████▍                                                          | 2402/10000 [03:21<10:40, 11.86it/s]

Epoch 2399/10000 | Last 100 Reward 2011.0 | Last 100 Cheese 1958.5| W/D/L 1298/370/732 | 100 W/D/L 54/17/29 | 100 Steps 68.87


 25%|███████████████████▎                                                         | 2500/10000 [03:29<10:29, 11.91it/s]

Epoch 2499/10000 | Last 100 Reward 2012.0 | Last 100 Cheese 1962.0| W/D/L 1351/387/762 | 100 W/D/L 53/17/30 | 100 Steps 67.81


 26%|████████████████████                                                         | 2602/10000 [03:38<10:22, 11.88it/s]

Epoch 2599/10000 | Last 100 Reward 1992.0 | Last 100 Cheese 1959.5| W/D/L 1401/407/792 | 100 W/D/L 50/20/30 | 100 Steps 68.83


 27%|████████████████████▊                                                        | 2702/10000 [03:46<10:06, 12.03it/s]

Epoch 2699/10000 | Last 100 Reward 1956.0 | Last 100 Cheese 1909.5| W/D/L 1448/416/836 | 100 W/D/L 47/9/44 | 100 Steps 70.65


 28%|█████████████████████▌                                                       | 2802/10000 [03:55<09:49, 12.22it/s]

Epoch 2799/10000 | Last 100 Reward 1954.0 | Last 100 Cheese 1922.5| W/D/L 1497/429/874 | 100 W/D/L 49/13/38 | 100 Steps 68.27


 29%|██████████████████████▎                                                      | 2900/10000 [04:03<11:18, 10.46it/s]

Epoch 2899/10000 | Last 100 Reward 2018.0 | Last 100 Cheese 1976.5| W/D/L 1551/448/901 | 100 W/D/L 54/19/27 | 100 Steps 68.78


 30%|██████████████████████▊                                                      | 2960/10000 [04:08<09:42, 12.08it/s]

New maximum rewards: 2074.0.
Saving model...


 30%|██████████████████████▊                                                      | 2966/10000 [04:09<09:56, 11.78it/s]

New maximum rewards: 2082.0.
Saving model...


 30%|██████████████████████▉                                                      | 2974/10000 [04:09<09:51, 11.88it/s]

New maximum rewards: 2088.0.
Saving model...
New maximum rewards: 2092.0.
Saving model...


 30%|██████████████████████▉                                                      | 2976/10000 [04:10<09:56, 11.77it/s]

New maximum rewards: 2093.0.
Saving model...
New maximum rewards: 2094.0.
Saving model...


 30%|██████████████████████▉                                                      | 2980/10000 [04:10<09:58, 11.72it/s]

New maximum rewards: 2098.0.
Saving model...
New maximum rewards: 2104.0.
Saving model...


 30%|███████████████████████                                                      | 3002/10000 [04:12<09:37, 12.11it/s]

Epoch 2999/10000 | Last 100 Reward 2073.0 | Last 100 Cheese 1994.5| W/D/L 1615/459/926 | 100 W/D/L 64/11/25 | 100 Steps 68.29


 31%|███████████████████████▉                                                     | 3102/10000 [04:20<09:50, 11.68it/s]

Epoch 3099/10000 | Last 100 Reward 2046.0 | Last 100 Cheese 1982.5| W/D/L 1682/468/950 | 100 W/D/L 67/9/24 | 100 Steps 66.19


 32%|████████████████████████▋                                                    | 3202/10000 [04:28<09:15, 12.24it/s]

Epoch 3199/10000 | Last 100 Reward 1993.0 | Last 100 Cheese 1950.0| W/D/L 1735/483/982 | 100 W/D/L 53/15/32 | 100 Steps 68.12


 33%|█████████████████████████▍                                                   | 3302/10000 [04:37<09:08, 12.22it/s]

Epoch 3299/10000 | Last 100 Reward 2007.0 | Last 100 Cheese 1971.5| W/D/L 1790/501/1009 | 100 W/D/L 55/18/27 | 100 Steps 68.08


 34%|██████████████████████████▏                                                  | 3402/10000 [04:45<09:07, 12.05it/s]

Epoch 3399/10000 | Last 100 Reward 1998.0 | Last 100 Cheese 1960.5| W/D/L 1843/513/1044 | 100 W/D/L 53/12/35 | 100 Steps 66.94


 35%|██████████████████████████▉                                                  | 3502/10000 [04:54<08:51, 12.23it/s]

Epoch 3499/10000 | Last 100 Reward 1949.0 | Last 100 Cheese 1906.5| W/D/L 1887/524/1089 | 100 W/D/L 44/11/45 | 100 Steps 70.0


 36%|███████████████████████████▋                                                 | 3602/10000 [05:02<08:52, 12.02it/s]

Epoch 3599/10000 | Last 100 Reward 2073.0 | Last 100 Cheese 1995.5| W/D/L 1949/535/1116 | 100 W/D/L 62/11/27 | 100 Steps 68.63


 37%|████████████████████████████▌                                                | 3702/10000 [05:11<08:51, 11.85it/s]

Epoch 3699/10000 | Last 100 Reward 2063.0 | Last 100 Cheese 1982.5| W/D/L 2016/543/1141 | 100 W/D/L 67/8/25 | 100 Steps 67.66


 38%|█████████████████████████████▎                                               | 3802/10000 [05:19<08:30, 12.15it/s]

Epoch 3799/10000 | Last 100 Reward 2045.0 | Last 100 Cheese 1990.0| W/D/L 2076/558/1166 | 100 W/D/L 60/15/25 | 100 Steps 68.93


 39%|██████████████████████████████                                               | 3902/10000 [05:27<08:22, 12.13it/s]

Epoch 3899/10000 | Last 100 Reward 2033.0 | Last 100 Cheese 1973.0| W/D/L 2133/573/1194 | 100 W/D/L 57/15/28 | 100 Steps 67.1


 40%|██████████████████████████████▊                                              | 4002/10000 [05:36<08:35, 11.64it/s]

Epoch 3999/10000 | Last 100 Reward 1993.0 | Last 100 Cheese 1960.5| W/D/L 2188/589/1223 | 100 W/D/L 55/16/29 | 100 Steps 67.22


 41%|███████████████████████████████▌                                             | 4102/10000 [05:44<08:28, 11.60it/s]

Epoch 4099/10000 | Last 100 Reward 2015.0 | Last 100 Cheese 1972.0| W/D/L 2242/601/1257 | 100 W/D/L 54/12/34 | 100 Steps 67.96


 42%|████████████████████████████████▎                                            | 4202/10000 [05:52<07:54, 12.22it/s]

Epoch 4199/10000 | Last 100 Reward 1984.0 | Last 100 Cheese 1944.0| W/D/L 2298/611/1291 | 100 W/D/L 56/10/34 | 100 Steps 67.16


 43%|█████████████████████████████████▏                                           | 4302/10000 [06:01<08:02, 11.82it/s]

Epoch 4299/10000 | Last 100 Reward 1988.0 | Last 100 Cheese 1953.5| W/D/L 2351/625/1324 | 100 W/D/L 53/14/33 | 100 Steps 70.29


 44%|█████████████████████████████████▉                                           | 4402/10000 [06:09<08:07, 11.47it/s]

Epoch 4399/10000 | Last 100 Reward 2018.0 | Last 100 Cheese 1967.5| W/D/L 2408/640/1352 | 100 W/D/L 57/15/28 | 100 Steps 69.44


 45%|██████████████████████████████████▋                                          | 4500/10000 [06:17<07:42, 11.88it/s]

Epoch 4499/10000 | Last 100 Reward 2017.0 | Last 100 Cheese 1967.0| W/D/L 2471/647/1382 | 100 W/D/L 63/7/30 | 100 Steps 66.0


 46%|███████████████████████████████████▍                                         | 4602/10000 [06:26<07:36, 11.83it/s]

Epoch 4599/10000 | Last 100 Reward 1968.0 | Last 100 Cheese 1936.0| W/D/L 2519/662/1419 | 100 W/D/L 48/15/37 | 100 Steps 68.25


 47%|████████████████████████████████████▏                                        | 4702/10000 [06:34<07:31, 11.74it/s]

Epoch 4699/10000 | Last 100 Reward 2025.0 | Last 100 Cheese 1976.5| W/D/L 2575/674/1451 | 100 W/D/L 56/12/32 | 100 Steps 67.39


 48%|████████████████████████████████████▉                                        | 4802/10000 [06:43<07:26, 11.63it/s]

Epoch 4799/10000 | Last 100 Reward 2073.0 | Last 100 Cheese 2013.5| W/D/L 2635/694/1471 | 100 W/D/L 60/20/20 | 100 Steps 68.84


 49%|█████████████████████████████████████▋                                       | 4900/10000 [06:51<06:56, 12.24it/s]

Epoch 4899/10000 | Last 100 Reward 2013.0 | Last 100 Cheese 1967.5| W/D/L 2695/710/1495 | 100 W/D/L 60/16/24 | 100 Steps 67.37


 50%|██████████████████████████████████████▌                                      | 5002/10000 [07:00<07:10, 11.61it/s]

Epoch 4999/10000 | Last 100 Reward 2052.0 | Last 100 Cheese 1984.5| W/D/L 2756/721/1523 | 100 W/D/L 61/11/28 | 100 Steps 70.09


 51%|███████████████████████████████████████▎                                     | 5100/10000 [07:08<06:43, 12.14it/s]

Epoch 5099/10000 | Last 100 Reward 2034.0 | Last 100 Cheese 1977.0| W/D/L 2818/733/1549 | 100 W/D/L 62/12/26 | 100 Steps 67.19


 52%|████████████████████████████████████████                                     | 5202/10000 [07:16<07:00, 11.41it/s]

Epoch 5199/10000 | Last 100 Reward 1979.0 | Last 100 Cheese 1942.5| W/D/L 2871/746/1583 | 100 W/D/L 53/13/34 | 100 Steps 67.33


 53%|████████████████████████████████████████▊                                    | 5302/10000 [07:25<06:40, 11.73it/s]

Epoch 5299/10000 | Last 100 Reward 2012.0 | Last 100 Cheese 1970.5| W/D/L 2924/762/1614 | 100 W/D/L 53/16/31 | 100 Steps 70.45


 54%|█████████████████████████████████████████▌                                   | 5402/10000 [07:33<06:23, 11.99it/s]

Epoch 5399/10000 | Last 100 Reward 2025.0 | Last 100 Cheese 1974.0| W/D/L 2986/769/1645 | 100 W/D/L 62/7/31 | 100 Steps 68.59


 55%|██████████████████████████████████████████▎                                  | 5500/10000 [07:42<06:14, 12.03it/s]

Epoch 5499/10000 | Last 100 Reward 1977.0 | Last 100 Cheese 1953.0| W/D/L 3033/787/1680 | 100 W/D/L 47/18/35 | 100 Steps 68.28


 56%|███████████████████████████████████████████                                  | 5600/10000 [07:50<06:08, 11.94it/s]

Epoch 5599/10000 | Last 100 Reward 1999.0 | Last 100 Cheese 1954.0| W/D/L 3079/806/1715 | 100 W/D/L 46/19/35 | 100 Steps 67.68


 57%|███████████████████████████████████████████▉                                 | 5702/10000 [07:59<06:03, 11.82it/s]

Epoch 5699/10000 | Last 100 Reward 2075.0 | Last 100 Cheese 1987.5| W/D/L 3136/822/1742 | 100 W/D/L 57/16/27 | 100 Steps 68.53


 58%|████████████████████████████████████████████▋                                | 5802/10000 [08:07<05:43, 12.21it/s]

Epoch 5799/10000 | Last 100 Reward 2040.0 | Last 100 Cheese 1973.5| W/D/L 3193/836/1771 | 100 W/D/L 57/14/29 | 100 Steps 66.61


 59%|█████████████████████████████████████████████▍                               | 5902/10000 [08:15<05:42, 11.96it/s]

Epoch 5899/10000 | Last 100 Reward 1994.0 | Last 100 Cheese 1967.0| W/D/L 3242/858/1800 | 100 W/D/L 49/22/29 | 100 Steps 67.87


 60%|██████████████████████████████████████████████▏                              | 6002/10000 [08:24<05:38, 11.81it/s]

Epoch 5999/10000 | Last 100 Reward 2009.0 | Last 100 Cheese 1959.5| W/D/L 3292/879/1829 | 100 W/D/L 50/21/29 | 100 Steps 69.26


 61%|██████████████████████████████████████████████▉                              | 6102/10000 [08:32<05:29, 11.85it/s]

Epoch 6099/10000 | Last 100 Reward 2006.0 | Last 100 Cheese 1966.0| W/D/L 3345/896/1859 | 100 W/D/L 53/17/30 | 100 Steps 69.09


 62%|███████████████████████████████████████████████▊                             | 6202/10000 [08:41<05:22, 11.76it/s]

Epoch 6199/10000 | Last 100 Reward 2074.0 | Last 100 Cheese 1990.0| W/D/L 3408/913/1879 | 100 W/D/L 63/17/20 | 100 Steps 68.5


 63%|████████████████████████████████████████████████▌                            | 6302/10000 [08:49<05:03, 12.18it/s]

Epoch 6299/10000 | Last 100 Reward 2037.0 | Last 100 Cheese 1988.0| W/D/L 3469/926/1905 | 100 W/D/L 61/13/26 | 100 Steps 66.98


 64%|█████████████████████████████████████████████████▎                           | 6402/10000 [08:57<04:55, 12.16it/s]

Epoch 6399/10000 | Last 100 Reward 1976.0 | Last 100 Cheese 1943.0| W/D/L 3521/942/1937 | 100 W/D/L 52/16/32 | 100 Steps 68.74


 65%|██████████████████████████████████████████████████                           | 6502/10000 [09:06<04:55, 11.83it/s]

Epoch 6499/10000 | Last 100 Reward 2049.0 | Last 100 Cheese 1981.5| W/D/L 3576/957/1967 | 100 W/D/L 55/15/30 | 100 Steps 68.66


 66%|██████████████████████████████████████████████████▊                          | 6602/10000 [09:14<04:36, 12.30it/s]

Epoch 6599/10000 | Last 100 Reward 1991.0 | Last 100 Cheese 1950.5| W/D/L 3627/970/2003 | 100 W/D/L 51/13/36 | 100 Steps 69.02


 67%|███████████████████████████████████████████████████▌                         | 6700/10000 [09:22<04:44, 11.60it/s]

Epoch 6699/10000 | Last 100 Reward 2091.0 | Last 100 Cheese 2011.0| W/D/L 3692/980/2028 | 100 W/D/L 65/10/25 | 100 Steps 66.36


 68%|████████████████████████████████████████████████████▍                        | 6802/10000 [09:31<04:42, 11.30it/s]

Epoch 6799/10000 | Last 100 Reward 2067.0 | Last 100 Cheese 1988.0| W/D/L 3754/995/2051 | 100 W/D/L 62/15/23 | 100 Steps 67.56


 69%|█████████████████████████████████████████████████████▏                       | 6900/10000 [09:39<04:11, 12.31it/s]

Epoch 6899/10000 | Last 100 Reward 2019.0 | Last 100 Cheese 1973.5| W/D/L 3815/1006/2079 | 100 W/D/L 61/11/28 | 100 Steps 65.99


 70%|█████████████████████████████████████████████████████▉                       | 7002/10000 [09:48<04:11, 11.91it/s]

Epoch 6999/10000 | Last 100 Reward 1972.0 | Last 100 Cheese 1933.5| W/D/L 3870/1019/2111 | 100 W/D/L 55/13/32 | 100 Steps 68.86


 71%|██████████████████████████████████████████████████████▋                      | 7102/10000 [09:57<04:09, 11.61it/s]

Epoch 7099/10000 | Last 100 Reward 2043.0 | Last 100 Cheese 1969.0| W/D/L 3925/1037/2138 | 100 W/D/L 55/18/27 | 100 Steps 68.29


 72%|███████████████████████████████████████████████████████▍                     | 7202/10000 [10:05<03:48, 12.22it/s]

Epoch 7199/10000 | Last 100 Reward 2013.0 | Last 100 Cheese 1962.0| W/D/L 3976/1056/2168 | 100 W/D/L 51/19/30 | 100 Steps 68.88


 73%|████████████████████████████████████████████████████████▏                    | 7300/10000 [10:13<03:48, 11.83it/s]

Epoch 7299/10000 | Last 100 Reward 2001.0 | Last 100 Cheese 1965.0| W/D/L 4033/1069/2198 | 100 W/D/L 57/13/30 | 100 Steps 68.11


 74%|████████████████████████████████████████████████████████▉                    | 7400/10000 [10:22<03:39, 11.83it/s]

Epoch 7399/10000 | Last 100 Reward 1965.0 | Last 100 Cheese 1917.5| W/D/L 4079/1082/2239 | 100 W/D/L 46/13/41 | 100 Steps 68.14


 75%|█████████████████████████████████████████████████████████▊                   | 7502/10000 [10:30<03:29, 11.92it/s]

Epoch 7499/10000 | Last 100 Reward 2025.0 | Last 100 Cheese 1966.5| W/D/L 4135/1098/2267 | 100 W/D/L 56/16/28 | 100 Steps 67.8


 76%|██████████████████████████████████████████████████████████▌                  | 7602/10000 [10:39<03:13, 12.42it/s]

Epoch 7599/10000 | Last 100 Reward 1976.0 | Last 100 Cheese 1931.5| W/D/L 4188/1107/2305 | 100 W/D/L 53/9/38 | 100 Steps 64.65


 77%|███████████████████████████████████████████████████████████▎                 | 7702/10000 [10:47<03:15, 11.76it/s]

Epoch 7699/10000 | Last 100 Reward 2001.0 | Last 100 Cheese 1963.5| W/D/L 4243/1122/2335 | 100 W/D/L 55/15/30 | 100 Steps 66.67


 78%|████████████████████████████████████████████████████████████                 | 7800/10000 [10:55<03:01, 12.15it/s]

Epoch 7799/10000 | Last 100 Reward 1999.0 | Last 100 Cheese 1969.0| W/D/L 4293/1138/2369 | 100 W/D/L 50/16/34 | 100 Steps 68.47


 79%|████████████████████████████████████████████████████████████▊                | 7902/10000 [11:04<02:48, 12.45it/s]

Epoch 7899/10000 | Last 100 Reward 1993.0 | Last 100 Cheese 1967.5| W/D/L 4341/1154/2405 | 100 W/D/L 48/16/36 | 100 Steps 69.53


 80%|█████████████████████████████████████████████████████████████▌               | 8000/10000 [11:12<02:58, 11.21it/s]

Epoch 7999/10000 | Last 100 Reward 2020.0 | Last 100 Cheese 1962.0| W/D/L 4395/1170/2435 | 100 W/D/L 54/16/30 | 100 Steps 67.77


 81%|██████████████████████████████████████████████████████████████▍              | 8102/10000 [11:21<02:38, 11.99it/s]

Epoch 8099/10000 | Last 100 Reward 2044.0 | Last 100 Cheese 1987.0| W/D/L 4453/1181/2466 | 100 W/D/L 58/11/31 | 100 Steps 68.43


 82%|███████████████████████████████████████████████████████████████▏             | 8202/10000 [11:29<02:31, 11.85it/s]

Epoch 8199/10000 | Last 100 Reward 1962.0 | Last 100 Cheese 1921.0| W/D/L 4500/1192/2508 | 100 W/D/L 47/11/42 | 100 Steps 67.4


 83%|███████████████████████████████████████████████████████████████▉             | 8302/10000 [11:38<02:18, 12.23it/s]

Epoch 8299/10000 | Last 100 Reward 1937.0 | Last 100 Cheese 1912.5| W/D/L 4547/1201/2552 | 100 W/D/L 47/9/44 | 100 Steps 68.17


 84%|████████████████████████████████████████████████████████████████▋            | 8402/10000 [11:46<02:13, 11.96it/s]

Epoch 8399/10000 | Last 100 Reward 2046.0 | Last 100 Cheese 1993.0| W/D/L 4606/1215/2579 | 100 W/D/L 59/14/27 | 100 Steps 67.9


 85%|█████████████████████████████████████████████████████████████████▍           | 8502/10000 [11:54<02:04, 12.07it/s]

Epoch 8499/10000 | Last 100 Reward 2008.0 | Last 100 Cheese 1960.5| W/D/L 4664/1227/2609 | 100 W/D/L 58/12/30 | 100 Steps 67.86


 86%|██████████████████████████████████████████████████████████████████▏          | 8600/10000 [12:03<01:57, 11.93it/s]

Epoch 8599/10000 | Last 100 Reward 1999.0 | Last 100 Cheese 1953.5| W/D/L 4720/1236/2644 | 100 W/D/L 56/9/35 | 100 Steps 67.3


 87%|███████████████████████████████████████████████████████████████████          | 8702/10000 [12:11<01:46, 12.21it/s]

Epoch 8699/10000 | Last 100 Reward 2054.0 | Last 100 Cheese 1993.5| W/D/L 4779/1257/2664 | 100 W/D/L 59/21/20 | 100 Steps 68.46


 88%|███████████████████████████████████████████████████████████████████▊         | 8800/10000 [12:19<01:40, 11.89it/s]

Epoch 8799/10000 | Last 100 Reward 2030.0 | Last 100 Cheese 1974.5| W/D/L 4833/1272/2695 | 100 W/D/L 54/15/31 | 100 Steps 69.62


 89%|████████████████████████████████████████████████████████████████████▌        | 8902/10000 [12:28<01:32, 11.83it/s]

Epoch 8899/10000 | Last 100 Reward 1997.0 | Last 100 Cheese 1931.0| W/D/L 4889/1285/2726 | 100 W/D/L 56/13/31 | 100 Steps 66.54


 90%|█████████████████████████████████████████████████████████████████████▎       | 9002/10000 [12:36<01:24, 11.79it/s]

Epoch 8999/10000 | Last 100 Reward 2006.0 | Last 100 Cheese 1956.0| W/D/L 4941/1302/2757 | 100 W/D/L 52/17/31 | 100 Steps 68.15


 91%|██████████████████████████████████████████████████████████████████████       | 9102/10000 [12:45<01:13, 12.20it/s]

Epoch 9099/10000 | Last 100 Reward 2004.0 | Last 100 Cheese 1943.5| W/D/L 5000/1313/2787 | 100 W/D/L 59/11/30 | 100 Steps 65.16


 92%|██████████████████████████████████████████████████████████████████████▊      | 9202/10000 [12:53<01:06, 11.93it/s]

Epoch 9199/10000 | Last 100 Reward 1988.0 | Last 100 Cheese 1960.0| W/D/L 5053/1331/2816 | 100 W/D/L 53/18/29 | 100 Steps 67.3


 93%|███████████████████████████████████████████████████████████████████████▋     | 9302/10000 [13:02<00:56, 12.34it/s]

Epoch 9299/10000 | Last 100 Reward 2011.0 | Last 100 Cheese 1946.5| W/D/L 5101/1346/2853 | 100 W/D/L 48/15/37 | 100 Steps 69.51


 94%|████████████████████████████████████████████████████████████████████████▍    | 9402/10000 [13:10<00:48, 12.35it/s]

Epoch 9399/10000 | Last 100 Reward 2055.0 | Last 100 Cheese 1987.5| W/D/L 5163/1364/2873 | 100 W/D/L 62/18/20 | 100 Steps 66.42


 95%|█████████████████████████████████████████████████████████████████████████▏   | 9500/10000 [13:18<00:41, 12.04it/s]

Epoch 9499/10000 | Last 100 Reward 2020.0 | Last 100 Cheese 1973.0| W/D/L 5223/1376/2901 | 100 W/D/L 60/12/28 | 100 Steps 66.38


 96%|█████████████████████████████████████████████████████████████████████████▉   | 9602/10000 [13:27<00:33, 11.95it/s]

Epoch 9599/10000 | Last 100 Reward 2020.0 | Last 100 Cheese 1977.5| W/D/L 5277/1393/2930 | 100 W/D/L 54/17/29 | 100 Steps 66.29


 97%|██████████████████████████████████████████████████████████████████████████▋  | 9702/10000 [13:35<00:24, 11.94it/s]

Epoch 9699/10000 | Last 100 Reward 2009.0 | Last 100 Cheese 1957.5| W/D/L 5323/1412/2965 | 100 W/D/L 46/19/35 | 100 Steps 69.83


 98%|███████████████████████████████████████████████████████████████████████████▍ | 9802/10000 [13:44<00:16, 11.89it/s]

Epoch 9799/10000 | Last 100 Reward 2023.0 | Last 100 Cheese 1960.0| W/D/L 5375/1430/2995 | 100 W/D/L 52/18/30 | 100 Steps 69.57


 99%|████████████████████████████████████████████████████████████████████████████▏| 9902/10000 [13:52<00:08, 12.17it/s]

Epoch 9899/10000 | Last 100 Reward 1989.0 | Last 100 Cheese 1951.5| W/D/L 5425/1448/3027 | 100 W/D/L 50/18/32 | 100 Steps 66.62


100%|████████████████████████████████████████████████████████████████████████████| 10000/10000 [14:00<00:00, 11.90it/s]

Epoch 9999/10000 | Last 100 Reward 2014.0 | Last 100 Cheese 1964.0| W/D/L 5483/1461/3056 | 100 W/D/L 58/13/29 | 100 Steps 66.31
Training done
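Each summary line in the training log reports two sets of win/draw/loss counts: the cumulative total (`W/D/L`) and the rolling window over the last 100 games (`100 W/D/L`). As a quick sanity check, a helper along these lines can turn one of those lines into rates. `wdl_rates` is a hypothetical name and is not part of `game.py` or `rl.py`. Applied to the final training line, it gives a 54.83% overall win rate and a 58% win rate over the last 100 games.

```python
import re


def wdl_rates(line):
    """Return (win, draw, loss) rates for the cumulative and last-100 counts in a log line."""
    # Cumulative counts follow "| W/D/L"; the rolling window follows "100 W/D/L"
    overall = re.search(r"\| W/D/L (\d+)/(\d+)/(\d+)", line)
    recent = re.search(r"100 W/D/L (\d+)/(\d+)/(\d+)", line)
    if overall is None or recent is None:
        raise ValueError("line does not contain both W/D/L fields")

    def to_rates(match):
        wins, draws, losses = (int(g) for g in match.groups())
        total = wins + draws + losses
        return wins / total, draws / total, losses / total

    return {"overall": to_rates(overall), "last_100": to_rates(recent)}


# Final summary line of the training run above
line = ("Epoch 9999/10000 | Last 100 Reward 2014.0 | Last 100 Cheese 1964.0| "
        "W/D/L 5483/1461/3056 | 100 W/D/L 58/13/29 | 100 Steps 66.31")
rates = wdl_rates(line)
print(f"overall win rate: {rates['overall'][0]:.2%}")    # overall win rate: 54.83%
print(f"last-100 win rate: {rates['last_100'][0]:.2%}")  # last-100 win rate: 58.00%
```

The same helper works on the evaluation log below, because it uses the same line format.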





### Evaluate the Q-learner model

Load the best-performing weights saved during training.

In [29]:
# Reload the best weights saved during training before evaluating
load = True

if load:
    model.load()
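During training, the log prints "New maximum rewards: ..." followed by "Saving model..." apparently whenever the running reward reaches a new high. `model.load()` then restores that checkpoint. Its implementation is not shown in this section, so the sketch below only illustrates the keep-the-best-checkpoint pattern. It assumes the parameters can be pickled. `BestModelCheckpoint`, `update` and the file name are hypothetical.

```python
import os
import pickle
import tempfile


class BestModelCheckpoint:
    """Hypothetical helper: keep only the parameters that achieved the best reward."""

    def __init__(self, path):
        self.path = path
        self.best_reward = float("-inf")

    def update(self, reward, params):
        """Write `params` to disk if `reward` beats every reward seen so far."""
        if reward <= self.best_reward:
            return False
        self.best_reward = reward
        print(f"New maximum rewards: {reward}.")
        print("Saving model...")
        with open(self.path, "wb") as f:
            pickle.dump(params, f)
        return True

    def load(self):
        """Read back the parameters of the best model saved so far."""
        with open(self.path, "rb") as f:
            return pickle.load(f)


# Usage: three candidate checkpoints; only the first two are new maxima
path = os.path.join(tempfile.mkdtemp(), "best_model.pkl")
checkpoint = BestModelCheckpoint(path)
for reward, weights in [(2093.0, [0.1]), (2094.0, [0.2]), (2050.0, [0.3])]:
    checkpoint.update(reward, {"w": weights})
print(checkpoint.load())  # {'w': [0.2]}
```

With a PyTorch model, the usual equivalent is to save `model.state_dict()` with `torch.save` and restore it with `model.load_state_dict(torch.load(path))`.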

Evaluate the model over 10,000 games.

In [41]:
epoch = 10000  # Number of evaluation games (one game per epoch)

print("Testing")
play(model, epoch, criterion, optimizer, False)  # Last argument False: test mode (no training)
print("Testing done")

Testing


  1%|▊                                                                             | 108/10000 [00:01<02:30, 65.71it/s]

Epoch 099/10000 | Last 100 Reward 1989.0 | Last 100 Cheese 1956.0| W/D/L 47/16/37 | 100 W/D/L 47/16/37 | 100 Steps 68.83


  2%|█▋                                                                            | 210/10000 [00:03<02:18, 70.85it/s]

Epoch 199/10000 | Last 100 Reward 2006.0 | Last 100 Cheese 1960.5| W/D/L 102/24/74 | 100 W/D/L 55/8/37 | 100 Steps 67.01


  3%|██▍                                                                           | 312/10000 [00:04<02:15, 71.47it/s]

Epoch 299/10000 | Last 100 Reward 2071.0 | Last 100 Cheese 2002.0| W/D/L 164/37/99 | 100 W/D/L 62/13/25 | 100 Steps 67.68


  4%|███▏                                                                          | 411/10000 [00:05<02:22, 67.43it/s]

Epoch 399/10000 | Last 100 Reward 2031.0 | Last 100 Cheese 1981.5| W/D/L 220/54/126 | 100 W/D/L 56/17/27 | 100 Steps 69.19


  5%|███▉                                                                          | 510/10000 [00:07<02:10, 72.88it/s]

Epoch 499/10000 | Last 100 Reward 2050.0 | Last 100 Cheese 1984.0| W/D/L 283/63/154 | 100 W/D/L 63/9/28 | 100 Steps 67.39


  6%|████▋                                                                         | 608/10000 [00:08<02:11, 71.30it/s]

Epoch 599/10000 | Last 100 Reward 2012.0 | Last 100 Cheese 1961.5| W/D/L 347/70/183 | 100 W/D/L 64/7/29 | 100 Steps 66.03


  7%|█████▌                                                                        | 713/10000 [00:10<02:07, 73.00it/s]

Epoch 699/10000 | Last 100 Reward 2028.0 | Last 100 Cheese 1974.0| W/D/L 405/79/216 | 100 W/D/L 58/9/33 | 100 Steps 67.27


  8%|██████▎                                                                       | 808/10000 [00:11<02:17, 66.95it/s]

Epoch 799/10000 | Last 100 Reward 2039.0 | Last 100 Cheese 1972.0| W/D/L 462/96/242 | 100 W/D/L 57/17/26 | 100 Steps 67.83


  9%|███████                                                                       | 912/10000 [00:13<02:11, 69.14it/s]

Epoch 899/10000 | Last 100 Reward 2059.0 | Last 100 Cheese 2000.5| W/D/L 522/113/265 | 100 W/D/L 60/17/23 | 100 Steps 69.28


 10%|███████▊                                                                     | 1007/10000 [00:14<02:20, 64.21it/s]

Epoch 999/10000 | Last 100 Reward 2008.0 | Last 100 Cheese 1967.5| W/D/L 577/128/295 | 100 W/D/L 55/15/30 | 100 Steps 69.29


 11%|████████▌                                                                    | 1110/10000 [00:16<02:10, 68.01it/s]

Epoch 1099/10000 | Last 100 Reward 1984.0 | Last 100 Cheese 1965.5| W/D/L 624/142/334 | 100 W/D/L 47/14/39 | 100 Steps 69.83


 12%|█████████▎                                                                   | 1210/10000 [00:17<02:01, 72.14it/s]

Epoch 1199/10000 | Last 100 Reward 2031.0 | Last 100 Cheese 1986.0| W/D/L 682/162/356 | 100 W/D/L 58/20/22 | 100 Steps 66.98


 13%|██████████                                                                   | 1307/10000 [00:18<02:08, 67.40it/s]

Epoch 1299/10000 | Last 100 Reward 2033.0 | Last 100 Cheese 1972.5| W/D/L 739/174/387 | 100 W/D/L 57/12/31 | 100 Steps 66.78


 14%|██████████▊                                                                  | 1410/10000 [00:20<02:08, 66.82it/s]

Epoch 1399/10000 | Last 100 Reward 1967.0 | Last 100 Cheese 1932.0| W/D/L 787/189/424 | 100 W/D/L 48/15/37 | 100 Steps 68.43


 15%|███████████▋                                                                 | 1510/10000 [00:22<02:08, 65.97it/s]

Epoch 1499/10000 | Last 100 Reward 2052.0 | Last 100 Cheese 1996.0| W/D/L 845/206/449 | 100 W/D/L 58/17/25 | 100 Steps 69.21


 16%|████████████▎                                                                | 1607/10000 [00:23<02:09, 64.59it/s]

Epoch 1599/10000 | Last 100 Reward 2044.0 | Last 100 Cheese 1995.5| W/D/L 902/224/474 | 100 W/D/L 57/18/25 | 100 Steps 67.4


 17%|█████████████▏                                                               | 1708/10000 [00:25<02:07, 65.27it/s]

Epoch 1699/10000 | Last 100 Reward 2052.0 | Last 100 Cheese 1977.0| W/D/L 962/238/500 | 100 W/D/L 60/14/26 | 100 Steps 69.1


 18%|█████████████▉                                                               | 1808/10000 [00:26<02:10, 62.96it/s]

Epoch 1799/10000 | Last 100 Reward 2017.0 | Last 100 Cheese 1970.0| W/D/L 1018/253/529 | 100 W/D/L 56/15/29 | 100 Steps 70.24


 19%|██████████████▋                                                              | 1906/10000 [00:28<02:25, 55.79it/s]

Epoch 1899/10000 | Last 100 Reward 2056.0 | Last 100 Cheese 1994.0| W/D/L 1074/269/557 | 100 W/D/L 56/16/28 | 100 Steps 71.43


 20%|███████████████▍                                                             | 2008/10000 [00:29<01:53, 70.52it/s]

Epoch 1999/10000 | Last 100 Reward 2006.0 | Last 100 Cheese 1949.5| W/D/L 1131/281/588 | 100 W/D/L 57/12/31 | 100 Steps 65.27


 21%|████████████████▏                                                            | 2106/10000 [00:31<01:58, 66.55it/s]

Epoch 2099/10000 | Last 100 Reward 2060.0 | Last 100 Cheese 1988.0| W/D/L 1187/298/615 | 100 W/D/L 56/17/27 | 100 Steps 69.04


 22%|█████████████████                                                            | 2208/10000 [00:32<01:53, 68.69it/s]

Epoch 2199/10000 | Last 100 Reward 2063.0 | Last 100 Cheese 1993.5| W/D/L 1247/312/641 | 100 W/D/L 60/14/26 | 100 Steps 68.7


 23%|█████████████████▊                                                           | 2312/10000 [00:34<01:51, 69.01it/s]

Epoch 2299/10000 | Last 100 Reward 1998.0 | Last 100 Cheese 1947.5| W/D/L 1303/324/673 | 100 W/D/L 56/12/32 | 100 Steps 66.64


 24%|██████████████████▌                                                          | 2408/10000 [00:35<01:52, 67.25it/s]

Epoch 2399/10000 | Last 100 Reward 2044.0 | Last 100 Cheese 1975.5| W/D/L 1367/331/702 | 100 W/D/L 64/7/29 | 100 Steps 68.07


 25%|███████████████████▎                                                         | 2511/10000 [00:37<01:55, 64.97it/s]

Epoch 2499/10000 | Last 100 Reward 2030.0 | Last 100 Cheese 1978.0| W/D/L 1426/345/729 | 100 W/D/L 59/14/27 | 100 Steps 68.94


 26%|████████████████████                                                         | 2608/10000 [00:38<01:53, 65.07it/s]

Epoch 2599/10000 | Last 100 Reward 2024.0 | Last 100 Cheese 1975.0| W/D/L 1484/357/759 | 100 W/D/L 58/12/30 | 100 Steps 66.74


 27%|████████████████████▊                                                        | 2706/10000 [00:39<01:46, 68.72it/s]

Epoch 2699/10000 | Last 100 Reward 1983.0 | Last 100 Cheese 1946.5| W/D/L 1537/371/792 | 100 W/D/L 53/14/33 | 100 Steps 67.8


 28%|█████████████████████▋                                                       | 2810/10000 [00:41<01:50, 65.25it/s]

Epoch 2799/10000 | Last 100 Reward 2060.0 | Last 100 Cheese 1996.0| W/D/L 1601/383/816 | 100 W/D/L 64/12/24 | 100 Steps 67.42


 29%|██████████████████████▍                                                      | 2908/10000 [00:43<01:42, 69.51it/s]

Epoch 2899/10000 | Last 100 Reward 2041.0 | Last 100 Cheese 1991.5| W/D/L 1657/405/838 | 100 W/D/L 56/22/22 | 100 Steps 68.38


 30%|███████████████████████▏                                                     | 3009/10000 [00:44<01:47, 64.90it/s]

Epoch 2999/10000 | Last 100 Reward 2010.0 | Last 100 Cheese 1951.0| W/D/L 1709/417/874 | 100 W/D/L 52/12/36 | 100 Steps 68.63


 31%|███████████████████████▉                                                     | 3113/10000 [00:46<01:46, 64.37it/s]

Epoch 3099/10000 | Last 100 Reward 2024.0 | Last 100 Cheese 1985.5| W/D/L 1765/433/902 | 100 W/D/L 56/16/28 | 100 Steps 68.44


 32%|████████████████████████▋                                                    | 3209/10000 [00:47<01:40, 67.39it/s]

Epoch 3199/10000 | Last 100 Reward 1999.0 | Last 100 Cheese 1953.0| W/D/L 1818/446/936 | 100 W/D/L 53/13/34 | 100 Steps 68.24


 33%|█████████████████████████▍                                                   | 3310/10000 [00:49<01:44, 64.24it/s]

Epoch 3299/10000 | Last 100 Reward 2047.0 | Last 100 Cheese 1992.5| W/D/L 1877/465/958 | 100 W/D/L 59/19/22 | 100 Steps 68.62


 34%|██████████████████████████▏                                                  | 3406/10000 [00:50<01:46, 61.65it/s]

Epoch 3399/10000 | Last 100 Reward 2017.0 | Last 100 Cheese 1969.5| W/D/L 1933/474/993 | 100 W/D/L 56/9/35 | 100 Steps 68.74


 35%|███████████████████████████                                                  | 3508/10000 [00:52<01:42, 63.29it/s]

Epoch 3499/10000 | Last 100 Reward 2050.0 | Last 100 Cheese 2009.0| W/D/L 1995/491/1014 | 100 W/D/L 62/17/21 | 100 Steps 66.67


 36%|███████████████████████████▊                                                 | 3605/10000 [00:53<01:34, 67.97it/s]

Epoch 3599/10000 | Last 100 Reward 2018.0 | Last 100 Cheese 1958.5| W/D/L 2054/501/1045 | 100 W/D/L 59/10/31 | 100 Steps 67.89


 37%|████████████████████████████▌                                                | 3710/10000 [00:55<01:27, 71.72it/s]

Epoch 3699/10000 | Last 100 Reward 2028.0 | Last 100 Cheese 1976.0| W/D/L 2112/515/1073 | 100 W/D/L 58/14/28 | 100 Steps 67.32


 38%|█████████████████████████████▎                                               | 3808/10000 [00:56<01:33, 66.19it/s]

Epoch 3799/10000 | Last 100 Reward 1996.0 | Last 100 Cheese 1954.0| W/D/L 2169/528/1103 | 100 W/D/L 57/13/30 | 100 Steps 66.14


 39%|██████████████████████████████                                               | 3905/10000 [00:58<01:46, 57.16it/s]

Epoch 3899/10000 | Last 100 Reward 2032.0 | Last 100 Cheese 1982.0| W/D/L 2230/541/1129 | 100 W/D/L 61/13/26 | 100 Steps 66.88


 40%|██████████████████████████████▊                                              | 4004/10000 [00:59<01:37, 61.27it/s]

Epoch 3999/10000 | Last 100 Reward 2017.0 | Last 100 Cheese 1964.5| W/D/L 2286/554/1160 | 100 W/D/L 56/13/31 | 100 Steps 69.34


 41%|███████████████████████████████▋                                             | 4109/10000 [01:01<01:31, 64.61it/s]

Epoch 4099/10000 | Last 100 Reward 1967.0 | Last 100 Cheese 1935.5| W/D/L 2332/569/1199 | 100 W/D/L 46/15/39 | 100 Steps 67.94


 42%|████████████████████████████████▍                                            | 4209/10000 [01:03<01:25, 67.51it/s]

Epoch 4199/10000 | Last 100 Reward 2044.0 | Last 100 Cheese 1974.5| W/D/L 2390/583/1227 | 100 W/D/L 58/14/28 | 100 Steps 67.84


 43%|█████████████████████████████████▏                                           | 4307/10000 [01:04<01:34, 60.16it/s]

Epoch 4299/10000 | Last 100 Reward 1989.0 | Last 100 Cheese 1955.0| W/D/L 2445/596/1259 | 100 W/D/L 55/13/32 | 100 Steps 67.46


 44%|█████████████████████████████████▉                                           | 4411/10000 [01:06<01:23, 66.80it/s]

Epoch 4399/10000 | Last 100 Reward 2047.0 | Last 100 Cheese 1982.5| W/D/L 2505/605/1290 | 100 W/D/L 60/9/31 | 100 Steps 68.45


 45%|██████████████████████████████████▋                                          | 4510/10000 [01:08<01:25, 64.44it/s]

Epoch 4499/10000 | Last 100 Reward 2042.0 | Last 100 Cheese 1977.5| W/D/L 2566/615/1319 | 100 W/D/L 61/10/29 | 100 Steps 67.79


 46%|███████████████████████████████████▍                                         | 4607/10000 [01:09<01:19, 67.59it/s]

Epoch 4599/10000 | Last 100 Reward 2009.0 | Last 100 Cheese 1958.0| W/D/L 2618/633/1349 | 100 W/D/L 52/18/30 | 100 Steps 68.57


 47%|████████████████████████████████████▎                                        | 4714/10000 [01:11<01:18, 67.56it/s]

Epoch 4699/10000 | Last 100 Reward 2006.0 | Last 100 Cheese 1955.0| W/D/L 2673/648/1379 | 100 W/D/L 55/15/30 | 100 Steps 65.42


 48%|█████████████████████████████████████                                        | 4811/10000 [01:12<01:23, 62.27it/s]

Epoch 4799/10000 | Last 100 Reward 2051.0 | Last 100 Cheese 1993.0| W/D/L 2731/663/1406 | 100 W/D/L 58/15/27 | 100 Steps 66.37


 49%|█████████████████████████████████████▊                                       | 4910/10000 [01:14<01:16, 66.51it/s]

Epoch 4899/10000 | Last 100 Reward 2052.0 | Last 100 Cheese 1994.0| W/D/L 2791/678/1431 | 100 W/D/L 60/15/25 | 100 Steps 65.73


 50%|██████████████████████████████████████▌                                      | 5009/10000 [01:15<01:14, 67.09it/s]

Epoch 4999/10000 | Last 100 Reward 1978.0 | Last 100 Cheese 1945.5| W/D/L 2845/690/1465 | 100 W/D/L 54/12/34 | 100 Steps 66.62


 51%|███████████████████████████████████████▎                                     | 5110/10000 [01:17<01:14, 66.03it/s]

Epoch 5099/10000 | Last 100 Reward 2055.0 | Last 100 Cheese 1986.5| W/D/L 2901/708/1491 | 100 W/D/L 56/18/26 | 100 Steps 68.56


 52%|████████████████████████████████████████                                     | 5211/10000 [01:18<01:16, 62.38it/s]

Epoch 5199/10000 | Last 100 Reward 2057.0 | Last 100 Cheese 1998.0| W/D/L 2961/722/1517 | 100 W/D/L 60/14/26 | 100 Steps 68.89


 53%|████████████████████████████████████████▊                                    | 5308/10000 [01:20<01:09, 67.65it/s]

Epoch 5299/10000 | Last 100 Reward 2033.0 | Last 100 Cheese 1975.5| W/D/L 3015/738/1547 | 100 W/D/L 54/16/30 | 100 Steps 67.93


 54%|█████████████████████████████████████████▋                                   | 5408/10000 [01:21<01:06, 69.35it/s]

Epoch 5399/10000 | Last 100 Reward 2061.0 | Last 100 Cheese 2000.0| W/D/L 3081/749/1570 | 100 W/D/L 66/11/23 | 100 Steps 67.8


 55%|██████████████████████████████████████████▍                                  | 5509/10000 [01:23<01:07, 66.21it/s]

Epoch 5499/10000 | Last 100 Reward 1986.0 | Last 100 Cheese 1958.0| W/D/L 3124/771/1605 | 100 W/D/L 43/22/35 | 100 Steps 70.99


 56%|███████████████████████████████████████████▏                                 | 5607/10000 [01:24<01:04, 67.75it/s]

Epoch 5599/10000 | Last 100 Reward 1984.0 | Last 100 Cheese 1939.0| W/D/L 3181/780/1639 | 100 W/D/L 57/9/34 | 100 Steps 67.18


 57%|███████████████████████████████████████████▉                                 | 5709/10000 [01:26<01:06, 64.85it/s]

Epoch 5699/10000 | Last 100 Reward 2013.0 | Last 100 Cheese 1970.5| W/D/L 3234/795/1671 | 100 W/D/L 53/15/32 | 100 Steps 68.95


 58%|████████████████████████████████████████████▋                                | 5808/10000 [01:27<01:02, 67.28it/s]

Epoch 5799/10000 | Last 100 Reward 2033.0 | Last 100 Cheese 1984.5| W/D/L 3290/812/1698 | 100 W/D/L 56/17/27 | 100 Steps 68.19


 59%|█████████████████████████████████████████████▍                               | 5906/10000 [01:29<00:57, 70.82it/s]

Epoch 5899/10000 | Last 100 Reward 2017.0 | Last 100 Cheese 1963.0| W/D/L 3342/831/1727 | 100 W/D/L 52/19/29 | 100 Steps 68.67


 60%|██████████████████████████████████████████████▎                              | 6010/10000 [01:30<00:57, 69.54it/s]

Epoch 5999/10000 | Last 100 Reward 2047.0 | Last 100 Cheese 1982.0| W/D/L 3400/846/1754 | 100 W/D/L 58/15/27 | 100 Steps 66.35


 61%|███████████████████████████████████████████████                              | 6109/10000 [01:32<01:00, 63.98it/s]

Epoch 6099/10000 | Last 100 Reward 2082.0 | Last 100 Cheese 1995.0| W/D/L 3463/857/1780 | 100 W/D/L 63/11/26 | 100 Steps 69.68


 62%|███████████████████████████████████████████████▊                             | 6209/10000 [01:33<00:58, 65.26it/s]

Epoch 6199/10000 | Last 100 Reward 2010.0 | Last 100 Cheese 1972.5| W/D/L 3514/875/1811 | 100 W/D/L 51/18/31 | 100 Steps 70.12


 63%|████████████████████████████████████████████████▌                            | 6312/10000 [01:35<00:54, 67.28it/s]

Epoch 6299/10000 | Last 100 Reward 1978.0 | Last 100 Cheese 1953.5| W/D/L 3564/887/1849 | 100 W/D/L 50/12/38 | 100 Steps 66.62


 64%|█████████████████████████████████████████████████▎                           | 6408/10000 [01:36<00:52, 68.36it/s]

Epoch 6399/10000 | Last 100 Reward 2046.0 | Last 100 Cheese 2003.0| W/D/L 3621/903/1876 | 100 W/D/L 57/16/27 | 100 Steps 69.55


 65%|██████████████████████████████████████████████████▏                          | 6513/10000 [01:38<00:51, 67.59it/s]

Epoch 6499/10000 | Last 100 Reward 2028.0 | Last 100 Cheese 1975.5| W/D/L 3679/913/1908 | 100 W/D/L 58/10/32 | 100 Steps 67.37


 66%|██████████████████████████████████████████████████▉                          | 6608/10000 [01:39<00:50, 66.94it/s]

Epoch 6599/10000 | Last 100 Reward 2034.0 | Last 100 Cheese 1975.5| W/D/L 3737/923/1940 | 100 W/D/L 58/10/32 | 100 Steps 68.96


 67%|███████████████████████████████████████████████████▋                         | 6712/10000 [01:41<00:45, 72.93it/s]

Epoch 6699/10000 | Last 100 Reward 1991.0 | Last 100 Cheese 1961.0| W/D/L 3790/942/1968 | 100 W/D/L 53/19/28 | 100 Steps 67.18


 68%|████████████████████████████████████████████████████▍                        | 6807/10000 [01:42<00:49, 64.02it/s]

Epoch 6799/10000 | Last 100 Reward 1994.0 | Last 100 Cheese 1953.0| W/D/L 3843/961/1996 | 100 W/D/L 53/19/28 | 100 Steps 67.11


 69%|█████████████████████████████████████████████████████▏                       | 6904/10000 [01:44<00:47, 64.66it/s]

Epoch 6899/10000 | Last 100 Reward 2022.0 | Last 100 Cheese 1970.5| W/D/L 3903/968/2029 | 100 W/D/L 60/7/33 | 100 Steps 67.88


 70%|█████████████████████████████████████████████████████▉                       | 7006/10000 [01:45<00:46, 65.08it/s]

Epoch 6999/10000 | Last 100 Reward 2004.0 | Last 100 Cheese 1957.0| W/D/L 3956/986/2058 | 100 W/D/L 53/18/29 | 100 Steps 68.38


 71%|██████████████████████████████████████████████████████▊                      | 7113/10000 [01:47<00:39, 72.49it/s]

Epoch 7099/10000 | Last 100 Reward 1984.0 | Last 100 Cheese 1945.5| W/D/L 4011/1000/2089 | 100 W/D/L 55/14/31 | 100 Steps 65.28


 72%|███████████████████████████████████████████████████████▍                     | 7206/10000 [01:48<00:42, 65.37it/s]

Epoch 7199/10000 | Last 100 Reward 2042.0 | Last 100 Cheese 1978.0| W/D/L 4071/1015/2114 | 100 W/D/L 60/15/25 | 100 Steps 68.08


 73%|████████████████████████████████████████████████████████▎                    | 7309/10000 [01:50<00:40, 66.38it/s]

Epoch 7299/10000 | Last 100 Reward 2040.0 | Last 100 Cheese 1993.0| W/D/L 4131/1031/2138 | 100 W/D/L 60/16/24 | 100 Steps 66.56


 74%|█████████████████████████████████████████████████████████                    | 7408/10000 [01:51<00:35, 72.04it/s]

Epoch 7399/10000 | Last 100 Reward 2025.0 | Last 100 Cheese 1966.5| W/D/L 4187/1044/2169 | 100 W/D/L 56/13/31 | 100 Steps 66.12


 75%|█████████████████████████████████████████████████████████▊                   | 7511/10000 [01:53<00:36, 67.95it/s]

Epoch 7499/10000 | Last 100 Reward 2033.0 | Last 100 Cheese 1984.5| W/D/L 4244/1058/2198 | 100 W/D/L 57/14/29 | 100 Steps 67.4


 76%|██████████████████████████████████████████████████████████▌                  | 7612/10000 [01:54<00:34, 69.69it/s]

Epoch 7599/10000 | Last 100 Reward 2046.0 | Last 100 Cheese 1986.0| W/D/L 4308/1069/2223 | 100 W/D/L 64/11/25 | 100 Steps 67.22


 77%|███████████████████████████████████████████████████████████▎                 | 7708/10000 [01:56<00:36, 62.59it/s]

Epoch 7699/10000 | Last 100 Reward 1951.0 | Last 100 Cheese 1925.0| W/D/L 4356/1083/2261 | 100 W/D/L 48/14/38 | 100 Steps 66.43


 78%|████████████████████████████████████████████████████████████▏                | 7810/10000 [01:57<00:33, 66.35it/s]

Epoch 7799/10000 | Last 100 Reward 2018.0 | Last 100 Cheese 1955.0| W/D/L 4413/1095/2292 | 100 W/D/L 57/12/31 | 100 Steps 67.31


 79%|████████████████████████████████████████████████████████████▉                | 7914/10000 [01:59<00:29, 70.36it/s]

Epoch 7899/10000 | Last 100 Reward 2046.0 | Last 100 Cheese 1990.0| W/D/L 4474/1113/2313 | 100 W/D/L 61/18/21 | 100 Steps 65.77


 80%|█████████████████████████████████████████████████████████████▋               | 8010/10000 [02:00<00:30, 65.50it/s]

Epoch 7999/10000 | Last 100 Reward 1966.0 | Last 100 Cheese 1940.5| W/D/L 4526/1127/2347 | 100 W/D/L 52/14/34 | 100 Steps 68.74


 81%|██████████████████████████████████████████████████████████████▍              | 8110/10000 [02:02<00:27, 69.08it/s]

Epoch 8099/10000 | Last 100 Reward 2019.0 | Last 100 Cheese 1965.0| W/D/L 4585/1139/2376 | 100 W/D/L 59/12/29 | 100 Steps 66.44


 82%|███████████████████████████████████████████████████████████████▎             | 8215/10000 [02:03<00:24, 71.72it/s]

Epoch 8199/10000 | Last 100 Reward 2034.0 | Last 100 Cheese 1965.0| W/D/L 4639/1156/2405 | 100 W/D/L 54/17/29 | 100 Steps 68.09


 83%|███████████████████████████████████████████████████████████████▉             | 8310/10000 [02:05<00:24, 68.06it/s]

Epoch 8299/10000 | Last 100 Reward 2017.0 | Last 100 Cheese 1951.0| W/D/L 4698/1164/2438 | 100 W/D/L 59/8/33 | 100 Steps 67.06


 84%|████████████████████████████████████████████████████████████████▋            | 8409/10000 [02:06<00:24, 66.27it/s]

Epoch 8399/10000 | Last 100 Reward 2049.0 | Last 100 Cheese 1996.0| W/D/L 4753/1188/2459 | 100 W/D/L 55/24/21 | 100 Steps 69.64


 85%|█████████████████████████████████████████████████████████████████▌           | 8511/10000 [02:08<00:22, 65.64it/s]

Epoch 8499/10000 | Last 100 Reward 2043.0 | Last 100 Cheese 1981.0| W/D/L 4815/1201/2484 | 100 W/D/L 62/13/25 | 100 Steps 66.36


 86%|██████████████████████████████████████████████████████████████████▎          | 8609/10000 [02:09<00:20, 67.65it/s]

Epoch 8599/10000 | Last 100 Reward 2068.0 | Last 100 Cheese 1994.0| W/D/L 4878/1214/2508 | 100 W/D/L 63/13/24 | 100 Steps 66.89


 87%|███████████████████████████████████████████████████████████████████          | 8706/10000 [02:10<00:18, 69.48it/s]

Epoch 8699/10000 | Last 100 Reward 2005.0 | Last 100 Cheese 1955.5| W/D/L 4931/1229/2540 | 100 W/D/L 53/15/32 | 100 Steps 68.32


 88%|███████████████████████████████████████████████████████████████████▊         | 8809/10000 [02:12<00:17, 68.31it/s]

Epoch 8799/10000 | Last 100 Reward 1996.0 | Last 100 Cheese 1952.0| W/D/L 4981/1244/2575 | 100 W/D/L 50/15/35 | 100 Steps 67.61


 89%|████████████████████████████████████████████████████████████████████▌        | 8905/10000 [02:13<00:17, 63.61it/s]

Epoch 8899/10000 | Last 100 Reward 2036.0 | Last 100 Cheese 1971.5| W/D/L 5037/1264/2599 | 100 W/D/L 56/20/24 | 100 Steps 67.68


 90%|█████████████████████████████████████████████████████████████████████▎       | 9007/10000 [02:15<00:15, 65.91it/s]

Epoch 8999/10000 | Last 100 Reward 2018.0 | Last 100 Cheese 1961.5| W/D/L 5090/1280/2630 | 100 W/D/L 53/16/31 | 100 Steps 68.97


 91%|██████████████████████████████████████████████████████████████████████▏      | 9113/10000 [02:17<00:13, 66.18it/s]

Epoch 9099/10000 | Last 100 Reward 2079.0 | Last 100 Cheese 2006.0| W/D/L 5151/1298/2651 | 100 W/D/L 61/18/21 | 100 Steps 68.18


 92%|██████████████████████████████████████████████████████████████████████▉      | 9213/10000 [02:18<00:12, 65.38it/s]

Epoch 9199/10000 | Last 100 Reward 2024.0 | Last 100 Cheese 1971.0| W/D/L 5204/1315/2681 | 100 W/D/L 53/17/30 | 100 Steps 68.38


 93%|███████████████████████████████████████████████████████████████████████▋     | 9311/10000 [02:20<00:10, 66.70it/s]

Epoch 9299/10000 | Last 100 Reward 2064.0 | Last 100 Cheese 1994.5| W/D/L 5269/1326/2705 | 100 W/D/L 65/11/24 | 100 Steps 68.27


 94%|████████████████████████████████████████████████████████████████████████▍    | 9410/10000 [02:21<00:08, 66.89it/s]

Epoch 9399/10000 | Last 100 Reward 2071.0 | Last 100 Cheese 2005.5| W/D/L 5329/1345/2726 | 100 W/D/L 60/19/21 | 100 Steps 69.2


 95%|█████████████████████████████████████████████████████████████████████████▏   | 9508/10000 [02:22<00:07, 66.95it/s]

Epoch 9499/10000 | Last 100 Reward 2053.0 | Last 100 Cheese 1990.0| W/D/L 5387/1357/2756 | 100 W/D/L 58/12/30 | 100 Steps 67.58


 96%|█████████████████████████████████████████████████████████████████████████▉   | 9609/10000 [02:24<00:05, 65.38it/s]

Epoch 9599/10000 | Last 100 Reward 2061.0 | Last 100 Cheese 2006.0| W/D/L 5446/1376/2778 | 100 W/D/L 59/19/22 | 100 Steps 68.83


 97%|██████████████████████████████████████████████████████████████████████████▊  | 9708/10000 [02:26<00:04, 64.08it/s]

Epoch 9699/10000 | Last 100 Reward 2058.0 | Last 100 Cheese 1983.0| W/D/L 5512/1387/2801 | 100 W/D/L 66/11/23 | 100 Steps 66.28


 98%|███████████████████████████████████████████████████████████████████████████▍ | 9805/10000 [02:27<00:02, 66.13it/s]

Epoch 9799/10000 | Last 100 Reward 2059.0 | Last 100 Cheese 1987.5| W/D/L 5573/1400/2827 | 100 W/D/L 61/13/26 | 100 Steps 67.78


 99%|████████████████████████████████████████████████████████████████████████████▎| 9908/10000 [02:29<00:01, 70.74it/s]

Epoch 9899/10000 | Last 100 Reward 2003.0 | Last 100 Cheese 1966.5| W/D/L 5625/1410/2865 | 100 W/D/L 52/10/38 | 100 Steps 68.21


100%|████████████████████████████████████████████████████████████████████████████| 10000/10000 [02:30<00:00, 66.49it/s]

Epoch 9999/10000 | Last 100 Reward 2050.0 | Last 100 Cheese 1984.0| W/D/L 5686/1421/2893 | 100 W/D/L 61/11/28 | 100 Steps 67.73
Testing done
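Each summary line above follows a fixed format. The cumulative W/D/L counts always add up to epoch + 1 (for example, 5686 + 1421 + 2893 = 10000 at epoch 9999), so the log can be parsed back into numbers for plotting or for comparing runs. Below is a minimal sketch that assumes the cell output has been captured as a plain string. `parse_log`, `EpochStats` and `win_rate` are hypothetical helpers and are not part of `rl.py` or `game.py`. The sketch also treats draws as non-wins.

```python
import re
from typing import List, NamedTuple

# Regex for one summary line, mirroring the format printed during training.
# The log has no space between the cheese value and the following "|".
LOG_PATTERN = re.compile(
    r"Epoch (\d+)/(\d+) \| Last 100 Reward ([\d.]+) \| Last 100 Cheese ([\d.]+)\s*\|"
    r" W/D/L (\d+)/(\d+)/(\d+) \| 100 W/D/L (\d+)/(\d+)/(\d+) \| 100 Steps ([\d.]+)"
)


class EpochStats(NamedTuple):
    """Hypothetical container for one parsed summary line."""
    epoch: int
    total_epochs: int
    reward_100: float
    cheese_100: float
    wins: int        # cumulative since the start of training
    draws: int
    losses: int
    wins_100: int    # last 100 games only
    draws_100: int
    losses_100: int
    steps_100: float


def parse_log(text: str) -> List[EpochStats]:
    """Extract every summary line; tqdm progress-bar lines do not match and are skipped."""
    stats = []
    for m in LOG_PATTERN.finditer(text):
        g = m.groups()
        stats.append(EpochStats(
            int(g[0]), int(g[1]), float(g[2]), float(g[3]),
            int(g[4]), int(g[5]), int(g[6]),
            int(g[7]), int(g[8]), int(g[9]),
            float(g[10]),
        ))
    return stats


def win_rate(wins: int, draws: int, losses: int) -> float:
    """Fraction of games won (draws count as non-wins); 0.0 if no games were played."""
    total = wins + draws + losses
    return wins / total if total else 0.0


if __name__ == "__main__":
    # Two summary lines copied verbatim from the output above.
    log_text = (
        "Epoch 9899/10000 | Last 100 Reward 2003.0 | Last 100 Cheese 1966.5| W/D/L 5625/1410/2865 | 100 W/D/L 52/10/38 | 100 Steps 68.21\n"
        "Epoch 9999/10000 | Last 100 Reward 2050.0 | Last 100 Cheese 1984.0| W/D/L 5686/1421/2893 | 100 W/D/L 61/11/28 | 100 Steps 67.73\n"
    )
    for s in parse_log(log_text):
        print(s.epoch,
              round(win_rate(s.wins, s.draws, s.losses), 4),
              win_rate(s.wins_100, s.draws_100, s.losses_100))
```

The final line gives a cumulative win rate of 5686 / 10000 = 56.86%. The last 100 games alone reach 61%. Feeding the parsed `epoch` and 100-game win rate into `matplotlib.pyplot.plot` would show how much of the late-training variance is just noise within a 100-game window.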



