# Fisher FIFO

In the `fisher-fifo-v3` notebook, we set up code that uses the Fisher information in neural networks, with a *first-in-first-out* (FIFO) strategy for storing gradients.
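
As a minimal sketch of the FIFO idea, assume the Fisher matrix is approximated by the averaged outer products of the last `N` gradients (the empirical Fisher). The class `GradientFIFO` and its methods are hypothetical names used only for illustration; they are not the classes defined later in this notebook.

```python
import numpy as np

class GradientFIFO:
    """Hypothetical helper: keeps the last `buffer_size` gradients as columns."""

    def __init__(self, param_size, buffer_size):
        self.buffer = np.zeros((param_size, buffer_size))

    def push(self, g):
        # drop the oldest column (first in, first out) and append the newest gradient
        oldest = self.buffer[:, 0].copy()
        self.buffer = np.concatenate([self.buffer[:, 1:], g.reshape(-1, 1)], axis=1)
        return oldest

    def fisher(self):
        # assumed estimate: average outer product of the stored gradients
        n = self.buffer.shape[1]
        return self.buffer @ self.buffer.T / n


fifo = GradientFIFO(param_size=3, buffer_size=2)
fifo.push(np.array([1.0, 0.0, 0.0]))
fifo.push(np.array([0.0, 2.0, 0.0]))
removed = fifo.push(np.array([0.0, 0.0, 3.0]))  # evicts the first gradient
print(removed)
print(fifo.fisher())
```

Once the buffer is full, every new gradient replaces the oldest one, so the estimate always reflects the most recent `buffer_size` steps.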

Here our objective is to assess the effect of the FIFO buffer on the quality of the model. We also implement a partitioning strategy so that the algorithm generalizes to larger networks and to datasets with larger instances (such as images).
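
The partitioning can be pictured as cutting a flattened gradient into fixed-size pieces, with the last piece padded by zeros when the sizes do not divide evenly. The sketch below is an illustration under that assumption; `partition_gradient` is a hypothetical helper, written with NumPy for brevity.

```python
import math
import numpy as np

def partition_gradient(flat_grad, partition_size):
    """Split a 1-D gradient into rows of length `partition_size`, zero-padding the last row."""
    num_part = math.ceil(flat_grad.size / partition_size)
    padded = np.zeros(num_part * partition_size, dtype=flat_grad.dtype)
    padded[:flat_grad.size] = flat_grad
    return padded.reshape(num_part, partition_size)


g = np.arange(1.0, 8.0)            # 7 gradient entries
blocks = partition_gradient(g, 3)  # 3 partitions; the last holds one entry plus padding
print(blocks)
```

Each row can then be preconditioned with its own small Fisher block, so the cost grows with the number of partitions rather than with the square of the full parameter count.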

To make the partitioning more effective, we use the "maximum-block-update" strategy: at most a fixed number of blocks (partitions) per parameter is updated at each iteration, which makes the algorithm faster.

---

In the `fisher-fifo-v4` notebook, we implement the algorithm more efficiently using [torch.bmm](https://pytorch.org/docs/stable/generated/torch.bmm.html). The main idea is to execute the matrix multiplications of several blocks in a single batched call.
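
A small sketch of what the batched call buys us: `torch.bmm` multiplies a stack of matrices by a stack of column vectors in one call, and the result matches a block-by-block Python loop. The shapes and variable names here are illustrative only.

```python
import torch

torch.manual_seed(0)
num_blocks, n = 4, 3

inverses = torch.randn(num_blocks, n, n)  # one (n x n) matrix per block
grads = torch.randn(num_blocks, n, 1)     # one column vector per block

# one batched call for all blocks
batched = torch.bmm(inverses, grads)

# equivalent block-by-block loop, shown only for comparison
looped = torch.stack([inverses[i] @ grads[i] for i in range(num_blocks)])

print(batched.shape)
print(torch.allclose(batched, looped, atol=1e-6))
```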

---

In the `fisher-fifo-v4.2` notebook, we make the algorithm even faster by using a single centralized object that stores all the matrices (and their inverses) as well as all gradients and buffers. The main idea is to make the most of vectorization using the PyTorch utilities for GPU.
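
Keeping everything in a few large tensors also makes the memory cost easy to estimate. The sketch below is a rough calculation, assuming float32 storage and the `[num_part, partition_size, buffer_size]` and `[num_part, partition_size, partition_size]` layouts that `FisherFIFO` pre-allocates; `tensor_gib` is a hypothetical helper, and the sizes are those of the ResNet-18 run logged later in this notebook.

```python
def tensor_gib(shape, bytes_per_element=4):
    """Size in GiB of a dense tensor with the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element / 1024**3


# values reported by the run logged later in this notebook
num_part, partition_size, buffer_size = 374302, 30, 140

buffer_gib = tensor_gib((num_part, partition_size, buffer_size))
inverse_gib = tensor_gib((num_part, partition_size, partition_size))

print(f'buffer:   {buffer_gib:.2f} GiB')
print(f'inverses: {inverse_gib:.2f} GiB')
```

With these settings the buffer dominates the footprint, which suggests `buffer_size` is the first knob to turn when GPU memory is tight.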

---

In the `fisher-fifo-v4.3` notebook, we take the algorithm one step further in efficiency by retrieving the partitions in sets instead of individually.
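
A sketch of retrieving partitions in sets, assuming a central tensor that holds one small matrix per partition: the contiguous runs chosen for each parameter are merged into one index vector, so a single gather and a single scatter replace many individual reads and writes. The store, the runs, and the doubling step are all made up for illustration.

```python
import torch

# hypothetical central store: 10 partitions, each a 2x2 matrix filled with its own index
store = torch.arange(10, dtype=torch.float).view(10, 1, 1).repeat(1, 2, 2)

# contiguous runs (start, end inclusive), e.g. one run per network parameter
runs = [(1, 3), (6, 7)]
idx = torch.cat([torch.arange(start, end + 1) for start, end in runs])

selected = store[idx]      # one gather for the whole set (this is a copy)
selected = selected * 2.0  # stand-in for the batched update
store[idx] = selected      # one scatter back into the central store

print(idx.tolist())
print(store[:, 0, 0].tolist())
```

Because indexing with an index tensor returns a copy, the write-back step is required; the same pattern appears in `FisherFIFO.step`.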

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import time
import math
import os
import json

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from skopt import gp_minimize

from scipy import stats

In [2]:
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/bayes-opt-results.npz
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/results_step_20.json
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/__results__.html
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/__notebook__.ipynb
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/__output__.json
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/custom.css
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/data/cifar-100-python.tar.gz
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/data/cifar-100-python/meta
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/data/cifar-100-python/file.txt~
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/data/cifar-100-python/test
/kaggle/input/fisher-fifo-v4-3-comparing-cifar100-resnet18/data/cifar-100-python/train


In [3]:
import torch
from torch import nn
from torch.nn import functional as F
from torch.utils.data import TensorDataset, DataLoader

import torchvision
import torchvision.datasets as datasets  ## note: this replaces the sklearn `datasets` module imported in cell 1

In [4]:
def get_device():    
    if torch.cuda.is_available():

        device = torch.device('cuda')
        print( torch.cuda.get_device_name(device) )
        print( torch.cuda.get_device_properties(device) )

    else:
        device = torch.device('cpu')
        print(device)
        
    return device

In [5]:
!pip install torchsummary
import torchsummary

Collecting torchsummary
  Downloading torchsummary-1.5.1-py3-none-any.whl (2.8 kB)
Installing collected packages: torchsummary
Successfully installed torchsummary-1.5.1

In [6]:
class cfg:
    img_size = (32, 32)
    img_channels = 3
    n_features = img_channels * img_size[0] * img_size[1]  ## flattened input size, needed by get_default_network
    n_classes = 100  ## we have 100 classes in CIFAR100
    
    # device = torch.device('cpu')
    device = get_device()
    
    max_loss = 20.0

Tesla P100-PCIE-16GB
_CudaDeviceProperties(name='Tesla P100-PCIE-16GB', major=6, minor=0, total_memory=16280MB, multi_processor_count=56)


# create the dataset

In [7]:
def generate_dataset_mnist(batch_size):
    print(f'generating MNIST data with {cfg.n_classes} classes')
    
    transf_ = torchvision.transforms.Compose([
        # torchvision.transforms.Resize(size=[14, 14]),
        torchvision.transforms.ToTensor()
    ])
    
    mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=transf_)
    mnist_test  = datasets.MNIST(root='./data', train=False, download=True, transform=transf_)
    
    mnist_train_dataloader = DataLoader(dataset=mnist_train, batch_size=batch_size, shuffle=True)
    mnist_test_dataloader  = DataLoader(dataset=mnist_test, batch_size=batch_size, shuffle=False)

    return mnist_train_dataloader, mnist_test_dataloader

In [8]:
def generate_dataset_cifar10(batch_size):
    print(f'generating CIFAR10 data with {cfg.n_classes} classes')
    
    transf_ = torchvision.transforms.Compose([
        # torchvision.transforms.Resize(size=[14, 14]),
        torchvision.transforms.ToTensor()
    ])
    
    cifar10_train = datasets.CIFAR10(root='./data', train=True, download=True, transform=transf_)
    cifar10_test  = datasets.CIFAR10(root='./data', train=False, download=True, transform=transf_)
    
    cifar10_train_dataloader = DataLoader(dataset=cifar10_train, batch_size=batch_size, shuffle=True)
    cifar10_test_dataloader  = DataLoader(dataset=cifar10_test, batch_size=batch_size, shuffle=False)

    return cifar10_train_dataloader, cifar10_test_dataloader

In [9]:
def generate_dataset_cifar100(batch_size):
    print(f'generating CIFAR100 data with {cfg.n_classes} classes')
    
    transf_ = torchvision.transforms.Compose([
        torchvision.transforms.ToTensor()
    ])
    
    cifar100_train = datasets.CIFAR100(root='./data', train=True, download=True, transform=transf_)
    cifar100_test  = datasets.CIFAR100(root='./data', train=False, download=True, transform=transf_)
    
    cifar100_train_dataloader = DataLoader(dataset=cifar100_train, batch_size=batch_size, shuffle=True)
    cifar100_test_dataloader  = DataLoader(dataset=cifar100_test, batch_size=batch_size, shuffle=False)

    return cifar100_train_dataloader, cifar100_test_dataloader

# declaring network architecture

In [10]:
def get_default_network(c=16, device=cfg.device):
    net = nn.Sequential(
        nn.Flatten(),
        nn.Linear(in_features=cfg.n_features, out_features=c),
        nn.ReLU(),
        nn.Linear(in_features=c, out_features=c),
        nn.ReLU(),
        nn.Linear(in_features=c, out_features=c),
        nn.ReLU(),
        nn.Linear(in_features=c, out_features=cfg.n_classes)
    )
    
    torchsummary.summary(net, input_size=[[cfg.n_features]], device='cpu')
    
    return net

In [11]:
def get_cnn_network(in_channels=cfg.img_channels, c=16, p_drop=0.1, device=cfg.device):
    
    img_flat_size = (4 * c * (cfg.img_size[0] // 8) * (cfg.img_size[1] // 8) )
    print(img_flat_size)
    net = nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=c, kernel_size=5, stride=2, padding=2),
        nn.ReLU(),

        nn.Conv2d(in_channels=c, out_channels=(2 * c), kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
        nn.Dropout2d(p=p_drop),
        
        nn.Conv2d(in_channels=(2 * c), out_channels=(4 * c), kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
        nn.Dropout2d(p=p_drop),
        
        nn.Flatten(),
        
        nn.Linear(in_features=img_flat_size, out_features=(8 * c) ),
        nn.ReLU(),
        nn.Dropout(p=p_drop),
        
        nn.Linear(in_features=(8 * c), out_features=(4 * c) ),
        nn.ReLU(),
        nn.Dropout(p=p_drop),
        
        nn.Linear(in_features=(4 * c), out_features=cfg.n_classes)
    )
    
    torchsummary.summary(net, input_size=[[cfg.img_channels, *cfg.img_size]], device='cpu')
    
    return net
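
The flattened size used by the first linear layer follows from the spatial reductions in the convolutional stack above: the stride-2 convolution halves each side, and each of the two max-pools halves it again, for a factor of 8 overall. The check below is a sketch that rebuilds only the shape-changing layers (activations and dropout do not change shapes) and assumes 32x32 inputs with `c=16`.

```python
import torch
from torch import nn

c, height, width = 16, 32, 32

# only the layers that affect tensor shapes are kept here
features = nn.Sequential(
    nn.Conv2d(3, c, kernel_size=5, stride=2, padding=2),    # 32 -> 16
    nn.Conv2d(c, 2 * c, kernel_size=3, padding=1),          # 16 -> 16
    nn.MaxPool2d(kernel_size=2),                            # 16 -> 8
    nn.Conv2d(2 * c, 4 * c, kernel_size=3, padding=1),      # 8 -> 8
    nn.MaxPool2d(kernel_size=2),                            # 8 -> 4
    nn.Flatten(),
)

with torch.no_grad():
    out = features(torch.zeros(1, 3, height, width))

expected = 4 * c * (height // 8) * (width // 8)
print(out.shape[1], expected)
```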

In [12]:
def get_cnn_network_v2(in_channels=cfg.img_channels, p_drop=0.1, device=cfg.device):
    
    net = nn.Sequential(
        nn.Conv2d(in_channels=in_channels, out_channels=96, kernel_size=5, padding=2),
        nn.MaxPool2d(kernel_size=2),
        nn.ReLU(),

        nn.Conv2d(in_channels=96, out_channels=80, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
        nn.Dropout2d(p=p_drop),
        
        nn.Conv2d(in_channels=80, out_channels=96, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.Dropout2d(p=p_drop),
        
        nn.Conv2d(in_channels=96, out_channels=64, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.Dropout2d(p=p_drop),
        
        nn.Flatten(),
        
        # nn.Linear(in_features=4096, out_features=256 ),
        nn.Linear(in_features=(cfg.img_size[0] // 4) * (cfg.img_size[1] // 4) * 64, out_features=256 ),
        nn.ReLU(),
        nn.Dropout(p=p_drop),
        
        nn.Linear(in_features=256, out_features=cfg.n_classes)
    )
    
    torchsummary.summary(net, input_size=[[cfg.img_channels, *cfg.img_size]], device='cpu')
    
    return net

In [13]:
def get_resnet18(device=cfg.device):
    
    net = torchvision.models.resnet18(num_classes=cfg.n_classes)
    torchsummary.summary(net, input_size=[[cfg.img_channels, *cfg.img_size]], device='cpu')
    
    return net

# object for calculation of the metrics

In [14]:
class Metrics():
    def __init__(self, value_round=None, time_round=None):
        self.metrics_dict = {}
        self.set_initial_time()
        self.val_round = value_round
        self.time_round = time_round
        
    def set_initial_time(self):
        self.init_time = time.time()
        
    def get_time(self):
        return time.time() - self.init_time
    
    def add(self, key, value, step=None):
        
        if step is None:
            step = np.nan
        
        if key not in self.metrics_dict:
            self.metrics_dict[key] = []
        
        t = self.get_time()
        if self.time_round is not None:
            t = round(t, ndigits=self.time_round)
        
        if self.val_round is not None:
            value = round(value, ndigits=self.val_round)
        
        self.metrics_dict[key].append( (value, step, t) )
    
    def add_(self, dict_, step=None):
        for key, value in dict_.items():
            self.add(key, value, step)
    
    def get(self, key, get_step=False, get_time=False):
        y, x, t = zip(*self.metrics_dict[key])
        y, x, t = list(y), list(x), list(t)
        
        return x, y, t

# Fisher Information calculation objects

In [15]:
class FisherFIFO():
    def __init__(self,
                 named_params,
                 buffer_size,
                 partition_size,
                 block_updates):
        
        self.buffer_size = buffer_size
        self.partition_size = partition_size
        self.block_updates = block_updates
        
        named_params = list(named_params)
        
        self.partition_fisher_list = []
        total_partitions, total_block_upd = 0, 0
        for pi, (n, p) in enumerate( named_params ):
            part_fisher = PartitionerFisherFIFO(param = p,
                                                name = n,
                                                buffer_size = buffer_size,
                                                partition_size = partition_size,
                                                block_updates = block_updates,
                                                parent_fifo = None)
            
            self.partition_fisher_list.append( (p, part_fisher, total_partitions, total_block_upd) )
            
            total_partitions += part_fisher.num_part
            total_block_upd += part_fisher.block_updates
            
        self.num_part = total_partitions
        self.total_block_updates = total_block_upd
        
        print(f'total partitions: {self.num_part} - effective block updates: {self.total_block_updates}')
        
        ## pre-allocate the memory for the tensor that stores the selected gradients (changes every iteration)
        self.g = torch.zeros(size=[self.total_block_updates, partition_size, 1], dtype=torch.float, device=cfg.device)
        
        ## pre-allocate the memory for the tensor that stores the buffer and the tensor for the inverse
        self.buffer = torch.zeros(size=[self.num_part, partition_size, buffer_size], dtype=torch.float, device=cfg.device)
        self.fisher_inv = torch.zeros(size=[self.num_part, partition_size, partition_size], dtype=torch.float, device=cfg.device)        
    
        print('initializing buffers and inverses...')
        ## now we initialize the buffer and the inverse for all partitions
        i = 0
        for _, part_fisher, _, _ in self.partition_fisher_list:
            for _, start, end in part_fisher.ind_fisher_list:

                if i == 0 or ( (i + 1) % 10000 ) == 0 or i == (self.num_part - 1):
                    print(f'partition {i+1}/{self.num_part}')

                n = end - start
                buffer, _, fisher_inv = self.initialize_fisher_partition(param_size=n, buffer_size=self.buffer_size)

                self.buffer[i, :n, :] = buffer
                self.fisher_inv[i, :n, :n] = fisher_inv

                i += 1

    
    def initialize_fisher_partition(self, param_size, buffer_size):
        
        buffer = self.get_initial_buffer_v2(param_size, buffer_size)
            
        ## shuffle buffer across columns
        buffer = buffer[:, torch.randperm(buffer.shape[1]) ]
        
        ## the fisher matrix will be initialized as G @ G.T, in which G is our buffer. We built
        ## our buffer in a smart way so the resulting Fisher info matrix is initialized close to identity
        fisher = buffer @ buffer.T
        
        ## since our Fisher information is diagonal for now, its inverse is given just by the inverted
        ## elements of the diagonal.
        fisher_inv = torch.diag( 1 / torch.diag(fisher) )
        
        return buffer, fisher, fisher_inv
    
    
    def get_initial_buffer_v2(self, param_size, buffer_size):
        ## here we adopt a faster approach to initialize the buffer. We use n 
        ## identity matrices concatenated column-wise. n is determined by `param_size` and `buffer_size`
        
        n_eye = math.ceil(buffer_size / param_size)
        I = torch.eye(n=param_size, dtype=torch.float, device=cfg.device)
        
        buffer = torch.cat(n_eye * [I], dim=1)[:, :buffer_size]
        
        assert buffer.shape == (param_size, buffer_size)
        
        return buffer


    def get_idx_lists(self):
        run_enc_list = []
        default_idx_list = []
        for p, part_fisher, num_part, block_upd in self.partition_fisher_list:
            init_block, end_block, g_init_idx, g_end_idx = part_fisher.get_random_blocks()
            
            # print(f'param shape: {p.shape} - blocks: {init_block} to {end_block} - grad: {g_init_idx} to {g_end_idx}')
            
            run_enc_list.append( (num_part + init_block, num_part + end_block, g_init_idx, g_end_idx) )
            default_idx_list.append( np.arange(start=num_part + init_block, stop=num_part + end_block + 1) )
            
            
        return run_enc_list, np.concatenate(default_idx_list)
    
    
    def read_gradients(self, idx):
        self_g_start = 0
        for i, (_, _, g_start, g_end) in enumerate(idx):
            n_grad = g_end - g_start
            # self_g_end = min( self_g_start + n_grad, torch.numel(self.g) )
            self_g_end = self_g_start + n_grad
            
            p, _, _, _ = self.partition_fisher_list[i]
            
            self.g.view(-1)[self_g_start:self_g_end] = p.grad.view(-1)[g_start:g_end]
            
            if (n_grad % self.partition_size) > 0:
                extra_zeros = self.partition_size - (n_grad % self.partition_size)
                self.g.view(-1)[self_g_end:(self_g_end + extra_zeros)] = 0.0
            else:
                extra_zeros = 0
            
#             print(f'self_g_start: {self_g_start} - self_g_end: {self_g_end} - self.g.shape: {self.g.view(-1).shape}')
#             print(f'g_start: {g_start} - g_end: {g_end} - p.grad.shape: {p.grad.view(-1).shape}')
#             print(f'n_grad: {n_grad} - part-size: {self.partition_size} - extra-zeros: {extra_zeros}')
#             print()
            
            self_g_start = self_g_end + extra_zeros
            

    def write_gradients(self, idx):
        self_g_start = 0
        for i, (_, _, g_start, g_end) in enumerate(idx):
            n_grad = g_end - g_start
            self_g_end = self_g_start + n_grad
            
            p, _, _, _ = self.partition_fisher_list[i]
            p.grad.view(-1)[g_start:g_end] = self.g.view(-1)[self_g_start:self_g_end]

            if (n_grad % self.partition_size) > 0:
                extra_zeros = self.partition_size - (n_grad % self.partition_size)
            else:
                extra_zeros = 0
            
            self_g_start = self_g_end + extra_zeros

    
    def step(self):
        ## select the blocks to be updated
        run_enc_idx, default_idx = self.get_idx_lists()
        
        ## read the gradients of the selected blocks and store them in self.g
        self.read_gradients(run_enc_idx)
        
        ## extract the inverses and buffers for the selected blocks (indexing with an array returns copies)
        inv = self.fisher_inv[default_idx, ...]
        buffer = self.buffer[default_idx, ...]
        
        # print(inv.shape, buffer.shape)
        
        ## update the buffer
        
        ## dimensions are: partitions, gradient-size, buffer-size
        g_old = buffer[:, :, 0:1]
        
        ## we update along the third dimension: "buffer-size"
        buffer = torch.cat([buffer[:, :, 1:], self.g], dim=2)
        
        ## update the inverses and modify current gradients
        ## ...
        sqrt_NB = math.sqrt(self.buffer_size)
        
        ## update inverse - phase 1: add new gradient ##
        g_phase1 = self.upd_inverse( (1 / sqrt_NB) * self.g, inv, type_='sum')

        ## update inverse - phase 2: remove old gradient ##
        self.upd_inverse( (1 / sqrt_NB) * g_old, inv, type_='sub')

        ## modify the current gradients
        if False:
            ## use the "phase-1-trick" to get the estimated new gradient
            self.g = g_phase1 * sqrt_NB
        else:
            ## get the modified gradient using "de facto" the new inverses and the gradients
            self.g = self.modify_grad(self.g, inv)
        
        ## return the inverses and buffers to the main tensor
        self.fisher_inv[default_idx, ...] = inv
        self.buffer[default_idx, ...] = buffer
        
        ## return the modified gradients to the parameters
        self.write_gradients(run_enc_idx)


    def upd_inverse(self, g, inverse, type_='sum'):
        ## update the inverse with a rank-1 Woodbury (Sherman-Morrison) update
        f_inv_g = torch.bmm(inverse, g)

        if type_ == 'sum':
            d = 1 + torch.sum(g * f_inv_g, dim=[1, 2], keepdim=True)
            inverse[:] = inverse - (f_inv_g * torch.transpose(f_inv_g, 1, 2) / d)

        elif type_ == 'sub':
            d = 1 - torch.sum(g * f_inv_g, dim=[1, 2], keepdim=True)
            inverse[:] = inverse + (f_inv_g * torch.transpose(f_inv_g, 1, 2) / d)

        else:
            ## incorrect type: fail loudly instead of silently skipping the update
            raise ValueError('incorrect rank-1 update type: ' + str(type_))
        
        return f_inv_g


    def modify_grad(self, g, inverse):
        return torch.bmm(inverse, g)
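
The two phases in `upd_inverse` correspond to the Sherman-Morrison identity, applied once with a plus sign (adding the new gradient) and once with a minus sign (removing the oldest one). The sketch below checks both phases on a single small block against a direct inverse. `rank_one_update` is a hypothetical stand-alone helper, not the method above; it assumes a symmetric inverse and uses double precision to keep the comparison tight.

```python
import torch

def rank_one_update(F_inv, g, sign):
    """Inverse of (F + sign * g g^T) given a symmetric F^-1, via Sherman-Morrison."""
    u = F_inv @ g             # F^-1 g, shape (n, 1)
    d = 1 + sign * (g.T @ u)  # 1 +/- g^T F^-1 g, shape (1, 1)
    return F_inv - sign * (u @ u.T) / d


torch.manual_seed(0)
n = 4
g_old = 0.3 * torch.randn(n, 1, dtype=torch.double)
g_new = 0.3 * torch.randn(n, 1, dtype=torch.double)

# start from a matrix that already contains the old gradient
F = torch.eye(n, dtype=torch.double) + g_old @ g_old.T
F_inv = torch.linalg.inv(F)

F_inv = rank_one_update(F_inv, g_new, sign=+1)  # phase 1: add the new gradient
F_inv = rank_one_update(F_inv, g_old, sign=-1)  # phase 2: remove the old gradient
F = F + g_new @ g_new.T - g_old @ g_old.T

max_err = (F_inv - torch.linalg.inv(F)).abs().max().item()
print(max_err)
```

Each update costs one matrix-vector product and one outer product per block, instead of a full inversion.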

In [16]:
class PartitionerFisherFIFO():
    def __init__(self,
                 param,
                 name,
                 buffer_size,
                 partition_size,
                 block_updates,
                 parent_fifo):
        
        self.param = param
        self.name = name 
        
        if partition_size is None:
            self.partition_size = param.numel()
        else:
            self.partition_size = partition_size
        
        ## calculates the number of partitions required. It is calculated using the param size and
        ## our partition maximum size. The gradient (the same size as param) is going to be partitioned in
        ## equal pieces (except possibly the last one) to be processed individually by our "IndividualFisherFIFO"
        self.param_size = param.numel()
        self.num_part = math.ceil(self.param_size / self.partition_size)
        
        ## the number of blocks (partitions) to update at each iteration. This can be < num_part to make
        ## the algorithm more efficient. (we dont update every partition at every iteration)
        if block_updates is None:
            self.block_updates = self.num_part
        else:
            self.block_updates = min(block_updates, self.num_part)
        
        print(f'FisherPartitioner: param: {self.param_size} - partition: {self.partition_size} - nº part: {self.num_part} - block updates: {self.block_updates}')
                
        ## the list stores the indexes used to partition the gradient
        self.ind_fisher_list = []
        for i in range(self.num_part):
            start = i * self.partition_size
            end = min(start + self.partition_size, self.param_size)
            
            self.ind_fisher_list.append( (i, start, end) )
        
    
    def get_random_blocks(self, num_part=None, block_upd=None):
        
        if num_part is None:
            num_part = self.num_part
        
        if block_upd is None:
            block_upd = self.block_updates
        
        ## choose the initial block randomly
        init_block = np.random.choice(num_part - block_upd + 1)
        
        ## the final block will be necessarily `block_upd` blocks further. This means we select
        ## a contiguous sequence of blocks. This is going to be used for performance reasons
        end_block = init_block + block_upd - 1
        
        ## therefore, the starting and ending index to be used to fetch the gradient positions for the
        ## blocks will be the starting index for the first block and the ending positions for the last block
        _, g_init_idx, _ = self.ind_fisher_list[init_block]
        _, _, g_end_idx = self.ind_fisher_list[end_block]
        
        return init_block, end_block, g_init_idx, g_end_idx
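
The block selection above can be sketched on its own: pick a random starting partition, take the next `block_updates` partitions, and convert that run into a slice of the flattened gradient. The function `pick_contiguous_blocks` is a hypothetical stand-alone version; it uses NumPy's `default_rng` rather than the `np.random.choice` call used in the class, and the sizes are made up.

```python
import numpy as np

def pick_contiguous_blocks(num_part, block_updates, partition_size, param_size, rng):
    """Return (first_block, last_block, grad_start, grad_end) for one random contiguous run."""
    first = int(rng.integers(num_part - block_updates + 1))
    last = first + block_updates - 1
    grad_start = first * partition_size
    grad_end = min((last + 1) * partition_size, param_size)  # the last partition may be shorter
    return first, last, grad_start, grad_end


rng = np.random.default_rng(0)
# a parameter with 100 entries split into partitions of 30 gives 4 partitions (30, 30, 30, 10)
first, last, g0, g1 = pick_contiguous_blocks(num_part=4, block_updates=2,
                                             partition_size=30, param_size=100, rng=rng)
print(first, last, g0, g1)
```

Choosing a contiguous run means the matching gradient entries form one slice, so they can be copied in a single operation.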

---

# utility functions for training

In [17]:
def accuracy_score_tns(y_true, y_pred):
    return torch.mean( (y_true == y_pred).to(dtype=torch.float) ).cpu().item()

In [18]:
def train_iteration(x, y, net, optim, loss, fisher=None):
    net.train()
    net.zero_grad()
    
    y_pred = net(x)
    l = loss(y_pred, y)
    
    l.backward()
    
    if fisher is not None:
        fisher.step()
    
    optim.step()
    
    return l.item(), accuracy_score_tns( y.view(-1), y_pred.argmax(dim=1).view(-1) )

In [19]:
def evaluate(net, dataloader, loss):
    net.eval()
    
    with torch.no_grad():

        loss_list = []
        y_pred_list = []
        y_label_list = []
        for x, y in dataloader:
            
            x = x.to(cfg.device)
            y = y.to(cfg.device)

            y_pred = net(x)
            l = loss(y_pred, y)

            loss_list.append( l.cpu().item() )
            y_pred_list.append( y_pred.argmax(dim=1).view(-1) )
            y_label_list.append( y.view(-1) )

        y_pred_list = torch.cat(y_pred_list).view(-1)
        y_label_list = torch.cat(y_label_list).view(-1)

    return np.mean(loss_list), accuracy_score_tns(y_label_list, y_pred_list)

# training

In [20]:
def train_network_fisher_optimization(batch_size = 32,
                                      lr = 1e-3,
                                      momentum = 0.9,
                                      epochs = 30,
                                      buffer_size = 1000,
                                      partition_size = 256,
                                      block_updates = 4,
                                      net_params = {'c':16, 'p':0.1},
                                      apply_fisher = True,
                                      # gpu_memory_check = 20,
                                      time_limit_secs = 600,
                                      interval_print = 100):

    ## declare (instantiate) the dataset
    # train_dataloader, test_dataloader = generate_dataset_cifar10(batch_size = batch_size)
    # train_dataloader, test_dataloader = generate_dataset_mnist(batch_size = batch_size)
    train_dataloader, test_dataloader = generate_dataset_cifar100(batch_size = batch_size)

    ## instantiate the network
    # net = get_cnn_network_v2(p_drop = net_params['p']).to(device=cfg.device)
    net = get_resnet18().to(device=cfg.device)
    
    if apply_fisher:
        ## instantiate FisherFIFO object to create and update the Fisher info matrix
        fisher_fifo = FisherFIFO(named_params = net.named_parameters(),
                                 buffer_size = buffer_size,
                                 partition_size = partition_size,
                                 block_updates = block_updates)
    else:
        fisher_fifo = None

    ## create loss object: we multiply by our constant to stabilize norms
    # cross_entropy = nn.CrossEntropyLoss(reduction='mean') # standard version
    cross_entropy_standard = nn.CrossEntropyLoss(reduction='mean')
    cross_entropy = lambda y_pred, y: math.sqrt(batch_size) * cross_entropy_standard(y_pred, y)
    
    ## create optimize objects
    optim = torch.optim.SGD(params=net.parameters(), lr=lr, momentum=momentum)

    default_metrics = Metrics(value_round=3, time_round=2)

    ini_time = time.time()

    step = 0
    training_finished = False
    for epc in range(1, epochs + 1):
        
        if training_finished:
            break
        
        print(f'starting epoch: {epc}/{epochs}')

        for nbt, (x, y) in enumerate(train_dataloader):

            if training_finished:
                break

            x = x.to(cfg.device)
            y = y.to(cfg.device)

            train_loss, train_acc = train_iteration(x, y, net, optim, cross_entropy, fisher_fifo)
            default_metrics.add_({'train-loss': train_loss, 'train-acc': train_acc}, step=step)
            
            ## check time limit
            t = int(time.time() - ini_time)
            if t > time_limit_secs:
                print('time is up! finishing training')
                training_finished = True

            if ( (nbt + 1) % interval_print ) == 0 or (nbt + 1) == len(train_dataloader) or training_finished:
                avg_train_loss = np.mean( default_metrics.get('train-loss')[1][-interval_print:] )
                avg_train_acc = np.mean( default_metrics.get('train-acc')[1][-interval_print:] )
                
                test_loss, test_acc = evaluate(net, test_dataloader, cross_entropy)
                default_metrics.add_({'test-loss': test_loss, 'test-acc': test_acc}, step=step)

                m, s = t // 60, t % 60

                print(f'batch: {nbt + 1}/{len(train_dataloader)}', end='')
                print(f' - train loss: {avg_train_loss:.4f} - test loss: {test_loss:.4f}', end='')
                print(f' - train acc: {avg_train_acc:.4f} - test acc: {test_acc:.4f}', end='')
                print(f' - {m}m {s}s')
                
            step += 1

        ## check for GPU memory consumption
        if torch.cuda.is_available():
            mem_alloc_gb = torch.cuda.memory_allocated(cfg.device) / 1024**3
            mem_res_gb = torch.cuda.memory_reserved(cfg.device) / 1024**3
            max_mem_alloc_gb = torch.cuda.max_memory_allocated(cfg.device) / 1024**3
            max_mem_res_gb = torch.cuda.max_memory_reserved(cfg.device) / 1024**3

            print(f'GPU memory used: {mem_alloc_gb:.2f} GB - max: {max_mem_alloc_gb:.2f} GB - memory reserved: {mem_res_gb:.2f} GB - max: {max_mem_res_gb:.2f} GB')

            # torch.cuda.empty_cache()

    return default_metrics, fisher_fifo
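
The training function multiplies the mean cross-entropy by `sqrt(batch_size)`. The notebook says only that this constant stabilizes norms, so the sketch below checks just the mechanical effect: scaling the loss by a constant scales every gradient by the same constant. The tiny linear model and random data are placeholders.

```python
import math
import torch
from torch import nn

torch.manual_seed(0)
batch_size = 32
x = torch.randn(batch_size, 10)
y = torch.randint(0, 5, (batch_size,))

net = nn.Linear(10, 5)
ce = nn.CrossEntropyLoss(reduction='mean')

# gradient of the plain mean loss
net.zero_grad()
ce(net(x), y).backward()
g_plain = net.weight.grad.clone()

# gradient of the loss scaled by sqrt(batch_size)
net.zero_grad()
(math.sqrt(batch_size) * ce(net(x), y)).backward()
g_scaled = net.weight.grad.clone()

ratio = (g_scaled.norm() / g_plain.norm()).item()
print(ratio, math.sqrt(batch_size))
```

Since `evaluate` receives the same scaled loss, the train and test losses printed during training are also multiplied by `sqrt(batch_size)` and are not directly comparable to an unscaled cross-entropy.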

In [21]:
# train_network_fisher_optimization(apply_fisher = True,
#                                    buffer_size = 32,
#                                    partition_size = 16,
#                                    block_updates = 100,
#                                    net_params = {'p': 0.1},
#                                    epochs = 10,
#                                    time_limit_secs = 15 * 60)

In [22]:
# net = torchvision.models.resnet18()
# torchsummary.summary(net, input_size=[[3, 64, 64]], device='cpu')

In [23]:
last_step_saved = None

def results_list_to_json(results_list, out_dir='/kaggle/working', step=0):
    global last_step_saved

    json_results = []

    for metrics, bs, ps, bu in results_list:
        json_results.append({
            'buffer-size': bs,
            'partition-size': ps,
            'blocks-updates': bu,
            'metrics': metrics.metrics_dict
        })

    with open( os.path.join(out_dir, f'results_step_{step}.json'), 'w' ) as fp:
        json.dump(json_results, fp)
    
    if last_step_saved is not None:
        old_file = os.path.join(out_dir, f'results_step_{last_step_saved}.json')
        if os.path.exists(old_file):
            os.remove(old_file)
    
    last_step_saved = step
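
To read the saved results back, note that each metric is stored as a list of `(value, step, time)` entries, which JSON turns into three-element arrays. The sketch below writes and reloads a miniature file with that structure. The numbers are invented for illustration and are not results from this notebook.

```python
import json
import os
import tempfile

# made-up miniature of the saved structure: each metric is a list of [value, step, time]
saved = [{
    'buffer-size': 140,
    'partition-size': 30,
    'blocks-updates': 360,
    'metrics': {'test-loss': [[14.5, 99, 340.0], [14.1, 299, 351.0]]},
}]

path = os.path.join(tempfile.mkdtemp(), 'results_step_0.json')
with open(path, 'w') as fp:
    json.dump(saved, fp)

with open(path) as fp:
    runs = json.load(fp)

for run in runs:
    values = [value for value, _step, _time in run['metrics']['test-loss']]
    print(run['buffer-size'], run['partition-size'], run['blocks-updates'], min(values))
```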

In [24]:
def get_min_test_loss(metrics):
    _, test_loss, _ = metrics.get('test-loss')
    return min(test_loss)

---

In [25]:
nruns = 20

In [26]:
results_list = []
step_i = 0

buffer_size = 140
partition_size = 30
block_updates = 360

for _ in range(nruns):

    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    print(f'testing - buf size: {buffer_size} - part size: {partition_size} - block upd: {block_updates} - combination nº: {step_i + 1}')

    default_metrics, _ = train_network_fisher_optimization(apply_fisher = True,
                                                           buffer_size = buffer_size,
                                                           partition_size = partition_size,
                                                           block_updates = block_updates,
                                                           net_params = {'p': 0.1},
                                                           epochs = 100,
                                                           time_limit_secs = 1200)

    results_list.append( (default_metrics, buffer_size, partition_size, block_updates) )
    results_list_to_json(results_list, step=step_i)
    step_i += 1
    
    print()

testing - buf size: 140 - part size: 30 - block upd: 360 - combination nº: 1
generating CIFAR100 data with 100 classes
Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ./data/cifar-100-python.tar.gz


Extracting ./data/cifar-100-python.tar.gz to ./data
Files already downloaded and verified
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 16, 16]           9,408
       BatchNorm2d-2           [-1, 64, 16, 16]             128
              ReLU-3           [-1, 64, 16, 16]               0
         MaxPool2d-4             [-1, 64, 8, 8]               0
            Conv2d-5             [-1, 64, 8, 8]          36,864
       BatchNorm2d-6             [-1, 64, 8, 8]             128
              ReLU-7             [-1, 64, 8, 8]               0
            Conv2d-8             [-1, 64, 8, 8]          36,864
       BatchNorm2d-9             [-1, 64, 8, 8]             128
             ReLU-10             [-1, 64, 8, 8]               0
       BasicBlock-11             [-1, 64, 8, 8]               0
           Conv2d-12             [-1, 64, 8, 8]          36,864
      BatchNo

initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/374302
partition 250000/374302
partition 260000/374302
partition 270000/374302
partition 280000/374302
partition 290000/374302
partition 300000/374302
partition 310000/374302
partition 320000/374302
partition 330000/374302
partition 340000/374302
partition 350000/374302
partition 360000/374302
partition 370000/374302
partition 374302/374302
starting epoch: 1/100
batch: 100/1563 - t

batch: 100/1563 - train loss: 11.5483 - test loss: 14.4987 - train acc: 0.4485 - test acc: 0.3596 - 5m 40s
batch: 200/1563 - train loss: 11.8536 - test loss: 14.2795 - train acc: 0.4372 - test acc: 0.3688 - 5m 45s
batch: 300/1563 - train loss: 11.9146 - test loss: 14.1461 - train acc: 0.4300 - test acc: 0.3701 - 5m 51s
batch: 400/1563 - train loss: 11.8092 - test loss: 14.7023 - train acc: 0.4435 - test acc: 0.3617 - 5m 56s
batch: 500/1563 - train loss: 12.1262 - test loss: 14.2434 - train acc: 0.4322 - test acc: 0.3691 - 6m 1s
batch: 600/1563 - train loss: 11.9621 - test loss: 14.6337 - train acc: 0.4425 - test acc: 0.3552 - 6m 6s
batch: 700/1563 - train loss: 12.0691 - test loss: 14.9162 - train acc: 0.4222 - test acc: 0.3575 - 6m 12s
batch: 800/1563 - train loss: 11.9389 - test loss: 14.6133 - train acc: 0.4338 - test acc: 0.3497 - 6m 17s
batch: 900/1563 - train loss: 11.9853 - test loss: 14.9499 - train acc: 0.4400 - test acc: 0.3540 - 6m 22s
batch: 1000/1563 - train loss: 12.0297 

batch: 1000/1563 - train loss: 7.6567 - test loss: 14.6821 - train acc: 0.6153 - test acc: 0.4044 - 12m 0s
batch: 1100/1563 - train loss: 8.4627 - test loss: 14.2077 - train acc: 0.5637 - test acc: 0.4050 - 12m 5s
batch: 1200/1563 - train loss: 8.0714 - test loss: 14.1906 - train acc: 0.5872 - test acc: 0.4076 - 12m 10s
batch: 1300/1563 - train loss: 8.0824 - test loss: 13.7809 - train acc: 0.5875 - test acc: 0.4172 - 12m 15s
batch: 1400/1563 - train loss: 8.1665 - test loss: 14.1929 - train acc: 0.5993 - test acc: 0.4037 - 12m 20s
batch: 1500/1563 - train loss: 8.2131 - test loss: 13.8615 - train acc: 0.5878 - test acc: 0.4126 - 12m 26s
batch: 1563/1563 - train loss: 8.3318 - test loss: 13.5141 - train acc: 0.5787 - test acc: 0.4226 - 12m 31s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 10/100
batch: 100/1563 - train loss: 5.7367 - test loss: 14.0895 - train acc: 0.7066 - test acc: 0.4238 - 12m 36s
batch: 200/1563 - train loss: 5.61

batch: 200/1563 - train loss: 3.1088 - test loss: 16.4597 - train acc: 0.8219 - test acc: 0.4348 - 18m 14s
batch: 300/1563 - train loss: 2.9350 - test loss: 17.0242 - train acc: 0.8403 - test acc: 0.4290 - 18m 20s
batch: 400/1563 - train loss: 3.2403 - test loss: 17.1984 - train acc: 0.8141 - test acc: 0.4152 - 18m 25s
batch: 500/1563 - train loss: 3.5648 - test loss: 17.1037 - train acc: 0.7939 - test acc: 0.4243 - 18m 30s
batch: 600/1563 - train loss: 3.3808 - test loss: 16.9939 - train acc: 0.8071 - test acc: 0.4241 - 18m 35s
batch: 700/1563 - train loss: 3.7905 - test loss: 18.0141 - train acc: 0.7850 - test acc: 0.4113 - 18m 40s
batch: 800/1563 - train loss: 3.7821 - test loss: 17.2948 - train acc: 0.7828 - test acc: 0.4143 - 18m 46s
batch: 900/1563 - train loss: 3.9447 - test loss: 16.6679 - train acc: 0.7796 - test acc: 0.4263 - 18m 51s
batch: 1000/1563 - train loss: 4.0294 - test loss: 17.3205 - train acc: 0.7808 - test acc: 0.4121 - 18m 56s
batch: 1100/1563 - train loss: 4.038

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 2359296 - partition: 30 - nº part: 78644 - block updates: 360
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140

batch: 1200/1563 - train loss: 12.8159 - test loss: 14.3271 - train acc: 0.4053 - test acc: 0.3576 - 5m 6s
batch: 1300/1563 - train loss: 13.1249 - test loss: 14.9015 - train acc: 0.4031 - test acc: 0.3431 - 5m 11s
batch: 1400/1563 - train loss: 13.1138 - test loss: 14.0316 - train acc: 0.3909 - test acc: 0.3696 - 5m 16s
batch: 1500/1563 - train loss: 12.8874 - test loss: 15.3673 - train acc: 0.4015 - test acc: 0.3322 - 5m 22s
batch: 1563/1563 - train loss: 12.8539 - test loss: 14.0765 - train acc: 0.4085 - test acc: 0.3677 - 5m 26s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.5686 - test loss: 14.4657 - train acc: 0.4519 - test acc: 0.3585 - 5m 31s
batch: 200/1563 - train loss: 11.6238 - test loss: 14.4339 - train acc: 0.4416 - test acc: 0.3703 - 5m 37s
batch: 300/1563 - train loss: 11.9325 - test loss: 14.7248 - train acc: 0.4335 - test acc: 0.3489 - 5m 42s
batch: 400/1563 - train loss: 11.828

batch: 400/1563 - train loss: 7.4765 - test loss: 14.4263 - train acc: 0.6193 - test acc: 0.4071 - 11m 18s
batch: 500/1563 - train loss: 7.6146 - test loss: 15.6537 - train acc: 0.6106 - test acc: 0.3793 - 11m 23s
batch: 600/1563 - train loss: 7.6228 - test loss: 14.8979 - train acc: 0.6141 - test acc: 0.3871 - 11m 28s
batch: 700/1563 - train loss: 7.6032 - test loss: 14.4621 - train acc: 0.6087 - test acc: 0.4059 - 11m 33s
batch: 800/1563 - train loss: 7.7442 - test loss: 13.9600 - train acc: 0.6054 - test acc: 0.4188 - 11m 38s
batch: 900/1563 - train loss: 8.1646 - test loss: 14.2549 - train acc: 0.5834 - test acc: 0.4128 - 11m 44s
batch: 1000/1563 - train loss: 7.9430 - test loss: 14.2221 - train acc: 0.5991 - test acc: 0.4169 - 11m 49s
batch: 1100/1563 - train loss: 7.8539 - test loss: 13.6851 - train acc: 0.6022 - test acc: 0.4365 - 11m 54s
batch: 1200/1563 - train loss: 8.0178 - test loss: 14.1875 - train acc: 0.5941 - test acc: 0.4194 - 11m 59s
batch: 1300/1563 - train loss: 8.2

batch: 1300/1563 - train loss: 4.7614 - test loss: 15.6227 - train acc: 0.7428 - test acc: 0.4354 - 17m 37s
batch: 1400/1563 - train loss: 4.9992 - test loss: 16.2024 - train acc: 0.7328 - test acc: 0.4238 - 17m 42s
batch: 1500/1563 - train loss: 5.5672 - test loss: 15.5072 - train acc: 0.7022 - test acc: 0.4236 - 17m 47s
batch: 1563/1563 - train loss: 5.3881 - test loss: 15.8968 - train acc: 0.7066 - test acc: 0.4230 - 17m 52s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 3.1511 - test loss: 15.6233 - train acc: 0.8269 - test acc: 0.4356 - 17m 57s
batch: 200/1563 - train loss: 3.1632 - test loss: 15.9125 - train acc: 0.8274 - test acc: 0.4404 - 18m 2s
batch: 300/1563 - train loss: 3.1765 - test loss: 15.9972 - train acc: 0.8178 - test acc: 0.4422 - 18m 8s
batch: 400/1563 - train loss: 3.3163 - test loss: 16.6209 - train acc: 0.8143 - test acc: 0.4276 - 18m 13s
batch: 500/1563 - train loss: 3.4742 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 12.9198 - test loss: 14.8790 - train acc: 0.3978 - test acc: 0.3379 - 5m 17s
batch: 1500/1563 - train loss: 13.1352 - test loss: 13.8284 - train acc: 0.3916 - test acc: 0.3778 - 5m 22s
batch: 1563/1563 - train loss: 13.1290 - test loss: 13.9783 - train acc: 0.3947 - test acc: 0.3762 - 5m 26s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.2693 - test loss: 14.3162 - train acc: 0.4578 - test acc: 0.3651 - 5m 32s
batch: 200/1563 - train loss: 11.4261 - test loss: 15.7709 - train acc: 0.4472 - test acc: 0.3290 - 5m 37s
batch: 300/1563 - train loss: 11.3631 - test loss: 14.2277 - train acc: 0.4557 - test acc: 0.3694 - 5m 42s
batch: 400/1563 - train loss: 11.5982 - test loss: 13.7966 - train acc: 0.4478 - test acc: 0.3858 - 5m 47s
batch: 500/1563 - train loss: 11.5016 - test loss: 13.8420 - train acc: 0.4462 - test acc: 0.3856 - 5m 52s
batch: 600/1563 - train loss: 11.7196

batch: 600/1563 - train loss: 7.5932 - test loss: 15.7091 - train acc: 0.6106 - test acc: 0.3848 - 11m 27s
batch: 700/1563 - train loss: 7.3360 - test loss: 14.6131 - train acc: 0.6260 - test acc: 0.4059 - 11m 32s
batch: 800/1563 - train loss: 7.5507 - test loss: 14.1568 - train acc: 0.6066 - test acc: 0.4146 - 11m 38s
batch: 900/1563 - train loss: 7.9049 - test loss: 14.9308 - train acc: 0.6041 - test acc: 0.3929 - 11m 43s
batch: 1000/1563 - train loss: 7.7675 - test loss: 13.6657 - train acc: 0.6040 - test acc: 0.4226 - 11m 48s
batch: 1100/1563 - train loss: 7.8597 - test loss: 13.7885 - train acc: 0.6150 - test acc: 0.4188 - 11m 53s
batch: 1200/1563 - train loss: 7.9882 - test loss: 14.3661 - train acc: 0.6028 - test acc: 0.4079 - 11m 58s
batch: 1300/1563 - train loss: 7.9614 - test loss: 13.7124 - train acc: 0.6078 - test acc: 0.4207 - 12m 4s
batch: 1400/1563 - train loss: 7.7877 - test loss: 14.1831 - train acc: 0.6066 - test acc: 0.4138 - 12m 9s
batch: 1500/1563 - train loss: 7.9

batch: 1500/1563 - train loss: 5.1090 - test loss: 16.1710 - train acc: 0.7216 - test acc: 0.4145 - 17m 45s
batch: 1563/1563 - train loss: 5.0075 - test loss: 16.6111 - train acc: 0.7329 - test acc: 0.4196 - 17m 50s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 3.0732 - test loss: 16.7309 - train acc: 0.8237 - test acc: 0.4255 - 17m 55s
batch: 200/1563 - train loss: 3.0215 - test loss: 16.8017 - train acc: 0.8328 - test acc: 0.4258 - 18m 0s
batch: 300/1563 - train loss: 3.0839 - test loss: 16.3775 - train acc: 0.8309 - test acc: 0.4308 - 18m 5s
batch: 400/1563 - train loss: 3.2378 - test loss: 17.8164 - train acc: 0.8155 - test acc: 0.4005 - 18m 11s
batch: 500/1563 - train loss: 3.1539 - test loss: 16.5045 - train acc: 0.8240 - test acc: 0.4301 - 18m 16s
batch: 600/1563 - train loss: 3.3603 - test loss: 16.8174 - train acc: 0.8137 - test acc: 0.4235 - 18m 21s
batch: 700/1563 - train loss: 3.4051 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.0808 - test loss: 13.8530 - train acc: 0.3863 - test acc: 0.3754 - 5m 17s
batch: 1500/1563 - train loss: 13.1445 - test loss: 14.7994 - train acc: 0.3925 - test acc: 0.3438 - 5m 22s
batch: 1563/1563 - train loss: 13.0196 - test loss: 14.4316 - train acc: 0.4010 - test acc: 0.3610 - 5m 27s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.3106 - test loss: 13.9998 - train acc: 0.4500 - test acc: 0.3816 - 5m 32s
batch: 200/1563 - train loss: 11.5535 - test loss: 13.9805 - train acc: 0.4481 - test acc: 0.3761 - 5m 37s
batch: 300/1563 - train loss: 11.9335 - test loss: 14.4794 - train acc: 0.4488 - test acc: 0.3610 - 5m 42s
batch: 400/1563 - train loss: 12.1250 - test loss: 15.6124 - train acc: 0.4303 - test acc: 0.3337 - 5m 48s
batch: 500/1563 - train loss: 11.8908 - test loss: 14.4657 - train acc: 0.4476 - test acc: 0.3641 - 5m 53s
batch: 600/1563 - train loss: 11.7068

batch: 600/1563 - train loss: 7.6065 - test loss: 14.1388 - train acc: 0.6190 - test acc: 0.4104 - 11m 28s
batch: 700/1563 - train loss: 7.9952 - test loss: 13.8166 - train acc: 0.5900 - test acc: 0.4166 - 11m 33s
batch: 800/1563 - train loss: 7.9359 - test loss: 15.3795 - train acc: 0.5982 - test acc: 0.3845 - 11m 38s
batch: 900/1563 - train loss: 7.9068 - test loss: 14.6748 - train acc: 0.6044 - test acc: 0.3976 - 11m 43s
batch: 1000/1563 - train loss: 7.8859 - test loss: 13.7755 - train acc: 0.6043 - test acc: 0.4178 - 11m 48s
batch: 1100/1563 - train loss: 8.1556 - test loss: 13.5809 - train acc: 0.5938 - test acc: 0.4218 - 11m 54s
batch: 1200/1563 - train loss: 8.3275 - test loss: 14.8515 - train acc: 0.5787 - test acc: 0.3812 - 11m 59s
batch: 1300/1563 - train loss: 8.2885 - test loss: 14.0795 - train acc: 0.5785 - test acc: 0.4095 - 12m 4s
batch: 1400/1563 - train loss: 8.3034 - test loss: 13.9393 - train acc: 0.5781 - test acc: 0.4200 - 12m 9s
batch: 1500/1563 - train loss: 8.3

batch: 1500/1563 - train loss: 4.7647 - test loss: 15.9613 - train acc: 0.7457 - test acc: 0.4196 - 17m 46s
batch: 1563/1563 - train loss: 5.0505 - test loss: 16.3047 - train acc: 0.7322 - test acc: 0.4164 - 17m 50s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 3.2857 - test loss: 15.9788 - train acc: 0.8171 - test acc: 0.4292 - 17m 55s
batch: 200/1563 - train loss: 2.8709 - test loss: 16.0936 - train acc: 0.8343 - test acc: 0.4343 - 18m 0s
batch: 300/1563 - train loss: 3.1815 - test loss: 16.9319 - train acc: 0.8274 - test acc: 0.4182 - 18m 6s
batch: 400/1563 - train loss: 3.4779 - test loss: 16.7319 - train acc: 0.8103 - test acc: 0.4254 - 18m 11s
batch: 500/1563 - train loss: 3.5223 - test loss: 16.5564 - train acc: 0.8022 - test acc: 0.4313 - 18m 16s
batch: 600/1563 - train loss: 3.7369 - test loss: 17.6124 - train acc: 0.7859 - test acc: 0.4075 - 18m 21s
batch: 700/1563 - train loss: 3.9966 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 2359296 - partition: 30 - nº part: 78644 - block updates: 360
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140

batch: 1200/1563 - train loss: 12.9872 - test loss: 14.3991 - train acc: 0.3957 - test acc: 0.3509 - 5m 11s
batch: 1300/1563 - train loss: 12.9784 - test loss: 14.5948 - train acc: 0.3897 - test acc: 0.3469 - 5m 16s
batch: 1400/1563 - train loss: 13.2389 - test loss: 13.9166 - train acc: 0.3922 - test acc: 0.3715 - 5m 22s
batch: 1500/1563 - train loss: 12.9260 - test loss: 14.7977 - train acc: 0.3950 - test acc: 0.3486 - 5m 27s
batch: 1563/1563 - train loss: 12.8267 - test loss: 14.0835 - train acc: 0.3953 - test acc: 0.3665 - 5m 31s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.1011 - test loss: 13.4306 - train acc: 0.4713 - test acc: 0.3963 - 5m 36s
batch: 200/1563 - train loss: 11.5652 - test loss: 13.7528 - train acc: 0.4503 - test acc: 0.3847 - 5m 42s
batch: 300/1563 - train loss: 11.7894 - test loss: 15.2816 - train acc: 0.4391 - test acc: 0.3456 - 5m 47s
batch: 400/1563 - train loss: 11.32

batch: 400/1563 - train loss: 7.0789 - test loss: 14.8660 - train acc: 0.6266 - test acc: 0.4028 - 11m 30s
batch: 500/1563 - train loss: 7.0235 - test loss: 14.3760 - train acc: 0.6460 - test acc: 0.4084 - 11m 36s
batch: 600/1563 - train loss: 7.0557 - test loss: 14.7094 - train acc: 0.6329 - test acc: 0.4136 - 11m 41s
batch: 700/1563 - train loss: 7.3614 - test loss: 14.2556 - train acc: 0.6235 - test acc: 0.4153 - 11m 46s
batch: 800/1563 - train loss: 7.6283 - test loss: 14.0398 - train acc: 0.6141 - test acc: 0.4197 - 11m 52s
batch: 900/1563 - train loss: 7.7456 - test loss: 14.9997 - train acc: 0.6050 - test acc: 0.3893 - 11m 57s
batch: 1000/1563 - train loss: 7.6848 - test loss: 14.2100 - train acc: 0.6153 - test acc: 0.4216 - 12m 2s
batch: 1100/1563 - train loss: 8.0996 - test loss: 14.2839 - train acc: 0.5871 - test acc: 0.4076 - 12m 8s
batch: 1200/1563 - train loss: 7.6084 - test loss: 14.6899 - train acc: 0.6131 - test acc: 0.4019 - 12m 13s
batch: 1300/1563 - train loss: 7.899

batch: 1300/1563 - train loss: 4.5941 - test loss: 15.9329 - train acc: 0.7453 - test acc: 0.4262 - 17m 55s
batch: 1400/1563 - train loss: 4.5010 - test loss: 16.5374 - train acc: 0.7543 - test acc: 0.4185 - 18m 0s
batch: 1500/1563 - train loss: 5.0126 - test loss: 15.6656 - train acc: 0.7310 - test acc: 0.4233 - 18m 6s
batch: 1563/1563 - train loss: 4.8839 - test loss: 16.2688 - train acc: 0.7331 - test acc: 0.4199 - 18m 10s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.9823 - test loss: 15.9853 - train acc: 0.8346 - test acc: 0.4333 - 18m 15s
batch: 200/1563 - train loss: 2.8312 - test loss: 16.6880 - train acc: 0.8437 - test acc: 0.4188 - 18m 21s
batch: 300/1563 - train loss: 2.8277 - test loss: 16.7625 - train acc: 0.8440 - test acc: 0.4305 - 18m 26s
batch: 400/1563 - train loss: 3.0835 - test loss: 17.3267 - train acc: 0.8325 - test acc: 0.4173 - 18m 31s
batch: 500/1563 - train loss: 3.3255 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.1349 - test loss: 13.9475 - train acc: 0.3940 - test acc: 0.3704 - 5m 22s
batch: 1500/1563 - train loss: 13.0985 - test loss: 14.0940 - train acc: 0.4019 - test acc: 0.3579 - 5m 27s
batch: 1563/1563 - train loss: 13.0341 - test loss: 15.8518 - train acc: 0.4044 - test acc: 0.3179 - 5m 32s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.2815 - test loss: 14.2026 - train acc: 0.4682 - test acc: 0.3726 - 5m 37s
batch: 200/1563 - train loss: 11.3720 - test loss: 14.2265 - train acc: 0.4531 - test acc: 0.3653 - 5m 42s
batch: 300/1563 - train loss: 11.2506 - test loss: 14.4869 - train acc: 0.4585 - test acc: 0.3637 - 5m 48s
batch: 400/1563 - train loss: 11.6914 - test loss: 14.7373 - train acc: 0.4453 - test acc: 0.3617 - 5m 53s
batch: 500/1563 - train loss: 11.7910 - test loss: 14.2378 - train acc: 0.4425 - test acc: 0.3657 - 5m 58s
batch: 600/1563 - train loss: 12.2354

batch: 600/1563 - train loss: 7.3289 - test loss: 14.4075 - train acc: 0.6203 - test acc: 0.4153 - 11m 38s
batch: 700/1563 - train loss: 7.6328 - test loss: 15.8237 - train acc: 0.6056 - test acc: 0.3826 - 11m 43s
batch: 800/1563 - train loss: 8.1826 - test loss: 14.2303 - train acc: 0.5881 - test acc: 0.4047 - 11m 48s
batch: 900/1563 - train loss: 7.8477 - test loss: 13.7089 - train acc: 0.5915 - test acc: 0.4228 - 11m 54s
batch: 1000/1563 - train loss: 7.9381 - test loss: 13.7439 - train acc: 0.6000 - test acc: 0.4244 - 11m 59s
batch: 1100/1563 - train loss: 8.0959 - test loss: 13.8564 - train acc: 0.5965 - test acc: 0.4236 - 12m 4s
batch: 1200/1563 - train loss: 8.0258 - test loss: 14.2787 - train acc: 0.5938 - test acc: 0.4144 - 12m 10s
batch: 1300/1563 - train loss: 7.9547 - test loss: 14.1773 - train acc: 0.5975 - test acc: 0.4084 - 12m 15s
batch: 1400/1563 - train loss: 7.9030 - test loss: 13.8526 - train acc: 0.5981 - test acc: 0.4168 - 12m 20s
batch: 1500/1563 - train loss: 8.

batch: 1500/1563 - train loss: 5.0082 - test loss: 15.9222 - train acc: 0.7263 - test acc: 0.4232 - 18m 5s
batch: 1563/1563 - train loss: 4.8071 - test loss: 16.8495 - train acc: 0.7391 - test acc: 0.4150 - 18m 9s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.9895 - test loss: 15.8495 - train acc: 0.8371 - test acc: 0.4273 - 18m 14s
batch: 200/1563 - train loss: 2.6440 - test loss: 16.8652 - train acc: 0.8559 - test acc: 0.4233 - 18m 20s
batch: 300/1563 - train loss: 3.0685 - test loss: 17.0773 - train acc: 0.8368 - test acc: 0.4202 - 18m 25s
batch: 400/1563 - train loss: 3.3474 - test loss: 16.6706 - train acc: 0.8181 - test acc: 0.4240 - 18m 30s
batch: 500/1563 - train loss: 3.1991 - test loss: 16.8054 - train acc: 0.8249 - test acc: 0.4203 - 18m 36s
batch: 600/1563 - train loss: 3.3246 - test loss: 16.6253 - train acc: 0.8093 - test acc: 0.4316 - 18m 41s
batch: 700/1563 - train loss: 3.5654 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.4664 - test loss: 14.2317 - train acc: 0.3756 - test acc: 0.3648 - 5m 24s
batch: 1500/1563 - train loss: 12.8867 - test loss: 13.9075 - train acc: 0.4075 - test acc: 0.3686 - 5m 30s
batch: 1563/1563 - train loss: 12.8083 - test loss: 15.0019 - train acc: 0.4072 - test acc: 0.3435 - 5m 34s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.3976 - test loss: 13.9575 - train acc: 0.4647 - test acc: 0.3771 - 5m 39s
batch: 200/1563 - train loss: 11.5530 - test loss: 14.7152 - train acc: 0.4591 - test acc: 0.3539 - 5m 44s
batch: 300/1563 - train loss: 11.7578 - test loss: 15.4427 - train acc: 0.4410 - test acc: 0.3397 - 5m 50s
batch: 400/1563 - train loss: 11.4060 - test loss: 13.8760 - train acc: 0.4597 - test acc: 0.3776 - 5m 55s
batch: 500/1563 - train loss: 11.7574 - test loss: 14.1373 - train acc: 0.4409 - test acc: 0.3747 - 6m 1s
batch: 600/1563 - train loss: 11.6542 

batch: 600/1563 - train loss: 8.0000 - test loss: 15.2597 - train acc: 0.5913 - test acc: 0.3907 - 11m 44s
batch: 700/1563 - train loss: 7.8999 - test loss: 14.0368 - train acc: 0.5912 - test acc: 0.4172 - 11m 49s
batch: 800/1563 - train loss: 7.8849 - test loss: 13.9254 - train acc: 0.5996 - test acc: 0.4198 - 11m 55s
batch: 900/1563 - train loss: 7.6828 - test loss: 14.5191 - train acc: 0.6172 - test acc: 0.4120 - 12m 0s
batch: 1000/1563 - train loss: 7.5917 - test loss: 14.5859 - train acc: 0.6160 - test acc: 0.3986 - 12m 5s
batch: 1100/1563 - train loss: 8.0198 - test loss: 13.7074 - train acc: 0.5950 - test acc: 0.4199 - 12m 10s
batch: 1200/1563 - train loss: 8.0900 - test loss: 14.0067 - train acc: 0.5881 - test acc: 0.4140 - 12m 16s
batch: 1300/1563 - train loss: 7.9687 - test loss: 13.8218 - train acc: 0.5959 - test acc: 0.4203 - 12m 21s
batch: 1400/1563 - train loss: 8.1734 - test loss: 13.4749 - train acc: 0.5875 - test acc: 0.4239 - 12m 27s
batch: 1500/1563 - train loss: 7.8

batch: 1500/1563 - train loss: 4.9595 - test loss: 16.0163 - train acc: 0.7244 - test acc: 0.4264 - 18m 18s
batch: 1563/1563 - train loss: 5.1058 - test loss: 15.6579 - train acc: 0.7266 - test acc: 0.4376 - 18m 22s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 3.0027 - test loss: 16.7264 - train acc: 0.8366 - test acc: 0.4225 - 18m 27s
batch: 200/1563 - train loss: 3.1289 - test loss: 16.6018 - train acc: 0.8350 - test acc: 0.4291 - 18m 33s
batch: 300/1563 - train loss: 3.0106 - test loss: 16.7048 - train acc: 0.8287 - test acc: 0.4205 - 18m 38s
batch: 400/1563 - train loss: 3.1006 - test loss: 17.0023 - train acc: 0.8250 - test acc: 0.4273 - 18m 44s
batch: 500/1563 - train loss: 3.4105 - test loss: 17.2829 - train acc: 0.8190 - test acc: 0.4210 - 18m 49s
batch: 600/1563 - train loss: 3.3956 - test loss: 16.8602 - train acc: 0.8040 - test acc: 0.4283 - 18m 55s
batch: 700/1563 - train loss: 3.4733 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.2739 - test loss: 13.6927 - train acc: 0.3935 - test acc: 0.3771 - 5m 27s
batch: 1500/1563 - train loss: 12.8351 - test loss: 14.3659 - train acc: 0.4035 - test acc: 0.3544 - 5m 33s
batch: 1563/1563 - train loss: 12.8172 - test loss: 14.1071 - train acc: 0.4069 - test acc: 0.3682 - 5m 37s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.3812 - test loss: 13.9392 - train acc: 0.4541 - test acc: 0.3750 - 5m 43s
batch: 200/1563 - train loss: 11.5934 - test loss: 13.6788 - train acc: 0.4547 - test acc: 0.3865 - 5m 48s
batch: 300/1563 - train loss: 11.5090 - test loss: 13.6610 - train acc: 0.4600 - test acc: 0.3867 - 5m 54s
batch: 400/1563 - train loss: 11.6678 - test loss: 14.1254 - train acc: 0.4491 - test acc: 0.3715 - 5m 59s
batch: 500/1563 - train loss: 11.5978 - test loss: 14.1210 - train acc: 0.4488 - test acc: 0.3729 - 6m 4s
batch: 600/1563 - train loss: 12.0328 

batch: 600/1563 - train loss: 7.4060 - test loss: 14.0304 - train acc: 0.6087 - test acc: 0.4168 - 11m 53s
batch: 700/1563 - train loss: 7.6184 - test loss: 13.6967 - train acc: 0.6094 - test acc: 0.4265 - 11m 59s
batch: 800/1563 - train loss: 7.7042 - test loss: 14.0013 - train acc: 0.6131 - test acc: 0.4181 - 12m 4s
batch: 900/1563 - train loss: 7.8908 - test loss: 14.0954 - train acc: 0.5962 - test acc: 0.4153 - 12m 10s
batch: 1000/1563 - train loss: 7.8546 - test loss: 13.9713 - train acc: 0.6044 - test acc: 0.4230 - 12m 15s
batch: 1100/1563 - train loss: 8.0165 - test loss: 13.9359 - train acc: 0.5941 - test acc: 0.4187 - 12m 20s
batch: 1200/1563 - train loss: 8.1277 - test loss: 13.5190 - train acc: 0.5834 - test acc: 0.4247 - 12m 26s
batch: 1300/1563 - train loss: 8.1867 - test loss: 14.7538 - train acc: 0.5837 - test acc: 0.3911 - 12m 31s
batch: 1400/1563 - train loss: 7.9880 - test loss: 13.7705 - train acc: 0.5952 - test acc: 0.4255 - 12m 36s
batch: 1500/1563 - train loss: 8.

batch: 1500/1563 - train loss: 4.8592 - test loss: 16.1157 - train acc: 0.7341 - test acc: 0.4223 - 18m 36s
batch: 1563/1563 - train loss: 4.9547 - test loss: 15.7235 - train acc: 0.7363 - test acc: 0.4196 - 18m 41s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.9473 - test loss: 15.7749 - train acc: 0.8321 - test acc: 0.4379 - 18m 46s
batch: 200/1563 - train loss: 2.7986 - test loss: 16.5090 - train acc: 0.8412 - test acc: 0.4260 - 18m 52s
batch: 300/1563 - train loss: 2.9360 - test loss: 16.7750 - train acc: 0.8384 - test acc: 0.4276 - 18m 57s
batch: 400/1563 - train loss: 3.1506 - test loss: 16.3970 - train acc: 0.8190 - test acc: 0.4361 - 19m 3s
batch: 500/1563 - train loss: 2.9904 - test loss: 17.3924 - train acc: 0.8300 - test acc: 0.4135 - 19m 8s
batch: 600/1563 - train loss: 3.3192 - test loss: 17.4158 - train acc: 0.8156 - test acc: 0.4217 - 19m 14s
batch: 700/1563 - train loss: 3.4509 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 12.9952 - test loss: 13.8588 - train acc: 0.4034 - test acc: 0.3720 - 5m 34s
batch: 1500/1563 - train loss: 12.9545 - test loss: 14.2874 - train acc: 0.3960 - test acc: 0.3541 - 5m 39s
batch: 1563/1563 - train loss: 12.7545 - test loss: 14.0042 - train acc: 0.4069 - test acc: 0.3704 - 5m 44s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.3788 - test loss: 20.2351 - train acc: 0.4585 - test acc: 0.2609 - 5m 50s
batch: 200/1563 - train loss: 11.0401 - test loss: 14.0059 - train acc: 0.4716 - test acc: 0.3807 - 5m 56s
batch: 300/1563 - train loss: 11.1619 - test loss: 14.6793 - train acc: 0.4638 - test acc: 0.3523 - 6m 1s
batch: 400/1563 - train loss: 11.5825 - test loss: 14.9588 - train acc: 0.4456 - test acc: 0.3493 - 6m 7s
batch: 500/1563 - train loss: 11.8624 - test loss: 14.0303 - train acc: 0.4440 - test acc: 0.3694 - 6m 13s
batch: 600/1563 - train loss: 11.8247 -

batch: 600/1563 - train loss: 7.3174 - test loss: 14.5966 - train acc: 0.6260 - test acc: 0.4140 - 12m 10s
batch: 700/1563 - train loss: 7.3679 - test loss: 14.5649 - train acc: 0.6188 - test acc: 0.4128 - 12m 15s
batch: 800/1563 - train loss: 7.4998 - test loss: 15.0364 - train acc: 0.6172 - test acc: 0.3921 - 12m 21s
batch: 900/1563 - train loss: 8.1796 - test loss: 13.9973 - train acc: 0.5884 - test acc: 0.4205 - 12m 26s
batch: 1000/1563 - train loss: 7.7033 - test loss: 13.6794 - train acc: 0.6137 - test acc: 0.4274 - 12m 32s
batch: 1100/1563 - train loss: 8.0284 - test loss: 14.0793 - train acc: 0.5913 - test acc: 0.4178 - 12m 38s
batch: 1200/1563 - train loss: 7.6917 - test loss: 14.3547 - train acc: 0.6137 - test acc: 0.4155 - 12m 44s
batch: 1300/1563 - train loss: 8.1471 - test loss: 13.6808 - train acc: 0.5788 - test acc: 0.4291 - 12m 49s
batch: 1400/1563 - train loss: 8.0812 - test loss: 13.9148 - train acc: 0.5940 - test acc: 0.4151 - 12m 54s
batch: 1500/1563 - train loss: 7

batch: 1500/1563 - train loss: 4.8854 - test loss: 15.7136 - train acc: 0.7353 - test acc: 0.4334 - 18m 50s
batch: 1563/1563 - train loss: 4.8798 - test loss: 15.8103 - train acc: 0.7297 - test acc: 0.4242 - 18m 54s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.9109 - test loss: 15.7346 - train acc: 0.8409 - test acc: 0.4418 - 19m 0s
batch: 200/1563 - train loss: 2.5946 - test loss: 16.7891 - train acc: 0.8522 - test acc: 0.4213 - 19m 5s
batch: 300/1563 - train loss: 2.8762 - test loss: 16.1674 - train acc: 0.8393 - test acc: 0.4382 - 19m 11s
batch: 400/1563 - train loss: 2.9216 - test loss: 16.9866 - train acc: 0.8391 - test acc: 0.4256 - 19m 16s
batch: 500/1563 - train loss: 3.4541 - test loss: 16.6481 - train acc: 0.8112 - test acc: 0.4303 - 19m 22s
batch: 600/1563 - train loss: 3.2705 - test loss: 17.5932 - train acc: 0.8124 - test acc: 0.4172 - 19m 27s
batch: 700/1563 - train loss: 3.6446 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 2359296 - partition: 30 - nº part: 78644 - block updates: 360
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140

batch: 1200/1563 - train loss: 13.0559 - test loss: 14.5088 - train acc: 0.3922 - test acc: 0.3586 - 5m 26s
batch: 1300/1563 - train loss: 13.2250 - test loss: 14.1723 - train acc: 0.3868 - test acc: 0.3620 - 5m 31s
batch: 1400/1563 - train loss: 12.9659 - test loss: 14.2528 - train acc: 0.4191 - test acc: 0.3603 - 5m 36s
batch: 1500/1563 - train loss: 12.6685 - test loss: 14.1398 - train acc: 0.4150 - test acc: 0.3631 - 5m 42s
batch: 1563/1563 - train loss: 12.9772 - test loss: 14.2298 - train acc: 0.4031 - test acc: 0.3638 - 5m 46s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.3410 - test loss: 13.6200 - train acc: 0.4594 - test acc: 0.3852 - 5m 52s
batch: 200/1563 - train loss: 11.4834 - test loss: 15.1585 - train acc: 0.4532 - test acc: 0.3470 - 5m 58s
batch: 300/1563 - train loss: 11.7172 - test loss: 15.1012 - train acc: 0.4532 - test acc: 0.3415 - 6m 3s
batch: 400/1563 - train loss: 12.056

batch: 400/1563 - train loss: 7.0634 - test loss: 14.4155 - train acc: 0.6331 - test acc: 0.4113 - 11m 59s
batch: 500/1563 - train loss: 7.1771 - test loss: 14.2826 - train acc: 0.6197 - test acc: 0.4116 - 12m 4s
batch: 600/1563 - train loss: 7.4854 - test loss: 14.7363 - train acc: 0.6145 - test acc: 0.3972 - 12m 9s
batch: 700/1563 - train loss: 7.7468 - test loss: 13.9330 - train acc: 0.6075 - test acc: 0.4250 - 12m 15s
batch: 800/1563 - train loss: 7.5627 - test loss: 14.4290 - train acc: 0.6147 - test acc: 0.4101 - 12m 21s
batch: 900/1563 - train loss: 7.7999 - test loss: 14.5260 - train acc: 0.5968 - test acc: 0.4015 - 12m 26s
batch: 1000/1563 - train loss: 8.1099 - test loss: 13.6483 - train acc: 0.5903 - test acc: 0.4235 - 12m 32s
batch: 1100/1563 - train loss: 7.7768 - test loss: 14.6341 - train acc: 0.6062 - test acc: 0.3935 - 12m 37s
batch: 1200/1563 - train loss: 8.0160 - test loss: 14.4299 - train acc: 0.5931 - test acc: 0.4032 - 12m 43s
batch: 1300/1563 - train loss: 8.171

batch: 1300/1563 - train loss: 4.8097 - test loss: 20.0160 - train acc: 0.7422 - test acc: 0.3699 - 18m 35s
batch: 1400/1563 - train loss: 4.9301 - test loss: 15.6338 - train acc: 0.7315 - test acc: 0.4303 - 18m 41s
batch: 1500/1563 - train loss: 4.7651 - test loss: 15.8063 - train acc: 0.7428 - test acc: 0.4295 - 18m 46s
batch: 1563/1563 - train loss: 4.6316 - test loss: 15.9769 - train acc: 0.7522 - test acc: 0.4244 - 18m 51s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.8657 - test loss: 16.3397 - train acc: 0.8296 - test acc: 0.4310 - 18m 56s
batch: 200/1563 - train loss: 2.7529 - test loss: 16.4913 - train acc: 0.8471 - test acc: 0.4385 - 19m 2s
batch: 300/1563 - train loss: 2.9479 - test loss: 17.8014 - train acc: 0.8412 - test acc: 0.3986 - 19m 7s
batch: 400/1563 - train loss: 2.8928 - test loss: 16.6016 - train acc: 0.8384 - test acc: 0.4344 - 19m 13s
batch: 500/1563 - train loss: 3.1847 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.4183 - test loss: 14.4260 - train acc: 0.3885 - test acc: 0.3536 - 5m 31s
batch: 1500/1563 - train loss: 13.1527 - test loss: 15.5320 - train acc: 0.3928 - test acc: 0.3203 - 5m 36s
batch: 1563/1563 - train loss: 13.1786 - test loss: 14.2941 - train acc: 0.3772 - test acc: 0.3621 - 5m 41s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.4322 - test loss: 14.3951 - train acc: 0.4525 - test acc: 0.3629 - 5m 46s
batch: 200/1563 - train loss: 11.1852 - test loss: 14.2194 - train acc: 0.4647 - test acc: 0.3692 - 5m 52s
batch: 300/1563 - train loss: 11.7556 - test loss: 14.6730 - train acc: 0.4353 - test acc: 0.3589 - 5m 57s
batch: 400/1563 - train loss: 11.7041 - test loss: 14.3915 - train acc: 0.4453 - test acc: 0.3667 - 6m 2s
batch: 500/1563 - train loss: 11.8379 - test loss: 13.9582 - train acc: 0.4341 - test acc: 0.3780 - 6m 8s
batch: 600/1563 - train loss: 11.7914 -

batch: 600/1563 - train loss: 7.6231 - test loss: 14.7712 - train acc: 0.6144 - test acc: 0.3943 - 12m 3s
batch: 700/1563 - train loss: 7.7023 - test loss: 14.9476 - train acc: 0.6184 - test acc: 0.3921 - 12m 9s
batch: 800/1563 - train loss: 7.4699 - test loss: 13.9900 - train acc: 0.6219 - test acc: 0.4202 - 12m 14s
batch: 900/1563 - train loss: 7.9476 - test loss: 14.3343 - train acc: 0.5943 - test acc: 0.4038 - 12m 20s
batch: 1000/1563 - train loss: 7.7686 - test loss: 14.2524 - train acc: 0.6022 - test acc: 0.4073 - 12m 25s
batch: 1100/1563 - train loss: 7.8915 - test loss: 14.1289 - train acc: 0.6006 - test acc: 0.4118 - 12m 31s
batch: 1200/1563 - train loss: 8.2709 - test loss: 14.7843 - train acc: 0.5778 - test acc: 0.4038 - 12m 37s
batch: 1300/1563 - train loss: 8.2023 - test loss: 14.0395 - train acc: 0.5934 - test acc: 0.4142 - 12m 42s
batch: 1400/1563 - train loss: 8.0080 - test loss: 14.0388 - train acc: 0.5960 - test acc: 0.4131 - 12m 48s
batch: 1500/1563 - train loss: 8.2

batch: 1500/1563 - train loss: 5.0814 - test loss: 15.7325 - train acc: 0.7235 - test acc: 0.4199 - 18m 43s
batch: 1563/1563 - train loss: 4.9820 - test loss: 15.8029 - train acc: 0.7241 - test acc: 0.4250 - 18m 48s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.8765 - test loss: 15.7443 - train acc: 0.8422 - test acc: 0.4437 - 18m 54s
batch: 200/1563 - train loss: 2.8279 - test loss: 17.0766 - train acc: 0.8453 - test acc: 0.4196 - 19m 0s
batch: 300/1563 - train loss: 3.0189 - test loss: 16.7911 - train acc: 0.8309 - test acc: 0.4251 - 19m 5s
batch: 400/1563 - train loss: 3.0995 - test loss: 16.5934 - train acc: 0.8197 - test acc: 0.4311 - 19m 11s
batch: 500/1563 - train loss: 3.4913 - test loss: 17.4719 - train acc: 0.8109 - test acc: 0.4099 - 19m 17s
batch: 600/1563 - train loss: 3.4368 - test loss: 16.8839 - train acc: 0.8068 - test acc: 0.4199 - 19m 22s
batch: 700/1563 - train loss: 3.6900 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 12.8115 - test loss: 14.0473 - train acc: 0.4063 - test acc: 0.3718 - 5m 30s
batch: 1500/1563 - train loss: 12.8502 - test loss: 13.5873 - train acc: 0.4078 - test acc: 0.3874 - 5m 35s
batch: 1563/1563 - train loss: 12.9926 - test loss: 14.1739 - train acc: 0.3978 - test acc: 0.3692 - 5m 40s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.3408 - test loss: 14.3478 - train acc: 0.4547 - test acc: 0.3653 - 5m 45s
batch: 200/1563 - train loss: 11.5062 - test loss: 13.9527 - train acc: 0.4563 - test acc: 0.3822 - 5m 50s
batch: 300/1563 - train loss: 11.3520 - test loss: 15.1235 - train acc: 0.4594 - test acc: 0.3511 - 5m 56s
batch: 400/1563 - train loss: 11.5833 - test loss: 13.5707 - train acc: 0.4428 - test acc: 0.3859 - 6m 1s
batch: 500/1563 - train loss: 11.1084 - test loss: 14.8120 - train acc: 0.4681 - test acc: 0.3551 - 6m 7s
batch: 600/1563 - train loss: 11.6988 -

batch: 600/1563 - train loss: 7.3109 - test loss: 14.7518 - train acc: 0.6237 - test acc: 0.4103 - 12m 1s
batch: 700/1563 - train loss: 7.3994 - test loss: 13.9521 - train acc: 0.6209 - test acc: 0.4248 - 12m 7s
batch: 800/1563 - train loss: 7.5884 - test loss: 14.0363 - train acc: 0.6163 - test acc: 0.4194 - 12m 12s
batch: 900/1563 - train loss: 7.8003 - test loss: 14.3616 - train acc: 0.6140 - test acc: 0.4136 - 12m 18s
batch: 1000/1563 - train loss: 7.4572 - test loss: 13.6770 - train acc: 0.6213 - test acc: 0.4285 - 12m 24s
batch: 1100/1563 - train loss: 7.8563 - test loss: 13.7119 - train acc: 0.6010 - test acc: 0.4256 - 12m 29s
batch: 1200/1563 - train loss: 7.7225 - test loss: 14.8721 - train acc: 0.6059 - test acc: 0.3979 - 12m 34s
batch: 1300/1563 - train loss: 7.9070 - test loss: 14.1906 - train acc: 0.5940 - test acc: 0.4156 - 12m 39s
batch: 1400/1563 - train loss: 8.3139 - test loss: 13.9361 - train acc: 0.5856 - test acc: 0.4190 - 12m 45s
batch: 1500/1563 - train loss: 8.2

batch: 1500/1563 - train loss: 5.1647 - test loss: 15.6982 - train acc: 0.7106 - test acc: 0.4266 - 18m 41s
batch: 1563/1563 - train loss: 5.0049 - test loss: 15.5021 - train acc: 0.7235 - test acc: 0.4315 - 18m 46s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.9160 - test loss: 16.2641 - train acc: 0.8381 - test acc: 0.4389 - 18m 51s
batch: 200/1563 - train loss: 2.9346 - test loss: 16.2731 - train acc: 0.8346 - test acc: 0.4329 - 18m 57s
batch: 300/1563 - train loss: 2.8645 - test loss: 16.4977 - train acc: 0.8400 - test acc: 0.4325 - 19m 2s
batch: 400/1563 - train loss: 3.1190 - test loss: 17.0503 - train acc: 0.8200 - test acc: 0.4263 - 19m 7s
batch: 500/1563 - train loss: 3.1511 - test loss: 16.4205 - train acc: 0.8219 - test acc: 0.4332 - 19m 13s
batch: 600/1563 - train loss: 3.3001 - test loss: 17.1135 - train acc: 0.8112 - test acc: 0.4212 - 19m 19s
batch: 700/1563 - train loss: 3.6129 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.3058 - test loss: 14.3882 - train acc: 0.3922 - test acc: 0.3558 - 5m 30s
batch: 1500/1563 - train loss: 13.0600 - test loss: 14.1405 - train acc: 0.4032 - test acc: 0.3641 - 5m 36s
batch: 1563/1563 - train loss: 13.0592 - test loss: 14.0124 - train acc: 0.4047 - test acc: 0.3660 - 5m 40s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.4198 - test loss: 14.1518 - train acc: 0.4513 - test acc: 0.3644 - 5m 46s
batch: 200/1563 - train loss: 11.5345 - test loss: 14.9007 - train acc: 0.4522 - test acc: 0.3560 - 5m 52s
batch: 300/1563 - train loss: 11.5175 - test loss: 14.0469 - train acc: 0.4509 - test acc: 0.3729 - 5m 57s
batch: 400/1563 - train loss: 11.4774 - test loss: 13.9663 - train acc: 0.4532 - test acc: 0.3789 - 6m 3s
batch: 500/1563 - train loss: 11.6273 - test loss: 14.3115 - train acc: 0.4534 - test acc: 0.3704 - 6m 8s
batch: 600/1563 - train loss: 11.7795 -

batch: 600/1563 - train loss: 7.6207 - test loss: 14.4335 - train acc: 0.6047 - test acc: 0.4084 - 12m 2s
batch: 700/1563 - train loss: 7.8181 - test loss: 13.8198 - train acc: 0.6053 - test acc: 0.4245 - 12m 8s
batch: 800/1563 - train loss: 7.5491 - test loss: 14.0014 - train acc: 0.6090 - test acc: 0.4205 - 12m 13s
batch: 900/1563 - train loss: 7.6327 - test loss: 13.7608 - train acc: 0.6112 - test acc: 0.4309 - 12m 19s
batch: 1000/1563 - train loss: 7.5766 - test loss: 14.6005 - train acc: 0.6138 - test acc: 0.4083 - 12m 24s
batch: 1100/1563 - train loss: 8.1401 - test loss: 13.8967 - train acc: 0.5900 - test acc: 0.4136 - 12m 29s
batch: 1200/1563 - train loss: 7.8169 - test loss: 13.9088 - train acc: 0.6013 - test acc: 0.4209 - 12m 35s
batch: 1300/1563 - train loss: 7.6757 - test loss: 14.5770 - train acc: 0.6094 - test acc: 0.4130 - 12m 40s
batch: 1400/1563 - train loss: 8.0766 - test loss: 13.8095 - train acc: 0.5844 - test acc: 0.4241 - 12m 46s
batch: 1500/1563 - train loss: 8.3

batch: 1500/1563 - train loss: 5.0499 - test loss: 15.7947 - train acc: 0.7353 - test acc: 0.4161 - 18m 36s
batch: 1563/1563 - train loss: 4.8871 - test loss: 15.5792 - train acc: 0.7353 - test acc: 0.4266 - 18m 41s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.8691 - test loss: 16.3005 - train acc: 0.8412 - test acc: 0.4283 - 18m 46s
batch: 200/1563 - train loss: 2.7408 - test loss: 16.6929 - train acc: 0.8512 - test acc: 0.4267 - 18m 52s
batch: 300/1563 - train loss: 2.7987 - test loss: 16.3470 - train acc: 0.8449 - test acc: 0.4270 - 18m 57s
batch: 400/1563 - train loss: 3.1986 - test loss: 16.4437 - train acc: 0.8215 - test acc: 0.4324 - 19m 3s
batch: 500/1563 - train loss: 3.3142 - test loss: 17.1961 - train acc: 0.8146 - test acc: 0.4223 - 19m 9s
batch: 600/1563 - train loss: 3.3345 - test loss: 16.5961 - train acc: 0.8146 - test acc: 0.4279 - 19m 14s
batch: 700/1563 - train loss: 3.3775 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.2020 - test loss: 14.4403 - train acc: 0.3912 - test acc: 0.3542 - 5m 27s
batch: 1500/1563 - train loss: 13.1420 - test loss: 14.4659 - train acc: 0.3916 - test acc: 0.3521 - 5m 33s
batch: 1563/1563 - train loss: 13.2052 - test loss: 15.8891 - train acc: 0.3913 - test acc: 0.3112 - 5m 38s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.6415 - test loss: 14.4720 - train acc: 0.4485 - test acc: 0.3602 - 5m 43s
batch: 200/1563 - train loss: 11.6210 - test loss: 14.3038 - train acc: 0.4534 - test acc: 0.3537 - 5m 48s
batch: 300/1563 - train loss: 11.8983 - test loss: 14.8190 - train acc: 0.4341 - test acc: 0.3444 - 5m 53s
batch: 400/1563 - train loss: 11.8858 - test loss: 14.8658 - train acc: 0.4282 - test acc: 0.3548 - 5m 59s
batch: 500/1563 - train loss: 11.7510 - test loss: 13.8573 - train acc: 0.4341 - test acc: 0.3818 - 6m 4s
batch: 600/1563 - train loss: 11.7059 

batch: 600/1563 - train loss: 7.7251 - test loss: 13.9837 - train acc: 0.6053 - test acc: 0.4236 - 11m 56s
batch: 700/1563 - train loss: 8.1509 - test loss: 13.7007 - train acc: 0.5941 - test acc: 0.4242 - 12m 2s
batch: 800/1563 - train loss: 8.1601 - test loss: 14.2061 - train acc: 0.5834 - test acc: 0.4045 - 12m 7s
batch: 900/1563 - train loss: 7.8798 - test loss: 14.4421 - train acc: 0.6022 - test acc: 0.4118 - 12m 12s
batch: 1000/1563 - train loss: 8.3664 - test loss: 15.2448 - train acc: 0.5893 - test acc: 0.3893 - 12m 18s
batch: 1100/1563 - train loss: 8.4472 - test loss: 13.5027 - train acc: 0.5759 - test acc: 0.4247 - 12m 23s
batch: 1200/1563 - train loss: 8.6329 - test loss: 14.0132 - train acc: 0.5712 - test acc: 0.4100 - 12m 29s
batch: 1300/1563 - train loss: 8.5692 - test loss: 14.6237 - train acc: 0.5840 - test acc: 0.3814 - 12m 34s
batch: 1400/1563 - train loss: 8.1583 - test loss: 13.8167 - train acc: 0.5931 - test acc: 0.4168 - 12m 40s
batch: 1500/1563 - train loss: 8.5

batch: 1500/1563 - train loss: 5.4825 - test loss: 15.2528 - train acc: 0.7044 - test acc: 0.4229 - 18m 31s
batch: 1563/1563 - train loss: 5.9067 - test loss: 15.7347 - train acc: 0.6882 - test acc: 0.4079 - 18m 35s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 3.3002 - test loss: 15.2144 - train acc: 0.8228 - test acc: 0.4383 - 18m 41s
batch: 200/1563 - train loss: 3.4628 - test loss: 15.9554 - train acc: 0.8074 - test acc: 0.4241 - 18m 46s
batch: 300/1563 - train loss: 3.5517 - test loss: 17.0190 - train acc: 0.8081 - test acc: 0.4141 - 18m 52s
batch: 400/1563 - train loss: 3.5931 - test loss: 16.4777 - train acc: 0.7999 - test acc: 0.4232 - 18m 57s
batch: 500/1563 - train loss: 3.7836 - test loss: 16.2026 - train acc: 0.8012 - test acc: 0.4324 - 19m 3s
batch: 600/1563 - train loss: 3.7828 - test loss: 16.0195 - train acc: 0.7909 - test acc: 0.4289 - 19m 8s
batch: 700/1563 - train loss: 4.0628 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 12.9678 - test loss: 13.7631 - train acc: 0.3884 - test acc: 0.3761 - 5m 34s
batch: 1500/1563 - train loss: 13.4553 - test loss: 14.5099 - train acc: 0.3819 - test acc: 0.3510 - 5m 39s
batch: 1563/1563 - train loss: 13.0736 - test loss: 13.9023 - train acc: 0.3950 - test acc: 0.3664 - 5m 44s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.6291 - test loss: 13.5802 - train acc: 0.4462 - test acc: 0.3860 - 5m 49s
batch: 200/1563 - train loss: 11.3980 - test loss: 14.0720 - train acc: 0.4591 - test acc: 0.3703 - 5m 55s
batch: 300/1563 - train loss: 11.4943 - test loss: 15.1213 - train acc: 0.4556 - test acc: 0.3403 - 6m 1s
batch: 400/1563 - train loss: 11.6854 - test loss: 14.0194 - train acc: 0.4432 - test acc: 0.3783 - 6m 6s
batch: 500/1563 - train loss: 11.6887 - test loss: 14.0584 - train acc: 0.4450 - test acc: 0.3710 - 6m 11s
batch: 600/1563 - train loss: 11.6862 -

batch: 600/1563 - train loss: 7.3579 - test loss: 14.1318 - train acc: 0.6109 - test acc: 0.4155 - 12m 6s
batch: 700/1563 - train loss: 7.7131 - test loss: 14.1464 - train acc: 0.6031 - test acc: 0.4120 - 12m 12s
batch: 800/1563 - train loss: 7.8455 - test loss: 13.7115 - train acc: 0.6075 - test acc: 0.4305 - 12m 18s
batch: 900/1563 - train loss: 7.6916 - test loss: 13.6549 - train acc: 0.6130 - test acc: 0.4246 - 12m 23s
batch: 1000/1563 - train loss: 7.7102 - test loss: 14.8434 - train acc: 0.6022 - test acc: 0.3973 - 12m 28s
batch: 1100/1563 - train loss: 7.6777 - test loss: 13.8981 - train acc: 0.6125 - test acc: 0.4305 - 12m 34s
batch: 1200/1563 - train loss: 7.7599 - test loss: 14.0009 - train acc: 0.6112 - test acc: 0.4191 - 12m 39s
batch: 1300/1563 - train loss: 8.0042 - test loss: 13.9364 - train acc: 0.5994 - test acc: 0.4230 - 12m 45s
batch: 1400/1563 - train loss: 8.0643 - test loss: 13.7184 - train acc: 0.5884 - test acc: 0.4210 - 12m 51s
batch: 1500/1563 - train loss: 8.

batch: 1500/1563 - train loss: 4.6222 - test loss: 15.8953 - train acc: 0.7485 - test acc: 0.4294 - 18m 51s
batch: 1563/1563 - train loss: 4.9399 - test loss: 16.2438 - train acc: 0.7403 - test acc: 0.4090 - 18m 55s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.9860 - test loss: 17.0245 - train acc: 0.8299 - test acc: 0.4095 - 19m 1s
batch: 200/1563 - train loss: 2.8511 - test loss: 16.0334 - train acc: 0.8415 - test acc: 0.4366 - 19m 6s
batch: 300/1563 - train loss: 2.9479 - test loss: 16.1857 - train acc: 0.8359 - test acc: 0.4317 - 19m 11s
batch: 400/1563 - train loss: 3.0596 - test loss: 16.7983 - train acc: 0.8287 - test acc: 0.4317 - 19m 17s
batch: 500/1563 - train loss: 3.0657 - test loss: 17.5765 - train acc: 0.8252 - test acc: 0.4120 - 19m 22s
batch: 600/1563 - train loss: 3.3003 - test loss: 17.8133 - train acc: 0.8153 - test acc: 0.4055 - 19m 28s
batch: 700/1563 - train loss: 3.5834 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 2359296 - partition: 30 - nº part: 78644 - block updates: 360
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140

batch: 1200/1563 - train loss: 13.0936 - test loss: 14.9661 - train acc: 0.4041 - test acc: 0.3359 - 5m 24s
batch: 1300/1563 - train loss: 13.2703 - test loss: 13.8208 - train acc: 0.4025 - test acc: 0.3770 - 5m 29s
batch: 1400/1563 - train loss: 12.9956 - test loss: 13.9070 - train acc: 0.3896 - test acc: 0.3650 - 5m 35s
batch: 1500/1563 - train loss: 12.8826 - test loss: 15.3007 - train acc: 0.3950 - test acc: 0.3374 - 5m 41s
batch: 1563/1563 - train loss: 12.8485 - test loss: 14.4961 - train acc: 0.3891 - test acc: 0.3513 - 5m 46s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.3414 - test loss: 14.6343 - train acc: 0.4619 - test acc: 0.3537 - 5m 51s
batch: 200/1563 - train loss: 11.5209 - test loss: 14.1840 - train acc: 0.4479 - test acc: 0.3669 - 5m 57s
batch: 300/1563 - train loss: 11.8762 - test loss: 13.9758 - train acc: 0.4419 - test acc: 0.3706 - 6m 3s
batch: 400/1563 - train loss: 11.763

batch: 400/1563 - train loss: 7.3324 - test loss: 13.6058 - train acc: 0.6197 - test acc: 0.4285 - 12m 4s
batch: 500/1563 - train loss: 7.4315 - test loss: 14.1041 - train acc: 0.6219 - test acc: 0.4153 - 12m 9s
batch: 600/1563 - train loss: 7.4614 - test loss: 13.9213 - train acc: 0.6197 - test acc: 0.4179 - 12m 15s
batch: 700/1563 - train loss: 7.2891 - test loss: 14.7276 - train acc: 0.6269 - test acc: 0.4030 - 12m 20s
batch: 800/1563 - train loss: 7.6618 - test loss: 15.0423 - train acc: 0.5987 - test acc: 0.3929 - 12m 26s
batch: 900/1563 - train loss: 7.7504 - test loss: 14.0927 - train acc: 0.6047 - test acc: 0.4226 - 12m 31s
batch: 1000/1563 - train loss: 8.4909 - test loss: 14.1887 - train acc: 0.5662 - test acc: 0.4095 - 12m 37s
batch: 1100/1563 - train loss: 7.8233 - test loss: 14.6712 - train acc: 0.6094 - test acc: 0.4025 - 12m 42s
batch: 1200/1563 - train loss: 8.0544 - test loss: 13.4991 - train acc: 0.5855 - test acc: 0.4240 - 12m 48s
batch: 1300/1563 - train loss: 8.067

batch: 1300/1563 - train loss: 4.8312 - test loss: 15.7780 - train acc: 0.7372 - test acc: 0.4309 - 18m 45s
batch: 1400/1563 - train loss: 4.9465 - test loss: 15.6844 - train acc: 0.7309 - test acc: 0.4278 - 18m 51s
batch: 1500/1563 - train loss: 4.8061 - test loss: 16.5530 - train acc: 0.7422 - test acc: 0.4121 - 18m 57s
batch: 1563/1563 - train loss: 5.3224 - test loss: 16.7108 - train acc: 0.7176 - test acc: 0.4124 - 19m 1s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 3.0252 - test loss: 16.1671 - train acc: 0.8305 - test acc: 0.4294 - 19m 6s
batch: 200/1563 - train loss: 2.8816 - test loss: 17.0599 - train acc: 0.8396 - test acc: 0.4226 - 19m 12s
batch: 300/1563 - train loss: 3.0206 - test loss: 16.3133 - train acc: 0.8381 - test acc: 0.4355 - 19m 17s
batch: 400/1563 - train loss: 3.2123 - test loss: 16.5153 - train acc: 0.8315 - test acc: 0.4310 - 19m 23s
batch: 500/1563 - train loss: 3.3888 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.1407 - test loss: 14.1630 - train acc: 0.3925 - test acc: 0.3644 - 5m 33s
batch: 1500/1563 - train loss: 13.4940 - test loss: 13.7808 - train acc: 0.3819 - test acc: 0.3789 - 5m 38s
batch: 1563/1563 - train loss: 13.2106 - test loss: 15.0598 - train acc: 0.3900 - test acc: 0.3445 - 5m 43s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.4931 - test loss: 14.3092 - train acc: 0.4441 - test acc: 0.3632 - 5m 48s
batch: 200/1563 - train loss: 11.3381 - test loss: 14.7987 - train acc: 0.4506 - test acc: 0.3567 - 5m 53s
batch: 300/1563 - train loss: 11.5236 - test loss: 14.2380 - train acc: 0.4479 - test acc: 0.3727 - 5m 59s
batch: 400/1563 - train loss: 11.5197 - test loss: 15.1620 - train acc: 0.4466 - test acc: 0.3557 - 6m 4s
batch: 500/1563 - train loss: 12.1647 - test loss: 16.2367 - train acc: 0.4294 - test acc: 0.3044 - 6m 9s
batch: 600/1563 - train loss: 11.8423 -

batch: 600/1563 - train loss: 7.7082 - test loss: 13.7641 - train acc: 0.6091 - test acc: 0.4216 - 11m 54s
batch: 700/1563 - train loss: 7.5006 - test loss: 13.9941 - train acc: 0.6225 - test acc: 0.4168 - 12m 0s
batch: 800/1563 - train loss: 7.4369 - test loss: 13.7995 - train acc: 0.6228 - test acc: 0.4235 - 12m 5s
batch: 900/1563 - train loss: 7.7453 - test loss: 14.4181 - train acc: 0.6072 - test acc: 0.3992 - 12m 10s
batch: 1000/1563 - train loss: 7.9457 - test loss: 13.7456 - train acc: 0.6003 - test acc: 0.4188 - 12m 16s
batch: 1100/1563 - train loss: 7.7842 - test loss: 14.4815 - train acc: 0.6059 - test acc: 0.4042 - 12m 21s
batch: 1200/1563 - train loss: 7.9255 - test loss: 14.1188 - train acc: 0.6122 - test acc: 0.4112 - 12m 27s
batch: 1300/1563 - train loss: 8.1501 - test loss: 13.9683 - train acc: 0.5900 - test acc: 0.4106 - 12m 32s
batch: 1400/1563 - train loss: 7.9410 - test loss: 13.5186 - train acc: 0.5887 - test acc: 0.4285 - 12m 37s
batch: 1500/1563 - train loss: 7.7

batch: 1500/1563 - train loss: 4.7244 - test loss: 16.0862 - train acc: 0.7481 - test acc: 0.4283 - 18m 29s
batch: 1563/1563 - train loss: 4.8039 - test loss: 16.0488 - train acc: 0.7450 - test acc: 0.4308 - 18m 33s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 3.0369 - test loss: 16.2930 - train acc: 0.8363 - test acc: 0.4367 - 18m 39s
batch: 200/1563 - train loss: 2.8499 - test loss: 16.8242 - train acc: 0.8418 - test acc: 0.4244 - 18m 44s
batch: 300/1563 - train loss: 3.0944 - test loss: 16.8748 - train acc: 0.8321 - test acc: 0.4211 - 18m 50s
batch: 400/1563 - train loss: 3.0251 - test loss: 16.4159 - train acc: 0.8356 - test acc: 0.4330 - 18m 55s
batch: 500/1563 - train loss: 3.4653 - test loss: 16.5974 - train acc: 0.8028 - test acc: 0.4260 - 19m 1s
batch: 600/1563 - train loss: 3.2196 - test loss: 17.1608 - train acc: 0.8240 - test acc: 0.4217 - 19m 6s
batch: 700/1563 - train loss: 3.3542 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.2473 - test loss: 14.3024 - train acc: 0.3816 - test acc: 0.3589 - 5m 31s
batch: 1500/1563 - train loss: 13.1430 - test loss: 15.4259 - train acc: 0.3925 - test acc: 0.3178 - 5m 37s
batch: 1563/1563 - train loss: 13.2168 - test loss: 15.6836 - train acc: 0.3966 - test acc: 0.3167 - 5m 41s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.2398 - test loss: 15.6691 - train acc: 0.4663 - test acc: 0.3270 - 5m 47s
batch: 200/1563 - train loss: 11.5597 - test loss: 14.0775 - train acc: 0.4550 - test acc: 0.3746 - 5m 52s
batch: 300/1563 - train loss: 11.3375 - test loss: 13.9679 - train acc: 0.4519 - test acc: 0.3747 - 5m 57s
batch: 400/1563 - train loss: 11.5967 - test loss: 14.5818 - train acc: 0.4463 - test acc: 0.3583 - 6m 3s
batch: 500/1563 - train loss: 11.8503 - test loss: 15.7343 - train acc: 0.4312 - test acc: 0.3298 - 6m 8s
batch: 600/1563 - train loss: 11.6741 -

batch: 600/1563 - train loss: 7.5771 - test loss: 15.8307 - train acc: 0.6109 - test acc: 0.3749 - 11m 57s
batch: 700/1563 - train loss: 7.4878 - test loss: 14.3279 - train acc: 0.6234 - test acc: 0.4106 - 12m 3s
batch: 800/1563 - train loss: 7.6403 - test loss: 15.0851 - train acc: 0.6241 - test acc: 0.3869 - 12m 8s
batch: 900/1563 - train loss: 7.7329 - test loss: 14.0748 - train acc: 0.6084 - test acc: 0.4192 - 12m 13s
batch: 1000/1563 - train loss: 7.8073 - test loss: 14.2772 - train acc: 0.6122 - test acc: 0.4097 - 12m 18s
batch: 1100/1563 - train loss: 7.9461 - test loss: 14.3034 - train acc: 0.5997 - test acc: 0.4074 - 12m 24s
batch: 1200/1563 - train loss: 8.0045 - test loss: 14.3972 - train acc: 0.5949 - test acc: 0.4062 - 12m 29s
batch: 1300/1563 - train loss: 7.9847 - test loss: 13.6942 - train acc: 0.5925 - test acc: 0.4251 - 12m 35s
batch: 1400/1563 - train loss: 8.1198 - test loss: 14.1536 - train acc: 0.5897 - test acc: 0.4098 - 12m 40s
batch: 1500/1563 - train loss: 8.2

batch: 1500/1563 - train loss: 4.9550 - test loss: 15.7995 - train acc: 0.7306 - test acc: 0.4296 - 18m 34s
batch: 1563/1563 - train loss: 4.9153 - test loss: 16.0846 - train acc: 0.7285 - test acc: 0.4207 - 18m 38s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.8630 - test loss: 15.5883 - train acc: 0.8406 - test acc: 0.4345 - 18m 43s
batch: 200/1563 - train loss: 2.9700 - test loss: 16.5457 - train acc: 0.8305 - test acc: 0.4284 - 18m 49s
batch: 300/1563 - train loss: 2.9114 - test loss: 16.8453 - train acc: 0.8381 - test acc: 0.4230 - 18m 55s
batch: 400/1563 - train loss: 3.0369 - test loss: 16.6679 - train acc: 0.8268 - test acc: 0.4235 - 19m 0s
batch: 500/1563 - train loss: 3.1572 - test loss: 16.3949 - train acc: 0.8255 - test acc: 0.4325 - 19m 6s
batch: 600/1563 - train loss: 3.4532 - test loss: 17.0753 - train acc: 0.8025 - test acc: 0.4105 - 19m 11s
batch: 700/1563 - train loss: 3.5705 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.0160 - test loss: 15.2929 - train acc: 0.3956 - test acc: 0.3350 - 5m 32s
batch: 1500/1563 - train loss: 13.1316 - test loss: 14.8903 - train acc: 0.3975 - test acc: 0.3434 - 5m 38s
batch: 1563/1563 - train loss: 12.7507 - test loss: 14.0456 - train acc: 0.4094 - test acc: 0.3700 - 5m 42s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.3308 - test loss: 14.0926 - train acc: 0.4638 - test acc: 0.3691 - 5m 47s
batch: 200/1563 - train loss: 11.3616 - test loss: 14.4192 - train acc: 0.4537 - test acc: 0.3644 - 5m 53s
batch: 300/1563 - train loss: 11.5675 - test loss: 15.3566 - train acc: 0.4481 - test acc: 0.3414 - 5m 59s
batch: 400/1563 - train loss: 11.5861 - test loss: 14.1948 - train acc: 0.4528 - test acc: 0.3669 - 6m 4s
batch: 500/1563 - train loss: 11.8850 - test loss: 14.1636 - train acc: 0.4332 - test acc: 0.3675 - 6m 9s
batch: 600/1563 - train loss: 11.7500 -

batch: 600/1563 - train loss: 7.2366 - test loss: 15.4787 - train acc: 0.6194 - test acc: 0.3848 - 11m 58s
batch: 700/1563 - train loss: 7.7028 - test loss: 14.9122 - train acc: 0.6034 - test acc: 0.3933 - 12m 4s
batch: 800/1563 - train loss: 7.6778 - test loss: 14.2275 - train acc: 0.6047 - test acc: 0.4127 - 12m 9s
batch: 900/1563 - train loss: 7.8373 - test loss: 16.5945 - train acc: 0.6040 - test acc: 0.3584 - 12m 14s
batch: 1000/1563 - train loss: 7.9061 - test loss: 14.4491 - train acc: 0.5975 - test acc: 0.4054 - 12m 20s
batch: 1100/1563 - train loss: 7.9863 - test loss: 14.0066 - train acc: 0.5969 - test acc: 0.4162 - 12m 25s
batch: 1200/1563 - train loss: 7.7145 - test loss: 14.5644 - train acc: 0.6090 - test acc: 0.4052 - 12m 31s
batch: 1300/1563 - train loss: 8.0068 - test loss: 14.4611 - train acc: 0.5981 - test acc: 0.4020 - 12m 36s
batch: 1400/1563 - train loss: 7.9445 - test loss: 13.9616 - train acc: 0.5931 - test acc: 0.4183 - 12m 42s
batch: 1500/1563 - train loss: 8.1

batch: 1500/1563 - train loss: 4.9460 - test loss: 15.8154 - train acc: 0.7341 - test acc: 0.4289 - 18m 34s
batch: 1563/1563 - train loss: 5.1564 - test loss: 15.9478 - train acc: 0.7257 - test acc: 0.4223 - 18m 39s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.7470 - test loss: 16.3441 - train acc: 0.8428 - test acc: 0.4285 - 18m 44s
batch: 200/1563 - train loss: 2.7951 - test loss: 17.0944 - train acc: 0.8497 - test acc: 0.4148 - 18m 49s
batch: 300/1563 - train loss: 3.0584 - test loss: 17.0704 - train acc: 0.8371 - test acc: 0.4216 - 18m 55s
batch: 400/1563 - train loss: 3.2474 - test loss: 16.9926 - train acc: 0.8249 - test acc: 0.4296 - 19m 0s
batch: 500/1563 - train loss: 3.4730 - test loss: 16.6137 - train acc: 0.8065 - test acc: 0.4256 - 19m 6s
batch: 600/1563 - train loss: 3.4379 - test loss: 16.7765 - train acc: 0.8037 - test acc: 0.4221 - 19m 12s
batch: 700/1563 - train loss: 3.6643 - 

FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 512 - partition: 30 - nº part: 18 - block updates: 18
FisherPartitioner: param: 51200 - partition: 30 - nº part: 1707 - block updates: 360
FisherPartitioner: param: 100 - partition: 30 - nº part: 4 - block updates: 4
total partitions: 374302 - effective block updates: 7782
initializing buffers and inverses...
partition 1/374302
partition 10000/374302
partition 20000/374302
partition 30000/374302
partition 40000/374302
partition 50000/374302
partition 60000/374302
partition 70000/374302
partition 80000/374302
partition 90000/374302
partition 100000/374302
partition 110000/374302
partition 120000/374302
partition 130000/374302
partition 140000/374302
partition 150000/374302
partition 160000/374302
partition 170000/374302
partition 180000/374302
partition 190000/374302
partition 200000/374302
partition 210000/374302
partition 220000/374302
partition 230000/374302
partition 240000/3743

batch: 1400/1563 - train loss: 13.1178 - test loss: 14.1043 - train acc: 0.3990 - test acc: 0.3626 - 5m 31s
batch: 1500/1563 - train loss: 13.1247 - test loss: 14.4040 - train acc: 0.3925 - test acc: 0.3528 - 5m 36s
batch: 1563/1563 - train loss: 13.0547 - test loss: 14.6232 - train acc: 0.3928 - test acc: 0.3491 - 5m 41s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 5/100
batch: 100/1563 - train loss: 11.5288 - test loss: 14.1530 - train acc: 0.4478 - test acc: 0.3724 - 5m 47s
batch: 200/1563 - train loss: 11.5956 - test loss: 15.3814 - train acc: 0.4453 - test acc: 0.3316 - 5m 52s
batch: 300/1563 - train loss: 11.9684 - test loss: 14.4792 - train acc: 0.4332 - test acc: 0.3613 - 5m 57s
batch: 400/1563 - train loss: 11.7726 - test loss: 15.0461 - train acc: 0.4475 - test acc: 0.3561 - 6m 3s
batch: 500/1563 - train loss: 11.8725 - test loss: 14.6411 - train acc: 0.4385 - test acc: 0.3639 - 6m 8s
batch: 600/1563 - train loss: 11.9369 -

batch: 600/1563 - train loss: 7.6263 - test loss: 13.6577 - train acc: 0.6068 - test acc: 0.4148 - 12m 1s
batch: 700/1563 - train loss: 7.5079 - test loss: 14.1026 - train acc: 0.6156 - test acc: 0.4149 - 12m 7s
batch: 800/1563 - train loss: 7.8148 - test loss: 13.9718 - train acc: 0.6153 - test acc: 0.4188 - 12m 13s
batch: 900/1563 - train loss: 7.8648 - test loss: 14.2381 - train acc: 0.5984 - test acc: 0.4149 - 12m 18s
batch: 1000/1563 - train loss: 7.9081 - test loss: 13.5798 - train acc: 0.6063 - test acc: 0.4183 - 12m 24s
batch: 1100/1563 - train loss: 7.7278 - test loss: 13.7997 - train acc: 0.6112 - test acc: 0.4235 - 12m 29s
batch: 1200/1563 - train loss: 7.7515 - test loss: 14.1103 - train acc: 0.6100 - test acc: 0.4114 - 12m 35s
batch: 1300/1563 - train loss: 7.9826 - test loss: 14.1105 - train acc: 0.5972 - test acc: 0.4216 - 12m 40s
batch: 1400/1563 - train loss: 8.3468 - test loss: 14.3711 - train acc: 0.5787 - test acc: 0.4107 - 12m 46s
batch: 1500/1563 - train loss: 8.2

batch: 1500/1563 - train loss: 4.8618 - test loss: 15.8962 - train acc: 0.7359 - test acc: 0.4271 - 18m 39s
batch: 1563/1563 - train loss: 5.1462 - test loss: 16.4639 - train acc: 0.7285 - test acc: 0.4069 - 18m 44s
GPU memory used: 7.24 GB - max: 7.56 GB - memory reserved: 7.62 GB - max: 7.62 GB
starting epoch: 14/100
batch: 100/1563 - train loss: 2.9321 - test loss: 15.9505 - train acc: 0.8356 - test acc: 0.4329 - 18m 49s
batch: 200/1563 - train loss: 3.0128 - test loss: 16.4292 - train acc: 0.8296 - test acc: 0.4265 - 18m 55s
batch: 300/1563 - train loss: 3.0304 - test loss: 16.5596 - train acc: 0.8337 - test acc: 0.4287 - 19m 0s
batch: 400/1563 - train loss: 3.0562 - test loss: 16.5810 - train acc: 0.8350 - test acc: 0.4219 - 19m 5s
batch: 500/1563 - train loss: 3.4083 - test loss: 16.4013 - train acc: 0.8118 - test acc: 0.4371 - 19m 11s
batch: 600/1563 - train loss: 3.5625 - test loss: 16.8888 - train acc: 0.7965 - test acc: 0.4215 - 19m 16s
batch: 700/1563 - train loss: 3.5852 - 

## Testing the baseline
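
The baseline cell in this section passes `time_limit_secs = 1200`, and the logs of each run end with `time is up! finishing training` at about 20 minutes. The internals of `train_network_fisher_optimization` are not shown here, so the following is only a minimal sketch of how such a wall-clock budget could be enforced. The helper `run_with_time_limit` and its `clock` parameter are hypothetical names introduced for illustration, not part of the notebook.

```python
import time


def run_with_time_limit(batches, step_fn, time_limit_secs, clock=time.monotonic):
    """Call step_fn on each batch until the wall-clock budget is used up.

    Hypothetical helper: the name and structure are illustrative only and
    are not taken from train_network_fisher_optimization.
    """
    start = clock()
    done = 0
    for batch in batches:
        step_fn(batch)
        done += 1
        # Check the budget after the step, so the batch in flight always finishes.
        if clock() - start >= time_limit_secs:
            print("time is up! finishing training")
            break
    return done


if __name__ == "__main__":
    # Toy usage: each "step" sleeps 10 ms and the budget is 50 ms.
    steps = run_with_time_limit(range(1000), lambda b: time.sleep(0.01), 0.05)
    print(f"completed {steps} steps before the budget ran out")
```

Checking the clock after each step rather than before it would explain why the last reported batch can land slightly past the limit (20m 1s in the logs), but that reading of the logs is an assumption.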

In [27]:
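# Baseline runs: same training routine and time budget, with apply_fisher=False.
for _ in range(nruns):

    # Release unused cached GPU memory left over from the previous run.
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    default_metrics, _ = train_network_fisher_optimization(apply_fisher=False,
                                                           net_params={'p': 0.1},
                                                           epochs=100,
                                                           time_limit_secs=1200)

    # Same tuple layout as the Fisher runs; buffer_size, partition_size and
    # block_updates hold whatever values earlier cells left in scope.
    results_list.append((default_metrics, buffer_size, partition_size, block_updates))
    results_list_to_json(results_list, step=step_i)
    step_i += 1

    print()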

generating CIFAR100 data with 100 classes
Files already downloaded and verified
Files already downloaded and verified
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 16, 16]           9,408
       BatchNorm2d-2           [-1, 64, 16, 16]             128
              ReLU-3           [-1, 64, 16, 16]               0
         MaxPool2d-4             [-1, 64, 8, 8]               0
            Conv2d-5             [-1, 64, 8, 8]          36,864
       BatchNorm2d-6             [-1, 64, 8, 8]             128
              ReLU-7             [-1, 64, 8, 8]               0
            Conv2d-8             [-1, 64, 8, 8]          36,864
       BatchNorm2d-9             [-1, 64, 8, 8]             128
             ReLU-10             [-1, 64, 8, 8]               0
       BasicBlock-11             [-1, 64, 8, 8]               0
           Conv2d-12             [-1, 64, 8, 8]  

batch: 1400/1563 - train loss: 16.5994 - test loss: 16.3922 - train acc: 0.2650 - test acc: 0.2778 - 2m 18s
batch: 1500/1563 - train loss: 16.6086 - test loss: 16.2864 - train acc: 0.2759 - test acc: 0.2772 - 2m 24s
batch: 1563/1563 - train loss: 16.0743 - test loss: 17.1867 - train acc: 0.2912 - test acc: 0.2572 - 2m 28s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 3/100
batch: 100/1563 - train loss: 15.4027 - test loss: 16.0116 - train acc: 0.2919 - test acc: 0.2940 - 2m 32s
batch: 200/1563 - train loss: 14.9696 - test loss: 16.0886 - train acc: 0.3199 - test acc: 0.2873 - 2m 37s
batch: 300/1563 - train loss: 15.0037 - test loss: 16.4906 - train acc: 0.3256 - test acc: 0.2757 - 2m 42s
batch: 400/1563 - train loss: 15.0565 - test loss: 16.3459 - train acc: 0.3187 - test acc: 0.2868 - 2m 47s
batch: 500/1563 - train loss: 15.2491 - test loss: 15.8935 - train acc: 0.3115 - test acc: 0.2981 - 2m 51s
batch: 600/1563 - train loss: 15.0938

batch: 600/1563 - train loss: 9.6747 - test loss: 13.5474 - train acc: 0.5284 - test acc: 0.4057 - 8m 3s
batch: 700/1563 - train loss: 9.7440 - test loss: 14.6558 - train acc: 0.5168 - test acc: 0.3814 - 8m 7s
batch: 800/1563 - train loss: 9.5898 - test loss: 14.0397 - train acc: 0.5225 - test acc: 0.3918 - 8m 12s
batch: 900/1563 - train loss: 9.6970 - test loss: 19.5851 - train acc: 0.5213 - test acc: 0.2809 - 8m 17s
batch: 1000/1563 - train loss: 9.8432 - test loss: 14.2269 - train acc: 0.5185 - test acc: 0.3866 - 8m 22s
batch: 1100/1563 - train loss: 9.7207 - test loss: 14.2599 - train acc: 0.5343 - test acc: 0.3798 - 8m 27s
batch: 1200/1563 - train loss: 9.6413 - test loss: 13.2499 - train acc: 0.5247 - test acc: 0.4130 - 8m 32s
batch: 1300/1563 - train loss: 9.5846 - test loss: 14.5653 - train acc: 0.5294 - test acc: 0.3806 - 8m 37s
batch: 1400/1563 - train loss: 9.8692 - test loss: 13.3567 - train acc: 0.5250 - test acc: 0.4141 - 8m 41s
batch: 1500/1563 - train loss: 9.7465 - tes

batch: 1500/1563 - train loss: 6.1555 - test loss: 15.1252 - train acc: 0.6750 - test acc: 0.4201 - 13m 57s
batch: 1563/1563 - train loss: 6.0714 - test loss: 15.6622 - train acc: 0.6738 - test acc: 0.4047 - 14m 1s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 12/100
batch: 100/1563 - train loss: 3.5342 - test loss: 15.8188 - train acc: 0.8106 - test acc: 0.4111 - 14m 6s
batch: 200/1563 - train loss: 3.5778 - test loss: 15.9947 - train acc: 0.8041 - test acc: 0.4138 - 14m 11s
batch: 300/1563 - train loss: 3.8025 - test loss: 16.2076 - train acc: 0.7856 - test acc: 0.4090 - 14m 16s
batch: 400/1563 - train loss: 4.0508 - test loss: 15.4642 - train acc: 0.7828 - test acc: 0.4238 - 14m 21s
batch: 500/1563 - train loss: 3.8725 - test loss: 16.0506 - train acc: 0.7871 - test acc: 0.4183 - 14m 26s
batch: 600/1563 - train loss: 4.1937 - test loss: 16.5637 - train acc: 0.7609 - test acc: 0.3951 - 14m 30s
batch: 700/1563 - train loss: 4.3481 - 

batch: 700/1563 - train loss: 1.9249 - test loss: 18.7599 - train acc: 0.8973 - test acc: 0.4172 - 19m 47s
batch: 800/1563 - train loss: 2.1370 - test loss: 19.2206 - train acc: 0.8775 - test acc: 0.4088 - 19m 52s
batch: 900/1563 - train loss: 2.0806 - test loss: 18.9534 - train acc: 0.8829 - test acc: 0.4244 - 19m 57s
time is up! finishing training
batch: 904/1563 - train loss: 2.1181 - test loss: 18.8514 - train acc: 0.8810 - test acc: 0.4282 - 20m 1s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB

generating CIFAR100 data with 100 classes
Files already downloaded and verified
Files already downloaded and verified
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 16, 16]           9,408
       BatchNorm2d-2           [-1, 64, 16, 16]             128
              ReLU-3           [-1, 64, 16, 16]               0
         MaxPool2d-4

batch: 900/1563 - train loss: 16.8669 - test loss: 17.3335 - train acc: 0.2550 - test acc: 0.2511 - 1m 53s
batch: 1000/1563 - train loss: 16.9492 - test loss: 17.2998 - train acc: 0.2644 - test acc: 0.2505 - 1m 58s
batch: 1100/1563 - train loss: 16.7208 - test loss: 16.8097 - train acc: 0.2488 - test acc: 0.2617 - 2m 3s
batch: 1200/1563 - train loss: 16.6467 - test loss: 18.3612 - train acc: 0.2629 - test acc: 0.2268 - 2m 7s
batch: 1300/1563 - train loss: 16.4613 - test loss: 16.7528 - train acc: 0.2675 - test acc: 0.2580 - 2m 12s
batch: 1400/1563 - train loss: 15.9221 - test loss: 19.0687 - train acc: 0.2840 - test acc: 0.2115 - 2m 17s
batch: 1500/1563 - train loss: 16.4017 - test loss: 16.0180 - train acc: 0.2706 - test acc: 0.2971 - 2m 22s
batch: 1563/1563 - train loss: 16.0499 - test loss: 16.5127 - train acc: 0.2897 - test acc: 0.2815 - 2m 26s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 3/100
batch: 100/1563 - train loss: 14.94

batch: 100/1563 - train loss: 8.4287 - test loss: 13.4930 - train acc: 0.5913 - test acc: 0.4036 - 7m 29s
batch: 200/1563 - train loss: 8.5792 - test loss: 14.0574 - train acc: 0.5809 - test acc: 0.3917 - 7m 34s
batch: 300/1563 - train loss: 8.7673 - test loss: 14.0559 - train acc: 0.5622 - test acc: 0.3938 - 7m 38s
batch: 400/1563 - train loss: 8.8352 - test loss: 13.9196 - train acc: 0.5593 - test acc: 0.3904 - 7m 44s
batch: 500/1563 - train loss: 9.1918 - test loss: 14.3751 - train acc: 0.5465 - test acc: 0.3863 - 7m 48s
batch: 600/1563 - train loss: 9.3123 - test loss: 14.3344 - train acc: 0.5434 - test acc: 0.3815 - 7m 53s
batch: 700/1563 - train loss: 9.6878 - test loss: 13.5451 - train acc: 0.5175 - test acc: 0.4066 - 7m 58s
batch: 800/1563 - train loss: 9.4501 - test loss: 15.9669 - train acc: 0.5347 - test acc: 0.3411 - 8m 3s
batch: 900/1563 - train loss: 9.6771 - test loss: 14.8384 - train acc: 0.5243 - test acc: 0.3764 - 8m 8s
batch: 1000/1563 - train loss: 9.4780 - test los

batch: 1000/1563 - train loss: 5.4351 - test loss: 15.5298 - train acc: 0.7025 - test acc: 0.4056 - 13m 19s
batch: 1100/1563 - train loss: 5.4887 - test loss: 17.0953 - train acc: 0.7116 - test acc: 0.3849 - 13m 24s
batch: 1200/1563 - train loss: 5.7581 - test loss: 14.8228 - train acc: 0.7000 - test acc: 0.4213 - 13m 29s
batch: 1300/1563 - train loss: 5.8268 - test loss: 15.8895 - train acc: 0.6822 - test acc: 0.3927 - 13m 34s
batch: 1400/1563 - train loss: 5.8961 - test loss: 15.5404 - train acc: 0.6800 - test acc: 0.4118 - 13m 39s
batch: 1500/1563 - train loss: 5.9468 - test loss: 15.3945 - train acc: 0.6800 - test acc: 0.4137 - 13m 44s
batch: 1563/1563 - train loss: 5.8506 - test loss: 15.4054 - train acc: 0.6866 - test acc: 0.4113 - 13m 48s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 12/100
batch: 100/1563 - train loss: 3.5342 - test loss: 15.9912 - train acc: 0.8096 - test acc: 0.4180 - 13m 53s
batch: 200/1563 - train loss: 3.

batch: 200/1563 - train loss: 1.6074 - test loss: 18.9198 - train acc: 0.9117 - test acc: 0.4178 - 19m 9s
batch: 300/1563 - train loss: 1.6781 - test loss: 18.2176 - train acc: 0.9066 - test acc: 0.4267 - 19m 14s
batch: 400/1563 - train loss: 1.7017 - test loss: 18.3185 - train acc: 0.9067 - test acc: 0.4291 - 19m 18s
batch: 500/1563 - train loss: 1.8189 - test loss: 18.4111 - train acc: 0.9045 - test acc: 0.4234 - 19m 24s
batch: 600/1563 - train loss: 1.8477 - test loss: 19.2300 - train acc: 0.8954 - test acc: 0.4123 - 19m 28s
batch: 700/1563 - train loss: 1.9854 - test loss: 19.3716 - train acc: 0.8869 - test acc: 0.4187 - 19m 33s
batch: 800/1563 - train loss: 2.2052 - test loss: 18.6383 - train acc: 0.8750 - test acc: 0.4226 - 19m 38s
batch: 900/1563 - train loss: 2.2969 - test loss: 19.1953 - train acc: 0.8734 - test acc: 0.4085 - 19m 43s
batch: 1000/1563 - train loss: 2.4052 - test loss: 18.9707 - train acc: 0.8640 - test acc: 0.4149 - 19m 48s
batch: 1100/1563 - train loss: 2.2836

batch: 100/1563 - train loss: 17.9058 - test loss: 19.5544 - train acc: 0.2325 - test acc: 0.1881 - 1m 21s
batch: 200/1563 - train loss: 18.0889 - test loss: 18.4283 - train acc: 0.2173 - test acc: 0.2177 - 1m 26s
batch: 300/1563 - train loss: 17.6947 - test loss: 18.1962 - train acc: 0.2291 - test acc: 0.2147 - 1m 30s
batch: 400/1563 - train loss: 17.7159 - test loss: 19.2379 - train acc: 0.2275 - test acc: 0.1891 - 1m 35s
batch: 500/1563 - train loss: 17.8140 - test loss: 17.9209 - train acc: 0.2284 - test acc: 0.2304 - 1m 40s
batch: 600/1563 - train loss: 17.5799 - test loss: 18.1186 - train acc: 0.2353 - test acc: 0.2174 - 1m 45s
batch: 700/1563 - train loss: 17.1568 - test loss: 17.5427 - train acc: 0.2525 - test acc: 0.2404 - 1m 50s
batch: 800/1563 - train loss: 17.0778 - test loss: 18.5728 - train acc: 0.2528 - test acc: 0.2142 - 1m 55s
batch: 900/1563 - train loss: 16.9982 - test loss: 18.2012 - train acc: 0.2469 - test acc: 0.2232 - 2m 0s
batch: 1000/1563 - train loss: 17.0117

batch: 1000/1563 - train loss: 10.8673 - test loss: 13.7312 - train acc: 0.4762 - test acc: 0.3892 - 7m 23s
batch: 1100/1563 - train loss: 10.7140 - test loss: 14.0320 - train acc: 0.4810 - test acc: 0.3792 - 7m 28s
batch: 1200/1563 - train loss: 10.6735 - test loss: 13.2459 - train acc: 0.4934 - test acc: 0.4052 - 7m 33s
batch: 1300/1563 - train loss: 10.8040 - test loss: 13.5924 - train acc: 0.4775 - test acc: 0.3949 - 7m 37s
batch: 1400/1563 - train loss: 11.0192 - test loss: 13.4075 - train acc: 0.4613 - test acc: 0.3972 - 7m 42s
batch: 1500/1563 - train loss: 10.8686 - test loss: 14.1508 - train acc: 0.4900 - test acc: 0.3876 - 7m 47s
batch: 1563/1563 - train loss: 10.7482 - test loss: 13.5866 - train acc: 0.4822 - test acc: 0.3915 - 7m 51s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 7/100
batch: 100/1563 - train loss: 8.7132 - test loss: 14.8975 - train acc: 0.5725 - test acc: 0.3723 - 7m 56s
batch: 200/1563 - train loss: 8.70

batch: 200/1563 - train loss: 4.0293 - test loss: 16.2892 - train acc: 0.7775 - test acc: 0.3939 - 13m 37s
batch: 300/1563 - train loss: 4.5684 - test loss: 15.6455 - train acc: 0.7503 - test acc: 0.4030 - 13m 43s
batch: 400/1563 - train loss: 4.4297 - test loss: 15.4152 - train acc: 0.7690 - test acc: 0.4222 - 13m 48s
batch: 500/1563 - train loss: 4.8252 - test loss: 15.7031 - train acc: 0.7366 - test acc: 0.4149 - 13m 53s
batch: 600/1563 - train loss: 4.9402 - test loss: 15.5816 - train acc: 0.7406 - test acc: 0.4063 - 13m 59s
batch: 700/1563 - train loss: 5.1578 - test loss: 15.8097 - train acc: 0.7237 - test acc: 0.4089 - 14m 4s
batch: 800/1563 - train loss: 5.2686 - test loss: 15.7701 - train acc: 0.7197 - test acc: 0.4028 - 14m 10s
batch: 900/1563 - train loss: 5.4631 - test loss: 15.8871 - train acc: 0.7066 - test acc: 0.4042 - 14m 15s
batch: 1000/1563 - train loss: 5.6052 - test loss: 16.5322 - train acc: 0.6978 - test acc: 0.3888 - 14m 20s
batch: 1100/1563 - train loss: 5.4888

batch: 1100/1563 - train loss: 2.6001 - test loss: 18.2741 - train acc: 0.8484 - test acc: 0.4213 - 19m 51s
batch: 1200/1563 - train loss: 2.9164 - test loss: 19.2842 - train acc: 0.8415 - test acc: 0.3981 - 19m 56s
time is up! finishing training
batch: 1273/1563 - train loss: 3.1680 - test loss: 19.0131 - train acc: 0.8224 - test acc: 0.4036 - 20m 1s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB

generating CIFAR100 data with 100 classes
Files already downloaded and verified
Files already downloaded and verified
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 16, 16]           9,408
       BatchNorm2d-2           [-1, 64, 16, 16]             128
              ReLU-3           [-1, 64, 16, 16]               0
         MaxPool2d-4             [-1, 64, 8, 8]               0
            Conv2d-5             [-1, 64, 8, 8]          36,

batch: 1000/1563 - train loss: 17.1044 - test loss: 17.9342 - train acc: 0.2584 - test acc: 0.2274 - 2m 2s
batch: 1100/1563 - train loss: 16.8692 - test loss: 16.6960 - train acc: 0.2631 - test acc: 0.2627 - 2m 7s
batch: 1200/1563 - train loss: 16.2386 - test loss: 16.9022 - train acc: 0.2812 - test acc: 0.2680 - 2m 11s
batch: 1300/1563 - train loss: 16.9368 - test loss: 16.7958 - train acc: 0.2475 - test acc: 0.2616 - 2m 16s
batch: 1400/1563 - train loss: 16.1336 - test loss: 16.1000 - train acc: 0.2887 - test acc: 0.2930 - 2m 22s
batch: 1500/1563 - train loss: 16.4228 - test loss: 16.3187 - train acc: 0.2793 - test acc: 0.2800 - 2m 27s
batch: 1563/1563 - train loss: 16.3786 - test loss: 16.7437 - train acc: 0.2800 - test acc: 0.2691 - 2m 31s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 3/100
batch: 100/1563 - train loss: 15.3585 - test loss: 16.7392 - train acc: 0.3146 - test acc: 0.2732 - 2m 35s
batch: 200/1563 - train loss: 15.15

batch: 200/1563 - train loss: 8.9073 - test loss: 13.7953 - train acc: 0.5569 - test acc: 0.3904 - 7m 53s
batch: 300/1563 - train loss: 9.1580 - test loss: 14.1005 - train acc: 0.5484 - test acc: 0.3905 - 7m 58s
batch: 400/1563 - train loss: 9.0046 - test loss: 13.9002 - train acc: 0.5581 - test acc: 0.3945 - 8m 3s
batch: 500/1563 - train loss: 9.3459 - test loss: 14.0670 - train acc: 0.5500 - test acc: 0.3943 - 8m 8s
batch: 600/1563 - train loss: 9.4159 - test loss: 13.6707 - train acc: 0.5381 - test acc: 0.3965 - 8m 13s
batch: 700/1563 - train loss: 9.4514 - test loss: 13.6222 - train acc: 0.5287 - test acc: 0.4060 - 8m 18s
batch: 800/1563 - train loss: 9.8953 - test loss: 14.1805 - train acc: 0.5300 - test acc: 0.3895 - 8m 23s
batch: 900/1563 - train loss: 9.6265 - test loss: 13.5621 - train acc: 0.5362 - test acc: 0.4134 - 8m 27s
batch: 1000/1563 - train loss: 9.6695 - test loss: 14.4418 - train acc: 0.5347 - test acc: 0.3827 - 8m 33s
batch: 1100/1563 - train loss: 9.7727 - test lo

batch: 1100/1563 - train loss: 5.7734 - test loss: 16.4807 - train acc: 0.6984 - test acc: 0.3889 - 13m 51s
batch: 1200/1563 - train loss: 6.0952 - test loss: 15.2885 - train acc: 0.6844 - test acc: 0.4088 - 13m 56s
batch: 1300/1563 - train loss: 5.8520 - test loss: 15.5170 - train acc: 0.6887 - test acc: 0.4038 - 14m 1s
batch: 1400/1563 - train loss: 5.8045 - test loss: 15.1164 - train acc: 0.6994 - test acc: 0.4121 - 14m 6s
batch: 1500/1563 - train loss: 6.1911 - test loss: 15.8156 - train acc: 0.6744 - test acc: 0.4107 - 14m 11s
batch: 1563/1563 - train loss: 6.0832 - test loss: 16.2307 - train acc: 0.6747 - test acc: 0.4009 - 14m 16s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 12/100
batch: 100/1563 - train loss: 3.4555 - test loss: 16.1088 - train acc: 0.8183 - test acc: 0.4245 - 14m 21s
batch: 200/1563 - train loss: 3.7037 - test loss: 15.8625 - train acc: 0.8031 - test acc: 0.4224 - 14m 26s
batch: 300/1563 - train loss: 3.803

batch: 300/1563 - train loss: 1.7056 - test loss: 19.0010 - train acc: 0.9017 - test acc: 0.4131 - 19m 53s
batch: 400/1563 - train loss: 1.8624 - test loss: 19.1594 - train acc: 0.8907 - test acc: 0.4133 - 19m 58s
time is up! finishing training
batch: 401/1563 - train loss: 1.8711 - test loss: 18.9203 - train acc: 0.8894 - test acc: 0.4187 - 20m 1s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB

generating CIFAR100 data with 100 classes
Files already downloaded and verified
Files already downloaded and verified
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 16, 16]           9,408
       BatchNorm2d-2           [-1, 64, 16, 16]             128
              ReLU-3           [-1, 64, 16, 16]               0
         MaxPool2d-4             [-1, 64, 8, 8]               0
            Conv2d-5             [-1, 64, 8, 8]          36,864

batch: 1000/1563 - train loss: 17.0287 - test loss: 17.5684 - train acc: 0.2566 - test acc: 0.2438 - 2m 7s
batch: 1100/1563 - train loss: 16.8613 - test loss: 17.6764 - train acc: 0.2609 - test acc: 0.2381 - 2m 12s
batch: 1200/1563 - train loss: 16.9558 - test loss: 16.9006 - train acc: 0.2559 - test acc: 0.2606 - 2m 17s
batch: 1300/1563 - train loss: 16.5672 - test loss: 16.6117 - train acc: 0.2660 - test acc: 0.2731 - 2m 22s
batch: 1400/1563 - train loss: 16.7932 - test loss: 17.1023 - train acc: 0.2597 - test acc: 0.2505 - 2m 27s
batch: 1500/1563 - train loss: 16.3768 - test loss: 16.6821 - train acc: 0.2772 - test acc: 0.2707 - 2m 32s
batch: 1563/1563 - train loss: 16.3743 - test loss: 16.8065 - train acc: 0.2831 - test acc: 0.2700 - 2m 36s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 3/100
batch: 100/1563 - train loss: 15.3793 - test loss: 15.9374 - train acc: 0.3134 - test acc: 0.2927 - 2m 41s
batch: 200/1563 - train loss: 15.3

batch: 200/1563 - train loss: 8.8758 - test loss: 13.5283 - train acc: 0.5531 - test acc: 0.4050 - 8m 4s
batch: 300/1563 - train loss: 8.9308 - test loss: 13.9350 - train acc: 0.5550 - test acc: 0.3963 - 8m 10s
batch: 400/1563 - train loss: 9.0564 - test loss: 14.1407 - train acc: 0.5572 - test acc: 0.3845 - 8m 15s
batch: 500/1563 - train loss: 9.3378 - test loss: 14.9629 - train acc: 0.5343 - test acc: 0.3662 - 8m 20s
batch: 600/1563 - train loss: 9.7339 - test loss: 13.5290 - train acc: 0.5250 - test acc: 0.4117 - 8m 25s
batch: 700/1563 - train loss: 9.3177 - test loss: 13.6490 - train acc: 0.5431 - test acc: 0.4179 - 8m 30s
batch: 800/1563 - train loss: 9.8339 - test loss: 14.5507 - train acc: 0.5240 - test acc: 0.3865 - 8m 35s
batch: 900/1563 - train loss: 9.7029 - test loss: 13.6415 - train acc: 0.5347 - test acc: 0.4044 - 8m 40s
batch: 1000/1563 - train loss: 9.8735 - test loss: 13.9574 - train acc: 0.5146 - test acc: 0.3947 - 8m 45s
batch: 1100/1563 - train loss: 9.7966 - test l

batch: 1100/1563 - train loss: 5.7124 - test loss: 15.5956 - train acc: 0.6916 - test acc: 0.4050 - 14m 10s
batch: 1200/1563 - train loss: 5.7660 - test loss: 15.1385 - train acc: 0.6875 - test acc: 0.4226 - 14m 15s
batch: 1300/1563 - train loss: 6.0759 - test loss: 14.8302 - train acc: 0.6851 - test acc: 0.4144 - 14m 20s
batch: 1400/1563 - train loss: 6.0452 - test loss: 15.9936 - train acc: 0.6766 - test acc: 0.3970 - 14m 25s
batch: 1500/1563 - train loss: 6.0656 - test loss: 15.7276 - train acc: 0.6737 - test acc: 0.4155 - 14m 30s
batch: 1563/1563 - train loss: 6.1378 - test loss: 15.0441 - train acc: 0.6757 - test acc: 0.4197 - 14m 35s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 12/100
batch: 100/1563 - train loss: 3.6596 - test loss: 15.2490 - train acc: 0.8006 - test acc: 0.4249 - 14m 40s
batch: 200/1563 - train loss: 3.6513 - test loss: 15.8413 - train acc: 0.8056 - test acc: 0.4189 - 14m 45s
batch: 300/1563 - train loss: 3.7

generating CIFAR100 data with 100 classes
Files already downloaded and verified
Files already downloaded and verified
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 64, 16, 16]           9,408
       BatchNorm2d-2           [-1, 64, 16, 16]             128
              ReLU-3           [-1, 64, 16, 16]               0
         MaxPool2d-4             [-1, 64, 8, 8]               0
            Conv2d-5             [-1, 64, 8, 8]          36,864
       BatchNorm2d-6             [-1, 64, 8, 8]             128
              ReLU-7             [-1, 64, 8, 8]               0
            Conv2d-8             [-1, 64, 8, 8]          36,864
       BatchNorm2d-9             [-1, 64, 8, 8]             128
             ReLU-10             [-1, 64, 8, 8]               0
       BasicBlock-11             [-1, 64, 8, 8]               0
           Conv2d-12             [-1, 64, 8, 8] 

batch: 1400/1563 - train loss: 16.7167 - test loss: 16.1021 - train acc: 0.2753 - test acc: 0.2921 - 2m 23s
batch: 1500/1563 - train loss: 16.1845 - test loss: 17.1306 - train acc: 0.2843 - test acc: 0.2659 - 2m 28s
batch: 1563/1563 - train loss: 16.2618 - test loss: 16.8042 - train acc: 0.2803 - test acc: 0.2725 - 2m 32s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 3/100
batch: 100/1563 - train loss: 15.1243 - test loss: 18.7233 - train acc: 0.3221 - test acc: 0.2253 - 2m 37s
batch: 200/1563 - train loss: 15.3545 - test loss: 16.1629 - train acc: 0.3068 - test acc: 0.3005 - 2m 42s
batch: 300/1563 - train loss: 15.3607 - test loss: 16.9258 - train acc: 0.3071 - test acc: 0.2686 - 2m 47s
batch: 400/1563 - train loss: 15.2661 - test loss: 16.8851 - train acc: 0.3081 - test acc: 0.2845 - 2m 52s
batch: 500/1563 - train loss: 15.4954 - test loss: 16.2356 - train acc: 0.3140 - test acc: 0.2928 - 2m 57s
batch: 600/1563 - train loss: 15.0388

batch: 600/1563 - train loss: 9.1837 - test loss: 15.8273 - train acc: 0.5453 - test acc: 0.3503 - 8m 19s
batch: 700/1563 - train loss: 9.7248 - test loss: 13.5623 - train acc: 0.5260 - test acc: 0.4066 - 8m 23s
batch: 800/1563 - train loss: 9.8079 - test loss: 13.7231 - train acc: 0.5259 - test acc: 0.4042 - 8m 28s
batch: 900/1563 - train loss: 10.0132 - test loss: 13.6791 - train acc: 0.5163 - test acc: 0.4035 - 8m 33s
batch: 1000/1563 - train loss: 9.9901 - test loss: 13.7310 - train acc: 0.5138 - test acc: 0.4056 - 8m 38s
batch: 1100/1563 - train loss: 9.7010 - test loss: 13.3049 - train acc: 0.5353 - test acc: 0.4070 - 8m 43s
batch: 1200/1563 - train loss: 9.5203 - test loss: 13.4708 - train acc: 0.5222 - test acc: 0.4057 - 8m 48s
batch: 1300/1563 - train loss: 9.7788 - test loss: 13.8184 - train acc: 0.5231 - test acc: 0.3987 - 8m 53s
batch: 1400/1563 - train loss: 10.1728 - test loss: 14.3975 - train acc: 0.5085 - test acc: 0.3768 - 8m 58s
batch: 1500/1563 - train loss: 9.9297 -

batch: 1500/1563 - train loss: 6.4538 - test loss: 14.7082 - train acc: 0.6644 - test acc: 0.4206 - 14m 18s
batch: 1563/1563 - train loss: 6.4238 - test loss: 15.0024 - train acc: 0.6713 - test acc: 0.4103 - 14m 22s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 12/100
batch: 100/1563 - train loss: 3.7537 - test loss: 14.8353 - train acc: 0.8021 - test acc: 0.4316 - 14m 27s
batch: 200/1563 - train loss: 4.0086 - test loss: 15.2890 - train acc: 0.7878 - test acc: 0.4247 - 14m 32s
batch: 300/1563 - train loss: 4.1391 - test loss: 16.0360 - train acc: 0.7737 - test acc: 0.4135 - 14m 37s
batch: 400/1563 - train loss: 4.2072 - test loss: 15.4531 - train acc: 0.7716 - test acc: 0.4236 - 14m 42s
batch: 500/1563 - train loss: 4.1613 - test loss: 15.6804 - train acc: 0.7812 - test acc: 0.4191 - 14m 47s
batch: 600/1563 - train loss: 4.7025 - test loss: 16.1619 - train acc: 0.7431 - test acc: 0.4152 - 14m 52s
batch: 700/1563 - train loss: 4.6353 

batch: 100/1563 - train loss: 26.1413 - test loss: 25.1279 - train acc: 0.0420 - test acc: 0.0666 - 0m 1s
batch: 200/1563 - train loss: 24.2340 - test loss: 23.2539 - train acc: 0.0651 - test acc: 0.0838 - 0m 6s
batch: 300/1563 - train loss: 23.0261 - test loss: 22.8523 - train acc: 0.0868 - test acc: 0.0908 - 0m 11s
batch: 400/1563 - train loss: 22.3180 - test loss: 21.4572 - train acc: 0.0996 - test acc: 0.1221 - 0m 16s
batch: 500/1563 - train loss: 21.8881 - test loss: 21.0903 - train acc: 0.1169 - test acc: 0.1349 - 0m 21s
batch: 600/1563 - train loss: 21.3262 - test loss: 20.8301 - train acc: 0.1218 - test acc: 0.1372 - 0m 26s
batch: 700/1563 - train loss: 20.9360 - test loss: 20.6491 - train acc: 0.1369 - test acc: 0.1441 - 0m 31s
batch: 800/1563 - train loss: 20.6004 - test loss: 20.3420 - train acc: 0.1419 - test acc: 0.1549 - 0m 36s
batch: 900/1563 - train loss: 20.5764 - test loss: 20.3421 - train acc: 0.1519 - test acc: 0.1573 - 0m 41s
batch: 1000/1563 - train loss: 20.1110 

batch: 1000/1563 - train loss: 12.3056 - test loss: 14.9795 - train acc: 0.4278 - test acc: 0.3446 - 5m 54s
batch: 1100/1563 - train loss: 12.1769 - test loss: 14.7823 - train acc: 0.4259 - test acc: 0.3521 - 5m 59s
batch: 1200/1563 - train loss: 12.1395 - test loss: 13.6887 - train acc: 0.4278 - test acc: 0.3834 - 6m 4s
batch: 1300/1563 - train loss: 12.0825 - test loss: 13.8089 - train acc: 0.4297 - test acc: 0.3723 - 6m 9s
batch: 1400/1563 - train loss: 12.2734 - test loss: 13.9290 - train acc: 0.4266 - test acc: 0.3799 - 6m 14s
batch: 1500/1563 - train loss: 12.3146 - test loss: 15.0296 - train acc: 0.4216 - test acc: 0.3410 - 6m 18s
batch: 1563/1563 - train loss: 12.2270 - test loss: 13.6223 - train acc: 0.4203 - test acc: 0.3878 - 6m 22s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 6/100
batch: 100/1563 - train loss: 10.3183 - test loss: 14.2680 - train acc: 0.4928 - test acc: 0.3747 - 6m 27s
batch: 200/1563 - train loss: 10.16

batch: 200/1563 - train loss: 5.5523 - test loss: 14.9269 - train acc: 0.7063 - test acc: 0.4024 - 11m 42s
batch: 300/1563 - train loss: 6.0519 - test loss: 15.2459 - train acc: 0.6844 - test acc: 0.3979 - 11m 47s
batch: 400/1563 - train loss: 6.0495 - test loss: 15.5086 - train acc: 0.6885 - test acc: 0.3923 - 11m 52s
batch: 500/1563 - train loss: 6.0874 - test loss: 14.8435 - train acc: 0.6853 - test acc: 0.4016 - 11m 57s
batch: 600/1563 - train loss: 6.6062 - test loss: 14.6215 - train acc: 0.6544 - test acc: 0.4096 - 12m 2s
batch: 700/1563 - train loss: 6.7033 - test loss: 14.9255 - train acc: 0.6600 - test acc: 0.4035 - 12m 7s
batch: 800/1563 - train loss: 6.8209 - test loss: 15.0935 - train acc: 0.6466 - test acc: 0.3994 - 12m 12s
batch: 900/1563 - train loss: 6.8351 - test loss: 14.8991 - train acc: 0.6378 - test acc: 0.3992 - 12m 16s
batch: 1000/1563 - train loss: 6.6655 - test loss: 15.4489 - train acc: 0.6556 - test acc: 0.3902 - 12m 21s
batch: 1100/1563 - train loss: 6.8121 

batch: 1100/1563 - train loss: 3.6406 - test loss: 16.6938 - train acc: 0.8018 - test acc: 0.4255 - 17m 40s
batch: 1200/1563 - train loss: 3.6878 - test loss: 17.3696 - train acc: 0.8025 - test acc: 0.4011 - 17m 45s
batch: 1300/1563 - train loss: 3.6951 - test loss: 17.6257 - train acc: 0.7980 - test acc: 0.4002 - 17m 50s
batch: 1400/1563 - train loss: 3.8805 - test loss: 17.1659 - train acc: 0.7837 - test acc: 0.4104 - 17m 55s
batch: 1500/1563 - train loss: 3.7196 - test loss: 19.0015 - train acc: 0.7919 - test acc: 0.3904 - 18m 0s
batch: 1563/1563 - train loss: 4.0792 - test loss: 17.1438 - train acc: 0.7765 - test acc: 0.4199 - 18m 4s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 15/100
batch: 100/1563 - train loss: 2.1841 - test loss: 17.8457 - train acc: 0.8788 - test acc: 0.4101 - 18m 9s
batch: 200/1563 - train loss: 2.2305 - test loss: 16.9811 - train acc: 0.8763 - test acc: 0.4318 - 18m 14s
batch: 300/1563 - train loss: 2.1817

batch: 100/1563 - train loss: 26.2498 - test loss: 24.6922 - train acc: 0.0320 - test acc: 0.0475 - 0m 1s
batch: 200/1563 - train loss: 24.0803 - test loss: 23.4666 - train acc: 0.0558 - test acc: 0.0738 - 0m 6s
batch: 300/1563 - train loss: 23.2021 - test loss: 22.6839 - train acc: 0.0758 - test acc: 0.0836 - 0m 12s
batch: 400/1563 - train loss: 22.5097 - test loss: 22.3857 - train acc: 0.0905 - test acc: 0.1020 - 0m 16s
batch: 500/1563 - train loss: 22.2769 - test loss: 21.6445 - train acc: 0.0974 - test acc: 0.1065 - 0m 21s
batch: 600/1563 - train loss: 21.6220 - test loss: 20.7097 - train acc: 0.1187 - test acc: 0.1413 - 0m 26s
batch: 700/1563 - train loss: 21.0084 - test loss: 20.8754 - train acc: 0.1369 - test acc: 0.1409 - 0m 31s
batch: 800/1563 - train loss: 20.9056 - test loss: 20.0174 - train acc: 0.1347 - test acc: 0.1648 - 0m 36s
batch: 900/1563 - train loss: 20.4324 - test loss: 20.1556 - train acc: 0.1454 - test acc: 0.1581 - 0m 42s
batch: 1000/1563 - train loss: 20.0404 

batch: 1000/1563 - train loss: 11.9181 - test loss: 13.9482 - train acc: 0.4388 - test acc: 0.3723 - 5m 59s
batch: 1100/1563 - train loss: 12.0480 - test loss: 14.4789 - train acc: 0.4400 - test acc: 0.3644 - 6m 5s
batch: 1200/1563 - train loss: 12.0388 - test loss: 13.7270 - train acc: 0.4290 - test acc: 0.3796 - 6m 9s
batch: 1300/1563 - train loss: 12.1664 - test loss: 14.4779 - train acc: 0.4325 - test acc: 0.3600 - 6m 14s
batch: 1400/1563 - train loss: 12.2764 - test loss: 13.9978 - train acc: 0.4222 - test acc: 0.3742 - 6m 19s
batch: 1500/1563 - train loss: 12.3340 - test loss: 14.5422 - train acc: 0.4147 - test acc: 0.3592 - 6m 24s
batch: 1563/1563 - train loss: 12.1987 - test loss: 14.3305 - train acc: 0.4288 - test acc: 0.3655 - 6m 28s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 6/100
batch: 100/1563 - train loss: 10.3594 - test loss: 14.5289 - train acc: 0.4997 - test acc: 0.3687 - 6m 33s
batch: 200/1563 - train loss: 10.37

batch: 200/1563 - train loss: 5.7168 - test loss: 16.0168 - train acc: 0.6997 - test acc: 0.3868 - 11m 53s
batch: 300/1563 - train loss: 5.7294 - test loss: 14.9295 - train acc: 0.6957 - test acc: 0.4034 - 11m 58s
batch: 400/1563 - train loss: 5.9495 - test loss: 14.7884 - train acc: 0.6931 - test acc: 0.4016 - 12m 3s
batch: 500/1563 - train loss: 5.9664 - test loss: 16.2718 - train acc: 0.6860 - test acc: 0.3802 - 12m 8s
batch: 600/1563 - train loss: 6.1125 - test loss: 15.8986 - train acc: 0.6841 - test acc: 0.3899 - 12m 13s
batch: 700/1563 - train loss: 6.4055 - test loss: 16.0047 - train acc: 0.6688 - test acc: 0.3865 - 12m 18s
batch: 800/1563 - train loss: 6.6417 - test loss: 15.0216 - train acc: 0.6544 - test acc: 0.4041 - 12m 23s
batch: 900/1563 - train loss: 6.3457 - test loss: 15.6117 - train acc: 0.6610 - test acc: 0.3868 - 12m 28s
batch: 1000/1563 - train loss: 6.9756 - test loss: 14.6366 - train acc: 0.6347 - test acc: 0.4132 - 12m 33s
batch: 1100/1563 - train loss: 6.7910 

batch: 1100/1563 - train loss: 3.2830 - test loss: 18.4872 - train acc: 0.8150 - test acc: 0.3938 - 17m 53s
batch: 1200/1563 - train loss: 3.7410 - test loss: 18.9282 - train acc: 0.7962 - test acc: 0.3921 - 17m 58s
batch: 1300/1563 - train loss: 3.9129 - test loss: 17.2794 - train acc: 0.7747 - test acc: 0.4145 - 18m 3s
batch: 1400/1563 - train loss: 3.8517 - test loss: 17.0673 - train acc: 0.7859 - test acc: 0.4166 - 18m 8s
batch: 1500/1563 - train loss: 3.9117 - test loss: 17.8774 - train acc: 0.7881 - test acc: 0.3989 - 18m 13s
batch: 1563/1563 - train loss: 4.0548 - test loss: 17.6463 - train acc: 0.7721 - test acc: 0.4065 - 18m 17s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 15/100
batch: 100/1563 - train loss: 2.0742 - test loss: 17.2181 - train acc: 0.8875 - test acc: 0.4173 - 18m 22s
batch: 200/1563 - train loss: 2.0737 - test loss: 17.3299 - train acc: 0.8810 - test acc: 0.4203 - 18m 27s
batch: 300/1563 - train loss: 2.329

batch: 100/1563 - train loss: 26.1700 - test loss: 25.6129 - train acc: 0.0367 - test acc: 0.0499 - 0m 2s
batch: 200/1563 - train loss: 24.1642 - test loss: 23.2894 - train acc: 0.0698 - test acc: 0.0765 - 0m 7s
batch: 300/1563 - train loss: 23.0673 - test loss: 22.0755 - train acc: 0.0817 - test acc: 0.0981 - 0m 12s
batch: 400/1563 - train loss: 22.5095 - test loss: 21.9980 - train acc: 0.0971 - test acc: 0.1089 - 0m 16s
batch: 500/1563 - train loss: 21.9867 - test loss: 22.0342 - train acc: 0.1112 - test acc: 0.1065 - 0m 21s
batch: 600/1563 - train loss: 21.4721 - test loss: 20.8377 - train acc: 0.1241 - test acc: 0.1401 - 0m 26s
batch: 700/1563 - train loss: 20.6984 - test loss: 20.9734 - train acc: 0.1319 - test acc: 0.1357 - 0m 31s
batch: 800/1563 - train loss: 20.6961 - test loss: 20.3889 - train acc: 0.1450 - test acc: 0.1480 - 0m 36s
batch: 900/1563 - train loss: 20.1303 - test loss: 21.0419 - train acc: 0.1573 - test acc: 0.1338 - 0m 41s
batch: 1000/1563 - train loss: 20.0788 

batch: 1000/1563 - train loss: 11.8135 - test loss: 14.0927 - train acc: 0.4428 - test acc: 0.3769 - 5m 53s
batch: 1100/1563 - train loss: 11.8543 - test loss: 14.5836 - train acc: 0.4353 - test acc: 0.3534 - 5m 58s
batch: 1200/1563 - train loss: 12.0000 - test loss: 13.9614 - train acc: 0.4338 - test acc: 0.3773 - 6m 3s
batch: 1300/1563 - train loss: 11.8078 - test loss: 15.2897 - train acc: 0.4366 - test acc: 0.3375 - 6m 7s
batch: 1400/1563 - train loss: 12.0655 - test loss: 13.8917 - train acc: 0.4325 - test acc: 0.3788 - 6m 12s
batch: 1500/1563 - train loss: 12.1163 - test loss: 14.1341 - train acc: 0.4362 - test acc: 0.3722 - 6m 17s
batch: 1563/1563 - train loss: 12.0964 - test loss: 13.5357 - train acc: 0.4403 - test acc: 0.3901 - 6m 21s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 6/100
batch: 100/1563 - train loss: 10.3166 - test loss: 14.8735 - train acc: 0.4910 - test acc: 0.3506 - 6m 27s
batch: 200/1563 - train loss: 10.32

batch: 200/1563 - train loss: 5.4876 - test loss: 14.9744 - train acc: 0.7110 - test acc: 0.4119 - 11m 37s
batch: 300/1563 - train loss: 6.1167 - test loss: 15.0811 - train acc: 0.6788 - test acc: 0.4052 - 11m 41s
batch: 400/1563 - train loss: 5.9396 - test loss: 15.0649 - train acc: 0.6906 - test acc: 0.4069 - 11m 47s
batch: 500/1563 - train loss: 6.2394 - test loss: 15.1537 - train acc: 0.6638 - test acc: 0.4013 - 11m 52s
batch: 600/1563 - train loss: 6.3018 - test loss: 14.7866 - train acc: 0.6697 - test acc: 0.4171 - 11m 56s
batch: 700/1563 - train loss: 6.4270 - test loss: 14.5882 - train acc: 0.6688 - test acc: 0.4177 - 12m 1s
batch: 800/1563 - train loss: 6.3708 - test loss: 14.7501 - train acc: 0.6657 - test acc: 0.4217 - 12m 6s
batch: 900/1563 - train loss: 6.5826 - test loss: 15.0464 - train acc: 0.6581 - test acc: 0.4062 - 12m 11s
batch: 1000/1563 - train loss: 6.6731 - test loss: 16.3807 - train acc: 0.6534 - test acc: 0.3868 - 12m 16s
batch: 1100/1563 - train loss: 6.6317 

batch: 1100/1563 - train loss: 3.7529 - test loss: 18.1386 - train acc: 0.7946 - test acc: 0.3995 - 17m 33s
batch: 1200/1563 - train loss: 3.9111 - test loss: 17.1014 - train acc: 0.7884 - test acc: 0.4214 - 17m 37s
batch: 1300/1563 - train loss: 3.5279 - test loss: 16.8568 - train acc: 0.7980 - test acc: 0.4276 - 17m 42s
batch: 1400/1563 - train loss: 3.6758 - test loss: 17.8342 - train acc: 0.7959 - test acc: 0.4048 - 17m 47s
batch: 1500/1563 - train loss: 3.8963 - test loss: 17.7599 - train acc: 0.7853 - test acc: 0.4059 - 17m 52s
batch: 1563/1563 - train loss: 3.7681 - test loss: 17.2786 - train acc: 0.7943 - test acc: 0.4069 - 17m 56s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 15/100
batch: 100/1563 - train loss: 2.2352 - test loss: 16.9954 - train acc: 0.8829 - test acc: 0.4302 - 18m 1s
batch: 200/1563 - train loss: 2.2405 - test loss: 17.6442 - train acc: 0.8707 - test acc: 0.4288 - 18m 6s
batch: 300/1563 - train loss: 2.074

batch: 100/1563 - train loss: 26.2123 - test loss: 25.2478 - train acc: 0.0358 - test acc: 0.0560 - 0m 1s
batch: 200/1563 - train loss: 24.2439 - test loss: 23.1519 - train acc: 0.0573 - test acc: 0.0785 - 0m 6s
batch: 300/1563 - train loss: 23.0845 - test loss: 22.4922 - train acc: 0.0855 - test acc: 0.0924 - 0m 11s
batch: 400/1563 - train loss: 22.1205 - test loss: 22.3754 - train acc: 0.1087 - test acc: 0.1160 - 0m 16s
batch: 500/1563 - train loss: 21.6682 - test loss: 21.0654 - train acc: 0.1159 - test acc: 0.1367 - 0m 21s
batch: 600/1563 - train loss: 21.3757 - test loss: 20.8179 - train acc: 0.1237 - test acc: 0.1375 - 0m 25s
batch: 700/1563 - train loss: 21.2517 - test loss: 20.2409 - train acc: 0.1296 - test acc: 0.1543 - 0m 31s
batch: 800/1563 - train loss: 20.5704 - test loss: 20.3885 - train acc: 0.1428 - test acc: 0.1531 - 0m 35s
batch: 900/1563 - train loss: 20.2043 - test loss: 20.9815 - train acc: 0.1585 - test acc: 0.1256 - 0m 40s
batch: 1000/1563 - train loss: 20.0708 

batch: 1000/1563 - train loss: 12.2049 - test loss: 13.9411 - train acc: 0.4265 - test acc: 0.3726 - 5m 55s
batch: 1100/1563 - train loss: 12.1647 - test loss: 14.1185 - train acc: 0.4409 - test acc: 0.3689 - 6m 0s
batch: 1200/1563 - train loss: 11.9852 - test loss: 13.6945 - train acc: 0.4325 - test acc: 0.3882 - 6m 4s
batch: 1300/1563 - train loss: 11.8445 - test loss: 14.2055 - train acc: 0.4347 - test acc: 0.3709 - 6m 9s
batch: 1400/1563 - train loss: 11.7559 - test loss: 13.8546 - train acc: 0.4534 - test acc: 0.3823 - 6m 14s
batch: 1500/1563 - train loss: 12.0866 - test loss: 13.5570 - train acc: 0.4331 - test acc: 0.3876 - 6m 19s
batch: 1563/1563 - train loss: 11.6807 - test loss: 13.9700 - train acc: 0.4453 - test acc: 0.3773 - 6m 23s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 6/100
batch: 100/1563 - train loss: 10.0006 - test loss: 14.5729 - train acc: 0.5178 - test acc: 0.3722 - 6m 28s
batch: 200/1563 - train loss: 10.073

batch: 200/1563 - train loss: 5.2300 - test loss: 14.5976 - train acc: 0.7213 - test acc: 0.4203 - 11m 47s
batch: 300/1563 - train loss: 5.1692 - test loss: 14.7289 - train acc: 0.7288 - test acc: 0.4285 - 11m 52s
batch: 400/1563 - train loss: 5.5209 - test loss: 15.3914 - train acc: 0.7131 - test acc: 0.4035 - 11m 57s
batch: 500/1563 - train loss: 5.6972 - test loss: 15.5133 - train acc: 0.7013 - test acc: 0.4069 - 12m 2s
batch: 600/1563 - train loss: 6.0597 - test loss: 16.3880 - train acc: 0.6734 - test acc: 0.3859 - 12m 7s
batch: 700/1563 - train loss: 6.4369 - test loss: 15.2668 - train acc: 0.6635 - test acc: 0.4039 - 12m 13s
batch: 800/1563 - train loss: 6.1364 - test loss: 14.8397 - train acc: 0.6697 - test acc: 0.4164 - 12m 18s
batch: 900/1563 - train loss: 6.2447 - test loss: 15.4059 - train acc: 0.6722 - test acc: 0.4043 - 12m 23s
batch: 1000/1563 - train loss: 6.2860 - test loss: 14.7719 - train acc: 0.6734 - test acc: 0.4163 - 12m 28s
batch: 1100/1563 - train loss: 6.4438 

batch: 1100/1563 - train loss: 3.3858 - test loss: 17.7890 - train acc: 0.8159 - test acc: 0.4114 - 17m 45s
batch: 1200/1563 - train loss: 3.4749 - test loss: 17.1420 - train acc: 0.8084 - test acc: 0.4238 - 17m 50s
batch: 1300/1563 - train loss: 3.2584 - test loss: 17.3931 - train acc: 0.8156 - test acc: 0.4250 - 17m 55s
batch: 1400/1563 - train loss: 3.3881 - test loss: 17.1749 - train acc: 0.8062 - test acc: 0.4205 - 18m 0s
batch: 1500/1563 - train loss: 3.8171 - test loss: 17.3994 - train acc: 0.7837 - test acc: 0.4199 - 18m 5s
batch: 1563/1563 - train loss: 3.7072 - test loss: 17.5677 - train acc: 0.7918 - test acc: 0.4100 - 18m 10s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 15/100
batch: 100/1563 - train loss: 2.0599 - test loss: 17.1618 - train acc: 0.8904 - test acc: 0.4311 - 18m 14s
batch: 200/1563 - train loss: 2.0623 - test loss: 17.9207 - train acc: 0.8813 - test acc: 0.4201 - 18m 19s
batch: 300/1563 - train loss: 1.929

batch: 100/1563 - train loss: 26.2066 - test loss: 24.9718 - train acc: 0.0342 - test acc: 0.0498 - 0m 1s
batch: 200/1563 - train loss: 24.1546 - test loss: 23.5053 - train acc: 0.0633 - test acc: 0.0745 - 0m 6s
batch: 300/1563 - train loss: 23.2617 - test loss: 23.4603 - train acc: 0.0730 - test acc: 0.0838 - 0m 11s
batch: 400/1563 - train loss: 22.5127 - test loss: 22.3899 - train acc: 0.0940 - test acc: 0.0966 - 0m 16s
batch: 500/1563 - train loss: 22.0922 - test loss: 21.3847 - train acc: 0.0937 - test acc: 0.1181 - 0m 21s
batch: 600/1563 - train loss: 21.2339 - test loss: 21.1874 - train acc: 0.1210 - test acc: 0.1320 - 0m 26s
batch: 700/1563 - train loss: 20.9666 - test loss: 20.8021 - train acc: 0.1181 - test acc: 0.1358 - 0m 31s
batch: 800/1563 - train loss: 20.6372 - test loss: 20.2774 - train acc: 0.1462 - test acc: 0.1522 - 0m 36s
batch: 900/1563 - train loss: 20.2452 - test loss: 20.3122 - train acc: 0.1535 - test acc: 0.1615 - 0m 41s
batch: 1000/1563 - train loss: 20.1778 

batch: 1000/1563 - train loss: 11.7972 - test loss: 14.0093 - train acc: 0.4403 - test acc: 0.3738 - 6m 4s
batch: 1100/1563 - train loss: 11.7903 - test loss: 15.9466 - train acc: 0.4366 - test acc: 0.3233 - 6m 9s
batch: 1200/1563 - train loss: 12.3833 - test loss: 15.0528 - train acc: 0.4206 - test acc: 0.3493 - 6m 14s
batch: 1300/1563 - train loss: 11.6429 - test loss: 14.3832 - train acc: 0.4525 - test acc: 0.3600 - 6m 19s
batch: 1400/1563 - train loss: 12.2165 - test loss: 13.8131 - train acc: 0.4209 - test acc: 0.3738 - 6m 24s
batch: 1500/1563 - train loss: 12.0726 - test loss: 14.7969 - train acc: 0.4285 - test acc: 0.3668 - 6m 29s
batch: 1563/1563 - train loss: 12.3061 - test loss: 13.8467 - train acc: 0.4222 - test acc: 0.3752 - 6m 33s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 6/100
batch: 100/1563 - train loss: 10.0417 - test loss: 14.1691 - train acc: 0.5156 - test acc: 0.3721 - 6m 37s
batch: 200/1563 - train loss: 10.22

batch: 200/1563 - train loss: 5.3098 - test loss: 14.2781 - train acc: 0.7153 - test acc: 0.4262 - 11m 47s
batch: 300/1563 - train loss: 5.7626 - test loss: 14.8462 - train acc: 0.6913 - test acc: 0.4067 - 11m 52s
batch: 400/1563 - train loss: 5.4964 - test loss: 15.0811 - train acc: 0.7110 - test acc: 0.4178 - 11m 57s
batch: 500/1563 - train loss: 5.8879 - test loss: 15.6911 - train acc: 0.6913 - test acc: 0.4005 - 12m 2s
batch: 600/1563 - train loss: 6.1316 - test loss: 15.0065 - train acc: 0.6757 - test acc: 0.4046 - 12m 7s
batch: 700/1563 - train loss: 6.4510 - test loss: 17.1749 - train acc: 0.6684 - test acc: 0.3600 - 12m 12s
batch: 800/1563 - train loss: 6.6083 - test loss: 14.4679 - train acc: 0.6563 - test acc: 0.4115 - 12m 16s
batch: 900/1563 - train loss: 6.4974 - test loss: 14.9698 - train acc: 0.6562 - test acc: 0.4011 - 12m 21s
batch: 1000/1563 - train loss: 6.6449 - test loss: 14.7787 - train acc: 0.6528 - test acc: 0.4074 - 12m 26s
batch: 1100/1563 - train loss: 6.8107 

batch: 1100/1563 - train loss: 3.4127 - test loss: 17.1727 - train acc: 0.8075 - test acc: 0.4178 - 17m 36s
batch: 1200/1563 - train loss: 3.7064 - test loss: 17.7518 - train acc: 0.7900 - test acc: 0.4075 - 17m 40s
batch: 1300/1563 - train loss: 3.5611 - test loss: 17.0436 - train acc: 0.7934 - test acc: 0.4103 - 17m 45s
batch: 1400/1563 - train loss: 3.8769 - test loss: 16.9718 - train acc: 0.7875 - test acc: 0.4149 - 17m 50s
batch: 1500/1563 - train loss: 3.6845 - test loss: 17.6887 - train acc: 0.7931 - test acc: 0.4074 - 17m 55s
batch: 1563/1563 - train loss: 3.7993 - test loss: 18.3284 - train acc: 0.7865 - test acc: 0.4078 - 17m 59s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 15/100
batch: 100/1563 - train loss: 2.2008 - test loss: 17.1983 - train acc: 0.8806 - test acc: 0.4252 - 18m 4s
batch: 200/1563 - train loss: 2.0428 - test loss: 16.8895 - train acc: 0.8787 - test acc: 0.4363 - 18m 9s
batch: 300/1563 - train loss: 2.117

batch: 100/1563 - train loss: 26.1393 - test loss: 24.7582 - train acc: 0.0351 - test acc: 0.0468 - 0m 1s
batch: 200/1563 - train loss: 24.2089 - test loss: 24.0912 - train acc: 0.0599 - test acc: 0.0699 - 0m 6s
batch: 300/1563 - train loss: 22.7937 - test loss: 22.4171 - train acc: 0.0855 - test acc: 0.1006 - 0m 10s
batch: 400/1563 - train loss: 22.3866 - test loss: 21.9148 - train acc: 0.0946 - test acc: 0.1130 - 0m 15s
batch: 500/1563 - train loss: 21.7509 - test loss: 21.4329 - train acc: 0.1105 - test acc: 0.1199 - 0m 20s
batch: 600/1563 - train loss: 21.2166 - test loss: 21.4496 - train acc: 0.1369 - test acc: 0.1228 - 0m 25s
batch: 700/1563 - train loss: 20.9381 - test loss: 20.6934 - train acc: 0.1306 - test acc: 0.1459 - 0m 29s
batch: 800/1563 - train loss: 20.4284 - test loss: 19.9690 - train acc: 0.1444 - test acc: 0.1599 - 0m 34s
batch: 900/1563 - train loss: 20.5090 - test loss: 19.8519 - train acc: 0.1469 - test acc: 0.1702 - 0m 39s
batch: 1000/1563 - train loss: 19.9519 

batch: 1000/1563 - train loss: 12.1650 - test loss: 13.9322 - train acc: 0.4206 - test acc: 0.3677 - 5m 45s
batch: 1100/1563 - train loss: 11.7396 - test loss: 15.6333 - train acc: 0.4406 - test acc: 0.3327 - 5m 50s
batch: 1200/1563 - train loss: 12.2248 - test loss: 14.4071 - train acc: 0.4234 - test acc: 0.3538 - 5m 55s
batch: 1300/1563 - train loss: 11.8564 - test loss: 13.7596 - train acc: 0.4394 - test acc: 0.3785 - 5m 59s
batch: 1400/1563 - train loss: 11.9822 - test loss: 13.3075 - train acc: 0.4366 - test acc: 0.3911 - 6m 4s
batch: 1500/1563 - train loss: 11.7536 - test loss: 13.4109 - train acc: 0.4497 - test acc: 0.3908 - 6m 9s
batch: 1563/1563 - train loss: 11.6431 - test loss: 13.6197 - train acc: 0.4472 - test acc: 0.3847 - 6m 13s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 6/100
batch: 100/1563 - train loss: 9.9549 - test loss: 13.7140 - train acc: 0.5231 - test acc: 0.3887 - 6m 18s
batch: 200/1563 - train loss: 10.223

batch: 200/1563 - train loss: 5.5791 - test loss: 14.0673 - train acc: 0.7031 - test acc: 0.4285 - 11m 22s
batch: 300/1563 - train loss: 5.9798 - test loss: 14.3361 - train acc: 0.6944 - test acc: 0.4194 - 11m 27s
batch: 400/1563 - train loss: 5.7487 - test loss: 14.6350 - train acc: 0.7047 - test acc: 0.4059 - 11m 32s
batch: 500/1563 - train loss: 5.9866 - test loss: 16.2805 - train acc: 0.6809 - test acc: 0.3800 - 11m 36s
batch: 600/1563 - train loss: 6.2706 - test loss: 15.0903 - train acc: 0.6688 - test acc: 0.4004 - 11m 41s
batch: 700/1563 - train loss: 6.4077 - test loss: 14.6164 - train acc: 0.6597 - test acc: 0.4225 - 11m 46s
batch: 800/1563 - train loss: 6.3501 - test loss: 14.7338 - train acc: 0.6644 - test acc: 0.4082 - 11m 50s
batch: 900/1563 - train loss: 6.6301 - test loss: 14.8899 - train acc: 0.6557 - test acc: 0.4078 - 11m 55s
batch: 1000/1563 - train loss: 6.6232 - test loss: 14.6339 - train acc: 0.6451 - test acc: 0.4105 - 12m 0s
batch: 1100/1563 - train loss: 6.6578

batch: 1100/1563 - train loss: 3.2969 - test loss: 18.0590 - train acc: 0.8125 - test acc: 0.4116 - 17m 3s
batch: 1200/1563 - train loss: 3.6351 - test loss: 18.5113 - train acc: 0.7931 - test acc: 0.3907 - 17m 7s
batch: 1300/1563 - train loss: 3.8558 - test loss: 17.5560 - train acc: 0.7859 - test acc: 0.4143 - 17m 12s
batch: 1400/1563 - train loss: 3.7376 - test loss: 17.1262 - train acc: 0.7900 - test acc: 0.4180 - 17m 17s
batch: 1500/1563 - train loss: 3.9264 - test loss: 17.1880 - train acc: 0.7846 - test acc: 0.4135 - 17m 22s
batch: 1563/1563 - train loss: 3.6700 - test loss: 17.0742 - train acc: 0.7956 - test acc: 0.4149 - 17m 26s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 15/100
batch: 100/1563 - train loss: 2.4095 - test loss: 17.0064 - train acc: 0.8656 - test acc: 0.4279 - 17m 30s
batch: 200/1563 - train loss: 2.0825 - test loss: 17.5061 - train acc: 0.8866 - test acc: 0.4219 - 17m 35s
batch: 300/1563 - train loss: 2.182

batch: 100/1563 - train loss: 26.1789 - test loss: 25.0280 - train acc: 0.0332 - test acc: 0.0458 - 0m 1s
batch: 200/1563 - train loss: 24.4747 - test loss: 23.5502 - train acc: 0.0492 - test acc: 0.0636 - 0m 6s
batch: 300/1563 - train loss: 23.1912 - test loss: 22.8275 - train acc: 0.0793 - test acc: 0.0880 - 0m 10s
batch: 400/1563 - train loss: 22.2691 - test loss: 21.9213 - train acc: 0.1068 - test acc: 0.1109 - 0m 15s
batch: 500/1563 - train loss: 21.9969 - test loss: 22.3158 - train acc: 0.1115 - test acc: 0.0905 - 0m 19s
batch: 600/1563 - train loss: 21.3600 - test loss: 21.3250 - train acc: 0.1297 - test acc: 0.1337 - 0m 24s
batch: 700/1563 - train loss: 20.9972 - test loss: 20.8703 - train acc: 0.1303 - test acc: 0.1348 - 0m 29s
batch: 800/1563 - train loss: 20.8365 - test loss: 20.8417 - train acc: 0.1353 - test acc: 0.1365 - 0m 33s
batch: 900/1563 - train loss: 20.4548 - test loss: 20.2521 - train acc: 0.1525 - test acc: 0.1519 - 0m 38s
batch: 1000/1563 - train loss: 19.9435 

batch: 1000/1563 - train loss: 11.9671 - test loss: 13.9221 - train acc: 0.4313 - test acc: 0.3751 - 5m 34s
batch: 1100/1563 - train loss: 11.6963 - test loss: 13.9087 - train acc: 0.4640 - test acc: 0.3801 - 5m 39s
batch: 1200/1563 - train loss: 12.1582 - test loss: 13.7193 - train acc: 0.4316 - test acc: 0.3824 - 5m 43s
batch: 1300/1563 - train loss: 11.9271 - test loss: 13.9258 - train acc: 0.4357 - test acc: 0.3761 - 5m 48s
batch: 1400/1563 - train loss: 12.0555 - test loss: 16.8274 - train acc: 0.4338 - test acc: 0.3094 - 5m 53s
batch: 1500/1563 - train loss: 12.0011 - test loss: 14.5264 - train acc: 0.4322 - test acc: 0.3606 - 5m 57s
batch: 1563/1563 - train loss: 11.8680 - test loss: 14.4720 - train acc: 0.4484 - test acc: 0.3588 - 6m 2s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 6/100
batch: 100/1563 - train loss: 10.2181 - test loss: 14.6865 - train acc: 0.5006 - test acc: 0.3688 - 6m 6s
batch: 200/1563 - train loss: 10.08

batch: 200/1563 - train loss: 5.4702 - test loss: 14.6024 - train acc: 0.7085 - test acc: 0.4184 - 11m 6s
batch: 300/1563 - train loss: 5.7803 - test loss: 15.2164 - train acc: 0.6925 - test acc: 0.4066 - 11m 10s
batch: 400/1563 - train loss: 6.0400 - test loss: 15.6193 - train acc: 0.6804 - test acc: 0.3911 - 11m 15s
batch: 500/1563 - train loss: 6.2485 - test loss: 15.0511 - train acc: 0.6766 - test acc: 0.4033 - 11m 20s
batch: 600/1563 - train loss: 6.4745 - test loss: 15.2259 - train acc: 0.6554 - test acc: 0.3901 - 11m 24s
batch: 700/1563 - train loss: 6.4664 - test loss: 15.6459 - train acc: 0.6638 - test acc: 0.3972 - 11m 29s
batch: 800/1563 - train loss: 6.4201 - test loss: 14.6769 - train acc: 0.6697 - test acc: 0.4108 - 11m 33s
batch: 900/1563 - train loss: 6.4424 - test loss: 15.0115 - train acc: 0.6654 - test acc: 0.4029 - 11m 38s
batch: 1000/1563 - train loss: 6.9075 - test loss: 15.0894 - train acc: 0.6444 - test acc: 0.4029 - 11m 43s
batch: 1100/1563 - train loss: 7.0137

batch: 1100/1563 - train loss: 3.6603 - test loss: 17.5623 - train acc: 0.8028 - test acc: 0.4140 - 16m 44s
batch: 1200/1563 - train loss: 3.6011 - test loss: 17.2507 - train acc: 0.8006 - test acc: 0.4112 - 16m 48s
batch: 1300/1563 - train loss: 3.6789 - test loss: 17.3341 - train acc: 0.7994 - test acc: 0.4125 - 16m 53s
batch: 1400/1563 - train loss: 3.9839 - test loss: 17.4518 - train acc: 0.7834 - test acc: 0.4110 - 16m 57s
batch: 1500/1563 - train loss: 3.8676 - test loss: 17.2555 - train acc: 0.7809 - test acc: 0.4106 - 17m 2s
batch: 1563/1563 - train loss: 3.9638 - test loss: 17.3308 - train acc: 0.7790 - test acc: 0.4176 - 17m 6s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 15/100
batch: 100/1563 - train loss: 2.3660 - test loss: 17.9114 - train acc: 0.8669 - test acc: 0.4131 - 17m 11s
batch: 200/1563 - train loss: 2.0983 - test loss: 17.5815 - train acc: 0.8797 - test acc: 0.4190 - 17m 16s
batch: 300/1563 - train loss: 2.089

batch: 100/1563 - train loss: 26.1043 - test loss: 25.0462 - train acc: 0.0376 - test acc: 0.0501 - 0m 1s
batch: 200/1563 - train loss: 24.1697 - test loss: 23.4905 - train acc: 0.0548 - test acc: 0.0763 - 0m 6s
batch: 300/1563 - train loss: 23.1543 - test loss: 23.0531 - train acc: 0.0814 - test acc: 0.0924 - 0m 11s
batch: 400/1563 - train loss: 22.2736 - test loss: 22.5014 - train acc: 0.1030 - test acc: 0.1031 - 0m 15s
batch: 500/1563 - train loss: 22.0056 - test loss: 21.3511 - train acc: 0.1122 - test acc: 0.1213 - 0m 20s
batch: 600/1563 - train loss: 21.4246 - test loss: 21.1803 - train acc: 0.1219 - test acc: 0.1375 - 0m 24s
batch: 700/1563 - train loss: 21.2382 - test loss: 20.3534 - train acc: 0.1285 - test acc: 0.1530 - 0m 29s
batch: 800/1563 - train loss: 20.7872 - test loss: 20.4132 - train acc: 0.1387 - test acc: 0.1454 - 0m 34s
batch: 900/1563 - train loss: 20.3607 - test loss: 20.5304 - train acc: 0.1544 - test acc: 0.1458 - 0m 38s
batch: 1000/1563 - train loss: 20.1179 

batch: 1000/1563 - train loss: 12.2155 - test loss: 14.6880 - train acc: 0.4241 - test acc: 0.3519 - 5m 35s
batch: 1100/1563 - train loss: 11.7200 - test loss: 15.8874 - train acc: 0.4441 - test acc: 0.3116 - 5m 39s
batch: 1200/1563 - train loss: 12.0340 - test loss: 14.0388 - train acc: 0.4323 - test acc: 0.3726 - 5m 44s
batch: 1300/1563 - train loss: 11.9702 - test loss: 13.9820 - train acc: 0.4385 - test acc: 0.3744 - 5m 49s
batch: 1400/1563 - train loss: 11.9935 - test loss: 14.0284 - train acc: 0.4375 - test acc: 0.3761 - 5m 54s
batch: 1500/1563 - train loss: 11.8194 - test loss: 13.9264 - train acc: 0.4468 - test acc: 0.3796 - 5m 58s
batch: 1563/1563 - train loss: 11.8401 - test loss: 13.8538 - train acc: 0.4481 - test acc: 0.3778 - 6m 2s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 6/100
batch: 100/1563 - train loss: 10.0771 - test loss: 13.5474 - train acc: 0.5122 - test acc: 0.3996 - 6m 7s
batch: 200/1563 - train loss: 10.48

batch: 200/1563 - train loss: 5.2725 - test loss: 15.3072 - train acc: 0.7203 - test acc: 0.3990 - 11m 8s
batch: 300/1563 - train loss: 5.7252 - test loss: 14.6290 - train acc: 0.7035 - test acc: 0.4201 - 11m 13s
batch: 400/1563 - train loss: 5.8659 - test loss: 14.5531 - train acc: 0.6885 - test acc: 0.4136 - 11m 18s
batch: 500/1563 - train loss: 5.9927 - test loss: 15.1896 - train acc: 0.6735 - test acc: 0.4071 - 11m 22s
batch: 600/1563 - train loss: 6.1284 - test loss: 16.0756 - train acc: 0.6779 - test acc: 0.3814 - 11m 27s
batch: 700/1563 - train loss: 5.9910 - test loss: 15.3739 - train acc: 0.6797 - test acc: 0.3982 - 11m 32s
batch: 800/1563 - train loss: 6.3661 - test loss: 14.9582 - train acc: 0.6591 - test acc: 0.4166 - 11m 36s
batch: 900/1563 - train loss: 6.4443 - test loss: 14.6246 - train acc: 0.6669 - test acc: 0.4149 - 11m 41s
batch: 1000/1563 - train loss: 6.4627 - test loss: 14.9518 - train acc: 0.6616 - test acc: 0.4064 - 11m 46s
batch: 1100/1563 - train loss: 6.9864

batch: 1100/1563 - train loss: 3.3705 - test loss: 17.4778 - train acc: 0.8130 - test acc: 0.4207 - 16m 50s
batch: 1200/1563 - train loss: 3.7729 - test loss: 18.1239 - train acc: 0.7875 - test acc: 0.4077 - 16m 54s
batch: 1300/1563 - train loss: 3.5985 - test loss: 17.1152 - train acc: 0.7956 - test acc: 0.4163 - 16m 59s
batch: 1400/1563 - train loss: 3.6752 - test loss: 17.1755 - train acc: 0.7890 - test acc: 0.4188 - 17m 4s
batch: 1500/1563 - train loss: 3.8501 - test loss: 17.6786 - train acc: 0.7890 - test acc: 0.4046 - 17m 9s
batch: 1563/1563 - train loss: 3.9987 - test loss: 17.6662 - train acc: 0.7713 - test acc: 0.4165 - 17m 13s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 15/100
batch: 100/1563 - train loss: 2.3083 - test loss: 17.4111 - train acc: 0.8731 - test acc: 0.4234 - 17m 17s
batch: 200/1563 - train loss: 2.1467 - test loss: 17.3637 - train acc: 0.8788 - test acc: 0.4211 - 17m 22s
batch: 300/1563 - train loss: 2.199

batch: 100/1563 - train loss: 26.1293 - test loss: 25.0812 - train acc: 0.0445 - test acc: 0.0566 - 0m 1s
batch: 200/1563 - train loss: 24.1026 - test loss: 23.2340 - train acc: 0.0614 - test acc: 0.0750 - 0m 6s
batch: 300/1563 - train loss: 23.1988 - test loss: 23.9935 - train acc: 0.0774 - test acc: 0.0830 - 0m 11s
batch: 400/1563 - train loss: 22.2593 - test loss: 21.6409 - train acc: 0.1049 - test acc: 0.1124 - 0m 15s
batch: 500/1563 - train loss: 21.8189 - test loss: 21.2069 - train acc: 0.1109 - test acc: 0.1310 - 0m 20s
batch: 600/1563 - train loss: 21.3838 - test loss: 21.0962 - train acc: 0.1231 - test acc: 0.1305 - 0m 24s
batch: 700/1563 - train loss: 20.9898 - test loss: 20.5852 - train acc: 0.1340 - test acc: 0.1405 - 0m 29s
batch: 800/1563 - train loss: 20.7685 - test loss: 20.3266 - train acc: 0.1419 - test acc: 0.1475 - 0m 34s
batch: 900/1563 - train loss: 20.2141 - test loss: 20.8720 - train acc: 0.1522 - test acc: 0.1429 - 0m 39s
batch: 1000/1563 - train loss: 20.0835 

batch: 1000/1563 - train loss: 11.8512 - test loss: 14.2027 - train acc: 0.4428 - test acc: 0.3662 - 5m 38s
batch: 1100/1563 - train loss: 12.2654 - test loss: 14.5043 - train acc: 0.4288 - test acc: 0.3593 - 5m 43s
batch: 1200/1563 - train loss: 11.8573 - test loss: 15.6651 - train acc: 0.4263 - test acc: 0.3259 - 5m 48s
batch: 1300/1563 - train loss: 12.1434 - test loss: 15.8447 - train acc: 0.4341 - test acc: 0.3165 - 5m 52s
batch: 1400/1563 - train loss: 11.8809 - test loss: 14.2387 - train acc: 0.4463 - test acc: 0.3768 - 5m 57s
batch: 1500/1563 - train loss: 12.4886 - test loss: 13.5532 - train acc: 0.4238 - test acc: 0.3895 - 6m 1s
batch: 1563/1563 - train loss: 12.1553 - test loss: 13.7930 - train acc: 0.4316 - test acc: 0.3846 - 6m 5s
GPU memory used: 0.13 GB - max: 7.56 GB - memory reserved: 0.23 GB - max: 7.62 GB
starting epoch: 6/100
batch: 100/1563 - train loss: 10.1557 - test loss: 13.5565 - train acc: 0.5165 - test acc: 0.3932 - 6m 10s
batch: 200/1563 - train loss: 10.28