# Kernel Sparsity Pruning Tutorial

Neural networks, in general, are very overparamatarized for given tasks (ie the number of parameters far exceeds the number of training points) yet still they [generalize very well](https://arxiv.org/abs/1611.03530). This flies against conventional wisdom where overparamatarizing a model will lead to overfitting and puting theory behind this empirical evidence is a very active area of research.

Additionally, [early on](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf) it was discovered that large numbers of weights in neural networks could be pruned away (set to 0) without affecting the loss and in most cases actually improving the generalization capability of the network. This work was reinvigorated with Song Han's [2015 paper](https://arxiv.org/abs/1510.00149) in pursuit of compressing model size for mobile applications. This has resulted in numerous papers coming out on the topic of weight pruning, filter pruning, channel pruning, and ultimately block pruning. [This paper](https://arxiv.org/abs/1902.09574) out of Google gives a good overview of the current state of sparsity.

Given that models are very overparamatarized and large numbers of weights can be effectively pruned away, what does this leave us with? Well intuitively, then, we can think of pruning as performing an [architecture search](https://openreview.net/pdf?id=rJlnB3C5Ym) within this large, traditionally fixed weight space. What was originally important in the dense model was representing a large number of pathways for optimization. We can then effectively remove the unused pathways in the optimization space with a fine toothed comb.

Well what does pruning get us? We now have a model with a lot of multiplications by zero that we don't need to run. If we're smart about how we do structure this compute (a suprisingly trickly problem), we can now run the model way faster than ever thought possible! That's where the [Neural Magic](http://neuralmagic.com/) engine can help us.

This tutorial provides a step by step walk through for pruning an already trained (dense) model. Specifically it is setup to work with the model trained in our [model training tutorial](model_training.ipynb), but it can be changed to support other models/datasets as needed:
1. Dataset selection
2. Model selection and loading
3. Pruning setup
4. Recalibration using pruning

In [1]:
import sys
import os

print('Python %s on %s' % (sys.version, sys.platform))

package_path = os.path.abspath(os.path.join(os.path.expanduser(os.getcwd()), os.pardir))
print(package_path)

"""
Adding the path to the neuralmagic-pytorch extension to the path so it isn't necessary to have it installed
"""
sys.path.extend([package_path])

print('Added current package path to sys.path')
print('Be sure to install from requirements.txt and pytorch separately')


Python 3.6.8 (v3.6.8:3c6b436a57, Dec 24 2018, 02:04:31) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
/Users/markkurtz/code/neuralmagic/Shared/neuralmagicml-pytorch
Added current package path to sys.path
Be sure to install from requirements.txt and pytorch separately


## Dataset Selection

We are using fast.ai's [Imagenette dataset](https://github.com/fastai/imagenette) provided under the [Apache License 2.0](https://github.com/fastai/imagenette/blob/master/LICENSE) as the default dataset. The original authors, much like ourselves, were interested in a dataset that has similar properties to more complicated datasets such as the Imagenet dataset but one that would allow rapid iterations. It includes 10 of the easiest classes out of the Imagenet 1000 dataset: tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute. If you are interested in visualizing the properties in this dataset see our [model training tutorial](model_training.ipynb) which also gives a more in depth breakdown for what batch size to use and the dataset splits.

The dataset can easily be changed to the desired dataset in the code given.

Below we will need to fill in the dataset path, train batch size, and test batch size.

In [4]:
import ipywidgets as widgets
import torch

print('\nEnter the local path where the dataset can be found')

dataset_text = widgets.Text(value='', placeholder='Enter local path to dataset', description='Dataset Path')
display(dataset_text)

print('\nChoose the batch size to run through the model during train and test runs')
print('(press enter if/after inputting manually)')
train_batch_size_slider = widgets.IntSlider(
    value=64, min=1, max=256, step=1, description='Train Batch Size:'
)
display(train_batch_size_slider)
test_batch_size_slider = widgets.IntSlider(
    value=64 if torch.cuda.is_available() else 1, min=1, max=256, step=1, description='Test Batch Size:'
)
display(test_batch_size_slider)



Enter the local path where the dataset can be found


Text(value='', description='Dataset Path', placeholder='Enter local path to dataset')


Choose the batch size to run through the model during train and test runs
(press enter if/after inputting manually)


IntSlider(value=64, description='Train Batch Size:', max=256, min=1)

IntSlider(value=1, description='Test Batch Size:', max=256, min=1)

In [5]:
from neuralmagicML.datasets import ImagenetteDataset, EarlyStopDataset
from torch.utils.data import Dataset, DataLoader

dataset_root = os.path.abspath(os.path.expanduser(dataset_text.value.strip()))
print('\nLoading dataset from {}'.format(dataset_root))

if not os.path.exists(dataset_root):
    raise Exception('Folder must exist for dataset at {}'.format(dataset_root))
    
train_batch_size = train_batch_size_slider.value
test_batch_size = test_batch_size_slider.value

print('\nUsing train batch size of {} and test batch size of {}\n'
      .format(train_batch_size, test_batch_size))
    
train_dataset = ImagenetteDataset(dataset_root, train=True, rand_trans=True)
train_data_loader = DataLoader(train_dataset, batch_size=train_batch_size, shuffle=True, num_workers=4)
print('train dataset created: \n{}\n'.format(train_dataset))

val_dataset = ImagenetteDataset(dataset_root, train=False, rand_trans=False)
val_data_loader = DataLoader(val_dataset, batch_size=train_batch_size, shuffle=False, num_workers=4)
print('validation test dataset created: \n{}\n'.format(val_dataset))

train_test_dataset = EarlyStopDataset(ImagenetteDataset(dataset_root, train=True, rand_trans=False),
                                      early_stop=len(val_dataset))
train_test_data_loader = DataLoader(train_test_dataset, batch_size=train_batch_size, shuffle=False, num_workers=4)
print('train test dataset created: \n{}\n'.format(train_test_dataset))



Loading dataset from /Users/markkurtz/datasets/imagenette

Using train batch size of 64 and test batch size of 1

already downloaded imagenette of size ImagenetteSize.s160
train dataset created: 
Dataset ImagenetteDataset
    Number of datapoints: 12894
    Root location: /Users/markkurtz/datasets/imagenette/imagenette-160/train

already downloaded imagenette of size ImagenetteSize.s160
validation test dataset created: 
Dataset ImagenetteDataset
    Number of datapoints: 500
    Root location: /Users/markkurtz/datasets/imagenette/imagenette-160/val

already downloaded imagenette of size ImagenetteSize.s160
train test dataset created: 
Dataset ImagenetteDataset
    Number of datapoints: 500
    Root location: /Users/markkurtz/datasets/imagenette/imagenette-160/train



## Model Selection and Loading

For this exercise we'll create the standard [ResNet50 model](https://arxiv.org/abs/1512.03385) and in addition we will load the pretrained weights from our [model training tutorial](model_training.ipynb)

If you changed the dataset in the above cell then we'll need to update the number of classes to create the model with appropriately and load your own pretrained weights. Additionally the model can be changed out completely to work with your specific use case.

Additionally run the code block and select the device to run on before continuing. cpu runs in the pytorch cpu framework and cuda runs on an attached GPU.


In [7]:
import glob
from neuralmagicML.models import resnet50, load_model

num_classes = 10
# TODO: change this to load pretrained weights from our cloud
model = resnet50(num_classes=num_classes)
pretrained_paths = [path for path in glob.glob('ResNet*.pth')]
pretrained_path = pretrained_paths[0]
load_model(pretrained_path, model)
print('Created model {}'.format(model.__class__.__name__))

print('\nChoose the device to run on')
device_choice = widgets.ToggleButtons(
    options=['cuda', 'cpu'] if torch.cuda.is_available() else ['cpu'],
    description='Device'
)
display(device_choice)


Created model ResNet

Choose the device to run on


ToggleButtons(description='Device', options=('cpu',), value='cpu')

## Sparsity, Prunning and high level motivation
### sparsity:

Informally, sparsity is the degree in which a tensor is comprised of zeros.

Slightly more formally:

let $N^i$ be the total number of elements in a (e.g. weight) tensor $W_i$

let $N^i_z$ be the number of elements which are zero-valued within that tensor

The sparsity level associated with that tensor is defined as $s_i \triangleq \dfrac{N^i_z}{N^i}$ 


### prunning:

Prunning is the process selectively setting weights in a model to zero. The selection of how many and which weights to set to zero affects the accuracy and model required FLOPs and memory footprint. **Critically - attaining high levels of sparsity while preserving accuracy is possible, as we will demonstrate in this notebook**

### Sparsity --> Less FLOPs -?-> accelerated performance:
The fact that models can be heavily sparsified with little or no accuracy hit is well known in the research community. Intuitively, the higher the sparsity level the less theoretical FLOPs are required and hence a correspondingly large performance acceleration. However, while the first part of that intutive reasoning is true (higer sparsity --> less theoretical FLOPs), the second one is not nececcerily true (less theoretical FLOPs -/-> higher performance). The reason is that typical hardware such as GPUs is very ill-equiped to take advantage of that sparsity, and FLOPs savings in practice is very hard to come by. On CPUs, in contrast, algorithms for that very exploitation can be flexibly developed.

Armed with this insight we are ready (and motivated!) to start looking at model sparsity with the aim of increasing it (via prunning). 

In [3]:
from typing import List
from tensorboardX import SummaryWriter


print('Setting up model for kernel sparsity tracking...')
conv_layers_names = []
for name, mod in model.named_modules():
    if isinstance(mod, Conv2d): #to add the FC layers: isinstance(mod, Conv2d) or isinstance(mod, Linear) 
        conv_layers_names.append(name)
analyzed_layers = KSAnalyzerLayer.analyze_layers(model, conv_layers_names)

def _record_kernel_sparsity(analyzed_layers: List[KSAnalyzerLayer], writer: SummaryWriter, epoch: int):
#     layers_sparsities = []
    for ks_layer in analyzed_layers:
        tag = 'Kernel Sparsity/{}'.format(ks_layer.name)
        writer.add_scalar(tag, ks_layer.param_sparsity.item(), epoch)
    print('sparsity per layer [%]: '+ str([round(ks_layer.param_sparsity.item()*100.0,0) for ks_layer in analyzed_layers]))


Setting up model for kernel sparsity tracking...


## Optimizer , Loss, Logging etc.

In [4]:
import torch
from torch import optim
from torch.nn import DataParallel
from neuralmagicML.utils import CrossEntropyLossCalc, TopKAccuracy
import os

init_lr = 0.01
momentum = 0.9
weight_decay = 1e-4

print('Creating optimizer with initial lr: {}, momentum: {}, weight_decay: {}'
          .format(init_lr, momentum, weight_decay))
optimizer = optim.SGD(
    model.parameters(), init_lr, momentum=momentum, weight_decay=weight_decay, nesterov=True
)
loss_extras = {
    'top1acc': TopKAccuracy(1),
    'top5acc': TopKAccuracy(5)
}
loss_calc = CrossEntropyLossCalc(loss_extras)
print('Created loss calc {} with extras {}'.format(loss_calc, ', '.join(loss_extras.keys())))


logs_dir = './logs'
model_dir = '../pruned'
if not os.path.exists(model_dir):
    os.makedirs(model_dir)

if not os.path.exists(logs_dir):
    os.makedirs(logs_dir)
    
save_rate = 5
print('Creating summary writer in {}'.format(logs_dir))
writer = SummaryWriter(logdir=logs_dir, comment='imagenet training')
if isinstance(model, DataParallel):
    model = model.module


Creating optimizer with initial lr: 0.01, momentum: 0.9, weight_decay: 0.0001
Created loss calc <neuralmagicML.utils.loss_calc.CrossEntropyLossCalc object at 0x7fd840247c18> with extras top1acc, top5acc
Creating summary writer in ./logs


## Scheduling the prunning process:
Prunning involves two intertwined processes:
1. sparsification - i.e. the selection of weights to zero out.
2. re-training the model post sparsification.

In practice, a gradual increase of the sparsity level allows for the recovery of accuracy by retraining (up to high levels of sparsity)

In order to simplify these control of these two processes, we introduce 'Modifier' classes which manage the schedules  of the associated hyperparameters (i.e. learning_rate, sparsity per layer) thoughout the epochs. For added convinience below is a simple GUI to set these hyperparameters.
The GUI allows for controlling the target sparsity on an individual layer basis / global / mixed fashion.
Try setting all layers to a sparsity level of 80%

In [5]:
import ipywidgets as widgets

############################################################
## configuration of sparsity levels / enables per layer ####
############################################################
c0 = widgets.VBox([widgets.Checkbox(description=ks_layer.name, value=True) for ks_layer in analyzed_layers])
c1 = widgets.VBox([widgets.FloatSlider(value=0.5,min=0.05,max=0.99) for _ in range(len(analyzed_layers))])
layer_ctrl = widgets.HBox([c0,c1])
global_ctrl = widgets.HBox([widgets.Checkbox(description='enable/disable all', value = True), 
                            widgets.FloatSlider(value=0.5,min=0.05,max=0.99,description='sparsity [%]')
                           ])
output2 = widgets.Output()

activated_layers = [child.value for child in layer_ctrl.children[0].children]

def global_enable_change(change):
    with output2:
        state = change['new']
        print(state)
        if state is not None:
            for ckbx_child in layer_ctrl.children[0].children:
                ckbx_child.value = state
                
global_ctrl.children[0].observe(global_enable_change, names='value')   

def global_sparsity_set(change):
    with output2:
        val = change['new']
        print(val)
        if val is not None:
            for ckbx_child, sldr_child in zip(layer_ctrl.children[0].children, layer_ctrl.children[1].children):
                if ckbx_child.value:
                    sldr_child.value = val
    
global_ctrl.children[1].observe(global_sparsity_set, names='value')   

###############################################
## configuration of learning rate schedule ####
###############################################

lr_class_dict = {   #TODO: read from CONSTRUCTORS in modifier_lr.py instead
                    #to include all supported methods in the GUI
    'ExponentialLR': {'gamma': [0.95, widgets.BoundedFloatText]}, #bound by 0.0
    'StepLR': {'step_size': [20, widgets.BoundedIntText], #bound by 1
              'gamma': [0.2, widgets.BoundedFloatText]}
}

lr_mod_args_field_initval = {
    'start_epoch': 25.0,# 'start epoch:'],
    'end_epoch': 35.0,# 'end epoch  :'],
    'update_frequency': 1.0,# 'update freq:'],
    'init_lr': 0.001# 'initial learning rate :']
}

style = {'description_width': 'initial'}
# lr_cfg_list = [widgets.Text(value='learning rate schedule', disabled=True)]
lr_section_title = widgets.Text(value='learning rate schedule', disabled=True)
lr_cfg_list =[]
for fld, val in lr_mod_args_field_initval.items():
    lr_cfg_list.append(widgets.BoundedFloatText(value=val, description=fld, disabled=False, min=0, style=style,))

lr_slct = widgets.Dropdown(
    options=[key for key in lr_class_dict.keys()],  
    value=[key for key in lr_class_dict.keys()][0],
    description='lr_class',
)
# lr_cfg_list.append(lr_slct)

def create_lr_slct_list():
    lr_slct_params = [] #create new widgets
    for param, val in lr_class_dict[lr_slct.value].items():
        lr_slct_params.append(val[1](value=val[0],description=param))
    return lr_slct_params
slct_param = widgets.VBox(children=create_lr_slct_list())
# lr_cfg_list.append(slct_param)

def refresh_lr_param(change):
    if change['new']:
        val = lr_slct.value
        slct_param.children = create_lr_slct_list()

lr_slct.observe(refresh_lr_param, names='value')   
lr_cfg = widgets.VBox([lr_section_title, *lr_cfg_list, lr_slct, slct_param])

##########################################
## configuration of prunning schedule ####
##########################################

prunning_mod_args_field_initval = {
    'start_epoch': 0.0,# 'start epoch:'],
    'end_epoch': 25.0,#'end epoch  :'],
    'update_frequency': 1.0#,'update freq:'],


}

style = {'description_width': 'initial'}
prn_section_title = widgets.Text(value='prunning schedule', disabled=True)
prn_cfg_list =[]
for fld, val in prunning_mod_args_field_initval.items():
    prn_cfg_list.append(widgets.BoundedFloatText(value=val, description=fld, disabled=False, min=0, style=style,))


prn_cfg = widgets.VBox([prn_section_title,*prn_cfg_list])
schd_cfg = widgets.VBox([lr_cfg, prn_cfg])#,prn_cfg_list])
display(widgets.HBox([widgets.VBox([global_ctrl,layer_ctrl]),schd_cfg]))


HBox(children=(VBox(children=(HBox(children=(Checkbox(value=True, description='enable/disable all'), FloatSlid…

In [6]:
print('Creating learning rate schedule...')
lr_mod_args = {}

for child in lr_cfg_list: 
    lr_mod_args[child.description] = child.value
assert(lr_slct.description == 'lr_class')
lr_mod_args['lr_class'] = lr_slct.value
lr_mod_args['lr_kwargs'] = {}
for child in slct_param.children:
    lr_mod_args['lr_kwargs'][child.description] = child.value

lr_mod = LearningRateModifier(**lr_mod_args)

print('Creating sparsification schedule...')

def create_ks_mod_args(layer_name, final_sparsity):
    ks_mod_args ={
        'param': 'weight',
        'init_sparsity': 0.05,
        'inter_func': 'linear',
        'layers': [layer_name],
        'final_sparsity': final_sparsity
    }
    # add common fields
    for child in prn_cfg_list:
        ks_mod_args[child.description] = child.value
    return ks_mod_args

ks_mod_args_list = []
for ckbx_child, sldr_child in zip(layer_ctrl.children[0].children, layer_ctrl.children[1].children):
        if ckbx_child.value: #layer is sparsified
            layer_name = ckbx_child.description
            final_sparsity = sldr_child.value#
            ks_mod_args_list.append(create_ks_mod_args(layer_name, final_sparsity))
            
ks_mod_list = [GradualKSModifier(**ks_mod_args) for ks_mod_args in ks_mod_args_list]
modifiers = [lr_mod, *ks_mod_list]

modifier_manager = ScheduledModifierManager(modifiers)
optimizer = ScheduledOptimizer(optimizer, model, modifier_manager, steps_per_epoch=len(train_dataset))

Creating learning rate schedule...
Creating sparsification schedule...


## Setting up training
The following should look very familiar - it is in fact the exact same code from our previous tutorial (NB1). 

In [7]:
from tqdm import tqdm
from torch.utils.data import DataLoader

from typing import Tuple, Dict
from torch import Tensor
import torch
from torch.nn import Module


def _test_datasets(model, train_data_loader: DataLoader, val_data_loader: DataLoader,
                   writer: SummaryWriter, epoch: int) -> Tuple[Dict[str, float], Dict[str, float]]:
    val_losses , train_losses = None, None
    if val_data_loader  is not None:
        print('Running test for validation dataset for epoch {}'.format(epoch))
        val = test_epoch(model, val_data_loader, loss_calc, device, epoch)
        print('Completed test for validation dataset for epoch {}'.format(epoch))
        val_losses = {}
        for loss, _ in val.items():
            val_losses[loss] = torch.mean(torch.cat(val[loss])).item()
            val_tag = 'Test/validation/{}'.format(loss)
            writer.add_scalar(val_tag, val_losses[loss], epoch)
        
        val_loss_str = 'validation set - epoch: {} '.format(epoch)
        for loss, value in val_losses.items():
            val_loss_str += (loss + ': {0:.2f} '.format(value))
        print(val_loss_str)
        
        
    if train_data_loader is not None:
        print('Running test for train dataset for epoch {}'.format(epoch))
        train = test_epoch(model, train_data_loader, loss_calc, device, epoch)
        print('Completed test for train dataset for epoch {}'.format(epoch))
        train_losses = {}

        for loss, _ in train.items():
            train_losses[loss] = torch.mean(torch.cat(train[loss])).item()
            train_tag = 'Test/training/{}'.format(loss)
            writer.add_scalar(train_tag, train_losses[loss], epoch)

        
        train_loss_str = 'training set - epoch: {} '.format(epoch)
        for loss, value in train_losses.items():
            train_loss_str += (loss + ': {0:.2f} '.format(value))
        print(train_loss_str)


    return val_losses , train_losses


def test_epoch(model: Module, data_loader: DataLoader, loss, device, epoch: int) -> Dict:
    model.eval()
    results = {}#ModuleTestResults()
    with torch.no_grad():
        for batch, (*x_feature, y_lab) in enumerate(tqdm(data_loader)):
            y_lab = y_lab.to(device)
            x_feature = tuple([dat.to(device) for dat in x_feature])
            batch_size = y_lab.shape[0]
            
            y_pred = model(*x_feature)

            losses = loss(x_feature, y_lab, y_pred)  # type: Dict[str, Tensor]
            for key, val in losses.items():
                if key not in results:
                    results[key] = []

                result = val.detach_().cpu()
                result = result.repeat(batch_size) #repeat tensor so that there is no dependency on batch size
                results[key].append(result)
#             results.append(losses, batch_size)
    return results

def train_epoch(model: Module, data_loader: DataLoader, optimizer, loss, device, data_counter: int):
    model.train()
    
    for batch, (*x_feature, y_lab) in enumerate(tqdm(data_loader)):
        # copy next batch to the device we are using
        y_lab = y_lab.to(device)
        x_feature = tuple([dat.to(device) for dat in x_feature])
        batch_size = y_lab.shape[0]

        # Zero the parameter gradients
        optimizer.zero_grad()

        # forward 
        y_pred = model(*x_feature)
        
        # update losses
        losses = loss(x_feature, y_lab, y_pred)  # type: Dict[str, Tensor]
        
        # backward
        losses['loss'].backward()
        
        # take SGD step
        optimizer.step(closure=None)
        
        # log loss and accuracy
        data_counter += batch_size
        for _loss, _value in losses.items():
            writer.add_scalar('Train/{}'.format(_loss), _value.item(), data_counter)



## Prunning main loop:
This too is very simialr to the training main loop previously introduced, the main differences are:
1. we are tracking sparsity
2. we are following the schedules as orchastrated by the modifiers above

In [8]:
import math
print('Running baseline test...')
epoch = -1
_record_kernel_sparsity(analyzed_layers, writer, epoch)
_test_datasets(model, None, val_data_loader, writer, epoch=-1)

print('Training model')
num_epochs = int(math.ceil(modifier_manager.max_epochs))
data_counter = 0

for epoch in range(num_epochs):
    print('Starting epoch {}'.format(epoch))
    optimizer.epoch_start()
    _record_kernel_sparsity(analyzed_layers, writer, epoch)



    train_epoch(model, train_data_loader, optimizer, loss_calc, device, data_counter)
    optimizer.epoch_end()
    val_losses, train_losses = _test_datasets(model, None, val_data_loader, writer, epoch)

    if save_rate > 0 and epoch % save_rate == 0:
        save_path = os.path.join(model_dir, 'resnet50-epoch={:03d}-val={:.4f}.pth'
                                 .format(epoch, val_losses['loss']))
        save_model(save_path, model, optimizer, epoch)
        print('saved model checkpoint at {}'.format(save_path))

_record_kernel_sparsity(analyzed_layers, writer, num_epochs)


scalars_json_path = os.path.join(logs_dir, 'all_scalars.json')
writer.export_scalars_to_json(scalars_json_path)
writer.close()

save_path = os.path.join(model_dir, 'resnet50-pruned.pth')
print('Finished training, saving model to {}'.format(save_path))
save_model(save_path, model)
print('Saved model')

  0%|          | 0/2 [00:00<?, ?it/s]

Running baseline test...
sparsity per layer [%]: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
Running test for validation dataset for epoch -1


100%|██████████| 2/2 [00:01<00:00,  1.06it/s]


Completed test for validation dataset for epoch -1
validation set - epoch: -1 loss: 0.40 top1acc: 87.20 top5acc: 98.60 
Training model
Starting epoch 0


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 0


100%|██████████| 2/2 [00:01<00:00,  1.05it/s]


Completed test for validation dataset for epoch 0
validation set - epoch: 0 loss: 0.39 top1acc: 87.60 top5acc: 98.80 
saved model checkpoint at ../pruned/resnet50-epoch=000-val=0.3917.pth
Starting epoch 1


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0]


100%|██████████| 101/101 [00:23<00:00,  4.76it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 1


100%|██████████| 2/2 [00:01<00:00,  1.06it/s]


Completed test for validation dataset for epoch 1
validation set - epoch: 1 loss: 0.37 top1acc: 87.80 top5acc: 98.80 
Starting epoch 2


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0, 9.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 2


100%|██████████| 2/2 [00:01<00:00,  1.10it/s]


Completed test for validation dataset for epoch 2
validation set - epoch: 2 loss: 0.39 top1acc: 88.00 top5acc: 99.00 
Starting epoch 3


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]


100%|██████████| 101/101 [00:23<00:00,  4.74it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 3


100%|██████████| 2/2 [00:01<00:00,  1.14it/s]


Completed test for validation dataset for epoch 3
validation set - epoch: 3 loss: 0.35 top1acc: 88.40 top5acc: 99.00 
Starting epoch 4


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0]


100%|██████████| 101/101 [00:23<00:00,  4.76it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 4


100%|██████████| 2/2 [00:01<00:00,  1.10it/s]


Completed test for validation dataset for epoch 4
validation set - epoch: 4 loss: 0.38 top1acc: 89.00 top5acc: 98.80 
Starting epoch 5


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0]


100%|██████████| 101/101 [00:23<00:00,  4.76it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 5


100%|██████████| 2/2 [00:01<00:00,  1.06it/s]


Completed test for validation dataset for epoch 5
validation set - epoch: 5 loss: 0.35 top1acc: 89.40 top5acc: 98.60 
saved model checkpoint at ../pruned/resnet50-epoch=005-val=0.3498.pth
Starting epoch 6


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 6


100%|██████████| 2/2 [00:01<00:00,  1.05it/s]


Completed test for validation dataset for epoch 6
validation set - epoch: 6 loss: 0.37 top1acc: 89.20 top5acc: 98.40 
Starting epoch 7


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0, 18.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 7


100%|██████████| 2/2 [00:01<00:00,  1.08it/s]


Completed test for validation dataset for epoch 7
validation set - epoch: 7 loss: 0.40 top1acc: 86.20 top5acc: 99.00 
Starting epoch 8


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0, 19.0]


100%|██████████| 101/101 [00:23<00:00,  4.74it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 8


100%|██████████| 2/2 [00:01<00:00,  1.08it/s]


Completed test for validation dataset for epoch 8
validation set - epoch: 8 loss: 0.36 top1acc: 88.00 top5acc: 98.80 
Starting epoch 9


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 9


100%|██████████| 2/2 [00:01<00:00,  1.10it/s]


Completed test for validation dataset for epoch 9
validation set - epoch: 9 loss: 0.36 top1acc: 89.20 top5acc: 99.00 
Starting epoch 10


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0, 23.0]


100%|██████████| 101/101 [00:23<00:00,  4.75it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 10


100%|██████████| 2/2 [00:01<00:00,  1.11it/s]


Completed test for validation dataset for epoch 10
validation set - epoch: 10 loss: 0.37 top1acc: 87.40 top5acc: 98.40 
saved model checkpoint at ../pruned/resnet50-epoch=010-val=0.3740.pth
Starting epoch 11


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0, 25.0]


100%|██████████| 101/101 [00:23<00:00,  4.76it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 11


100%|██████████| 2/2 [00:01<00:00,  1.11it/s]


Completed test for validation dataset for epoch 11
validation set - epoch: 11 loss: 0.37 top1acc: 87.80 top5acc: 98.80 
Starting epoch 12


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0, 27.0]


100%|██████████| 101/101 [00:23<00:00,  4.76it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 12


100%|██████████| 2/2 [00:01<00:00,  1.11it/s]


Completed test for validation dataset for epoch 12
validation set - epoch: 12 loss: 0.38 top1acc: 87.60 top5acc: 98.80 
Starting epoch 13


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0, 28.0]


100%|██████████| 101/101 [00:23<00:00,  4.76it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 13


100%|██████████| 2/2 [00:01<00:00,  1.10it/s]


Completed test for validation dataset for epoch 13
validation set - epoch: 13 loss: 0.37 top1acc: 87.60 top5acc: 98.80 
Starting epoch 14


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 14


100%|██████████| 2/2 [00:01<00:00,  1.06it/s]


Completed test for validation dataset for epoch 14
validation set - epoch: 14 loss: 0.35 top1acc: 89.00 top5acc: 98.80 
Starting epoch 15


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0, 32.0]


100%|██████████| 101/101 [00:23<00:00,  4.76it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 15


100%|██████████| 2/2 [00:01<00:00,  1.07it/s]


Completed test for validation dataset for epoch 15
validation set - epoch: 15 loss: 0.38 top1acc: 89.00 top5acc: 98.40 
saved model checkpoint at ../pruned/resnet50-epoch=015-val=0.3849.pth
Starting epoch 16


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0, 34.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 16


100%|██████████| 2/2 [00:01<00:00,  1.07it/s]


Completed test for validation dataset for epoch 16
validation set - epoch: 16 loss: 0.37 top1acc: 89.00 top5acc: 98.40 
Starting epoch 17


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0, 36.0]


100%|██████████| 101/101 [00:23<00:00,  4.76it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 17


100%|██████████| 2/2 [00:01<00:00,  1.11it/s]


Completed test for validation dataset for epoch 17
validation set - epoch: 17 loss: 0.36 top1acc: 89.60 top5acc: 98.80 
Starting epoch 18


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0, 37.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 18


100%|██████████| 2/2 [00:01<00:00,  1.06it/s]


Completed test for validation dataset for epoch 18
validation set - epoch: 18 loss: 0.39 top1acc: 87.80 top5acc: 99.00 
Starting epoch 19


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0, 39.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 19


100%|██████████| 2/2 [00:01<00:00,  1.11it/s]


Completed test for validation dataset for epoch 19
validation set - epoch: 19 loss: 0.37 top1acc: 89.00 top5acc: 98.80 
Starting epoch 20


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0, 41.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 20


100%|██████████| 2/2 [00:01<00:00,  1.11it/s]


Completed test for validation dataset for epoch 20
validation set - epoch: 20 loss: 0.36 top1acc: 89.60 top5acc: 98.80 
saved model checkpoint at ../pruned/resnet50-epoch=020-val=0.3554.pth
Starting epoch 21


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0, 43.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 21


100%|██████████| 2/2 [00:01<00:00,  1.13it/s]


Completed test for validation dataset for epoch 21
validation set - epoch: 21 loss: 0.35 top1acc: 88.80 top5acc: 98.40 
Starting epoch 22


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0, 45.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 22


100%|██████████| 2/2 [00:01<00:00,  1.11it/s]


Completed test for validation dataset for epoch 22
validation set - epoch: 22 loss: 0.37 top1acc: 89.20 top5acc: 99.00 
Starting epoch 23


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0, 46.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 23


100%|██████████| 2/2 [00:01<00:00,  1.10it/s]


Completed test for validation dataset for epoch 23
validation set - epoch: 23 loss: 0.41 top1acc: 87.80 top5acc: 98.00 
Starting epoch 24


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0, 48.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 24


100%|██████████| 2/2 [00:01<00:00,  1.09it/s]


Completed test for validation dataset for epoch 24
validation set - epoch: 24 loss: 0.39 top1acc: 88.00 top5acc: 98.00 
Starting epoch 25


  0%|          | 0/101 [00:00<?, ?it/s]

sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.78it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 25


100%|██████████| 2/2 [00:01<00:00,  1.06it/s]
  0%|          | 0/101 [00:00<?, ?it/s]

Completed test for validation dataset for epoch 25
validation set - epoch: 25 loss: 0.38 top1acc: 89.60 top5acc: 98.60 
saved model checkpoint at ../pruned/resnet50-epoch=025-val=0.3797.pth
Starting epoch 26
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 26


100%|██████████| 2/2 [00:01<00:00,  1.11it/s]
  0%|          | 0/101 [00:00<?, ?it/s]

Completed test for validation dataset for epoch 26
validation set - epoch: 26 loss: 0.37 top1acc: 89.00 top5acc: 98.40 
Starting epoch 27
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 27


100%|██████████| 2/2 [00:01<00:00,  1.10it/s]
  0%|          | 0/101 [00:00<?, ?it/s]

Completed test for validation dataset for epoch 27
validation set - epoch: 27 loss: 0.37 top1acc: 88.80 top5acc: 98.40 
Starting epoch 28
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 28


100%|██████████| 2/2 [00:01<00:00,  1.10it/s]
  0%|          | 0/101 [00:00<?, ?it/s]

Completed test for validation dataset for epoch 28
validation set - epoch: 28 loss: 0.38 top1acc: 88.80 top5acc: 98.20 
Starting epoch 29
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 29


100%|██████████| 2/2 [00:01<00:00,  1.09it/s]
  0%|          | 0/101 [00:00<?, ?it/s]

Completed test for validation dataset for epoch 29
validation set - epoch: 29 loss: 0.38 top1acc: 89.00 top5acc: 98.40 
Starting epoch 30
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.75it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 30


100%|██████████| 2/2 [00:01<00:00,  1.09it/s]


Completed test for validation dataset for epoch 30
validation set - epoch: 30 loss: 0.38 top1acc: 89.20 top5acc: 98.20 
saved model checkpoint at ../pruned/resnet50-epoch=030-val=0.3755.pth
Starting epoch 31
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.78it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 31


100%|██████████| 2/2 [00:01<00:00,  1.12it/s]
  0%|          | 0/101 [00:00<?, ?it/s]

Completed test for validation dataset for epoch 31
validation set - epoch: 31 loss: 0.38 top1acc: 89.00 top5acc: 98.40 
Starting epoch 32
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.76it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 32


100%|██████████| 2/2 [00:01<00:00,  1.10it/s]
  0%|          | 0/101 [00:00<?, ?it/s]

Completed test for validation dataset for epoch 32
validation set - epoch: 32 loss: 0.38 top1acc: 88.60 top5acc: 97.80 
Starting epoch 33
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.75it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 33


100%|██████████| 2/2 [00:01<00:00,  1.07it/s]
  0%|          | 0/101 [00:00<?, ?it/s]

Completed test for validation dataset for epoch 33
validation set - epoch: 33 loss: 0.38 top1acc: 88.80 top5acc: 98.00 
Starting epoch 34
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]


100%|██████████| 101/101 [00:23<00:00,  4.77it/s]
  0%|          | 0/2 [00:00<?, ?it/s]

Running test for validation dataset for epoch 34


100%|██████████| 2/2 [00:01<00:00,  1.04it/s]


Completed test for validation dataset for epoch 34
validation set - epoch: 34 loss: 0.38 top1acc: 88.60 top5acc: 98.20 
sparsity per layer [%]: [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0]
Finished training, saving model to ../pruned/resnet50-pruned.pth
Saved model


In [9]:
modifier_manager.max_epochs

35.0