© 2020 Neuralmagic, Inc., Confidential // Neural Magic Evaluation License Agreement

# PyTorch Model Pruning with Adam Optimizer

This notebook provides a step-by-step walkthrough for pruning an already trained (dense) model to enable better performance at inference time using the Neural Magic Inference Engine. You will:
- Set up the environment
- Set up the model and dataset
- Analyze loss sensitivity
- Select hyperparameters
- Recalibrate using pruning
- Export to ONNX

Reading through this notebook takes only a short time and should give you an intuition for what is happening. Rough time estimates for fully pruning the default model are given below. Note that training with the PyTorch CPU implementation is much slower than training on a GPU:
- 15 minutes on a GPU
- 45 minutes on a laptop CPU

## Background
Neural networks are generally overparameterized for given tasks (i.e., the number of parameters far exceeds the number of training points), yet they still generalize well. This is contrary to conventional ML wisdom, where overparameterizing a model would traditionally lead to overfitting. The phenomenon is commonly referred to as double descent, and it is a very active area of research.

A side effect of this overparameterization is that a large number of weights in deep learning networks can be pruned away (set to 0). This was discovered early on by Yann LeCun, but interest waned due to a lack of applications at the time. Song Han's 2015 paper reinvigorated the area in pursuit of compressing model size for mobile applications. This renewed interest has resulted in numerous papers on weight pruning, filter pruning, channel pruning, and ultimately, block pruning. A Google paper gives a good overview of the current state of kernel sparsity (model pruning).

While pruning to increase kernel sparsity, we iteratively remove weights based on their absolute magnitude. The smallest weights are pruned first. Generally, two properties enable us to do this: the self-regularizing effect of gradient descent and the L1 or L2 regularization functions applied to the weights. Weights that do not help in the optimization process are quickly reduced in absolute value. In this way, pruning can be thought of as an architecture search.
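To make the idea of magnitude-based pruning concrete, here is a minimal sketch in plain Python. It is not the neuralmagicML implementation: the function name `magnitude_prune` and the example weights are hypothetical, and a real layer would operate on tensors rather than lists.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to 0.

    `sparsity` is the target fraction of zeros, between 0.0 and 1.0.
    (Hypothetical helper for illustration only.)
    """
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


# Made-up weights for a single layer
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02, -0.1, 0.4, 0.07, -0.9]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
# → [0.8, 0.0, 0.3, 0.0, -0.6, 0.0, 0.0, 0.4, 0.0, -0.9]
```

In an iterative setting, a step like this would be repeated with a gradually increasing `sparsity`, with training in between so the remaining weights can compensate.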

What does pruning get us? We now have a model with a lot of multiplications by zero that we don't need to run. If we're smart about how we structure this compute (a surprisingly tricky problem), we can run the model much faster than before! The pruned model, combined with the ability to run it quickly in the Neural Magic Inference Engine, helps to optimize performance. Neural Magic also makes the pruning algorithm easier to apply and gives you more information along the way, so you can get better results.

In this notebook, you prune a simple CNN on the MNIST dataset using an Adam optimizer. However, the notebook is designed to be easily extendable to your own model and dataset. Guided instructions are provided in the notebook code comments.

Note that the Adam optimizer is easier to use than a Stochastic Gradient Descent (SGD) optimizer; however, SGD is the preferred method for pruning to ensure the resulting model will generalize well. See our other notebooks for pruning with SGD.

## Before you begin…
Be sure to read through the README found in the Neural Magic Recalibration Tooling (neuralmagicML) package.


## Step 1 - Setting Up the Environment

In this step, Neural Magic checks your environment setup to ensure the rest of the notebook will flow smoothly.
Before running, install the neuralmagicML package into the system using the following command:

`pip install neuralmagicML-python/`


In [None]:
notebook_name = "pruning_adam_pytorch"
print("checking setup for {}...".format(notebook_name))

# filter because of tensorboard future warnings
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

try:
    # make sure neuralmagicML is installed
    import neuralmagicML
except ImportError as ex:
    # chain the original error so the underlying cause stays visible
    raise ImportError(
        "please install neuralmagicML using the setup.py file before continuing"
    ) from ex
    
from neuralmagicML.utilsnb import check_notebook_setup
check_notebook_setup()

## Step 2 - Setting Up the Model and Dataset

By default, you will create a simple CNN to prune on the MNIST dataset. The CNN is already pretrained, and its weights are downloaded from the Neural Magic Model Repo. The MNIST dataset is downloaded automatically as well through PyTorch.

If you would like to try out your own model for pruning, modify the appropriate lines for your model and dataset, specifically:
- `model = mnist_net(pretrained=True)`
- `train_dataset = MNISTDataset(train=True)`
- `val_dataset = MNISTDataset(train=False)`

Take care to keep the variable names the same, as the rest of the notebook is set up according to those.


In [None]:
import os
from neuralmagicML.pytorch.datasets import MNISTDataset
from neuralmagicML.pytorch.models import mnist_net
from neuralmagicML.utils import clean_path

#######################################################
# Define your model below
#######################################################
print("loading model...")
model = mnist_net(pretrained=True)
model_name = model.__class__.__name__
print(model)

#######################################################
# Define your train and validation datasets below
#######################################################

print("\nloading train dataset...")
train_dataset = MNISTDataset(train=True)
print(train_dataset)

print("\nloading val dataset...")
val_dataset = MNISTDataset(train=False)
print(val_dataset)

## Step 3 - Analyzing Loss Sensitivity

One of the hyperparameters you need to control is how sparse (percentage of zeros) to make each fully connected or convolutional layer in a network. Not all layers are created equal, so you will want to be careful about how you assign sparsity. Generally, the more parameters a layer has per unit of input data, the less sensitive (and therefore more prunable) the layer will be. For example, a 3x3 convolution is much less sensitive than a 1x1 convolution with the same number of channels. Likewise, increasing the stride of a convolution will increase its sensitivity.
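As a rough illustration of the 3x3 versus 1x1 comparison, the sketch below counts the kernel weights of two convolutions with the same channel sizes. The helper `conv2d_weight_count` and the channel count of 64 are hypothetical and chosen only for the arithmetic.

```python
def conv2d_weight_count(in_channels, out_channels, kernel_size):
    """Number of weights in a 2D convolution kernel (bias excluded)."""
    return in_channels * out_channels * kernel_size * kernel_size


channels = 64  # hypothetical channel count, same for input and output

params_3x3 = conv2d_weight_count(channels, channels, kernel_size=3)
params_1x1 = conv2d_weight_count(channels, channels, kernel_size=1)

print(params_3x3)                # → 36864
print(params_1x1)                # → 4096
print(params_3x3 // params_1x1)  # → 9
```

With nine times as many weights looking at the same input, the 3x3 layer has far more redundancy to give up, which is consistent with it tolerating higher sparsity.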

To give better visibility into this, we provide a quick, one-shot approach to approximating sensitivity. Using the `one_shot_ks_loss_sensitivity()` function, an algorithm goes layer by layer and prunes each to different levels of sparsity without retraining. This gives a reasonable approximation that is inexpensive to run because it does not require significant compute resources. For display, the piecewise integral of the sparsity versus loss curve is calculated for each layer. Therefore, higher sensitivities mean more loss for a given amount of sparsity.
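The piecewise integral mentioned above can be sketched with the trapezoidal rule. This is an assumption about the general shape of the calculation, not the library's exact code: the function name `sensitivity`, the layer names, and the loss values are all made up for illustration.

```python
def sensitivity(sparsities, losses):
    """Trapezoidal (piecewise linear) integral of loss over sparsity."""
    total = 0.0
    for i in range(1, len(sparsities)):
        width = sparsities[i] - sparsities[i - 1]
        total += width * (losses[i] + losses[i - 1]) / 2.0
    return total


# Hypothetical measurements: loss of the whole model when only the
# named layer is pruned to each sparsity level, without retraining
sparsity_levels = [0.0, 0.5, 0.9]
layer_losses = {
    "conv1": [0.05, 0.30, 2.00],
    "fc": [0.05, 0.06, 0.20],
}

scores = {
    name: sensitivity(sparsity_levels, losses)
    for name, losses in layer_losses.items()
}

# Most sensitive layers first
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 4))
```

Here the hypothetical `conv1` layer scores much higher than `fc`, so it would be assigned a lower sparsity target.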

Note: If you changed the model and/or dataset above, you should change the loss, batch_size, and samples_per_measurement variables below. The number of samples per measurement can be relatively small (only one or a few items per class) to get a proper analysis.

Finally, after running, the results will be saved to a JSON file and plotted in this notebook for easy viewing.


In [None]:
import torch
from neuralmagicML.pytorch.utils import CrossEntropyLossWrapper
from neuralmagicML.pytorch.recal import one_shot_ks_loss_sensitivity
from neuralmagicML.utils import clean_path

device = "cuda" if torch.cuda.is_available() else "cpu"
print("running ks loss sensitivity analysis for model on {}".format(device))

#######################################################
# Edit parameters below
#######################################################
loss = CrossEntropyLossWrapper()
batch_size = 1024
samples_per_measurement = 1024

loss_analysis = one_shot_ks_loss_sensitivity(
    model, val_dataset, loss, device, batch_size, samples_per_measurement
)

save_path = clean_path(
    os.path.join(".", notebook_name, model_name, "ks-loss-sensitivity.json")
)
loss_analysis.save_json(save_path)
print("saved analysis to {}".format(save_path))
print("plotting...")
fig, axes = loss_analysis.plot(path=None, normalize=False)

## Step 4 - Hyperparameters

In addition to the sparsity per layer hyperparameter, there are a few more for pruning. The most significant are:
- When to start pruning (stabilization period). Letting the model stabilize a bit before beginning pruning is generally a good idea. Edits to the training setup can make the initial epoch or two unstable. So, before cutting out weights, you want to make sure the model is stable.
- How long to prune (pruning period). Pruning for more epochs is preferred up to a point. The shorter the pruning period, the less likely it is that the model has converged to a stable position before pruning again. A good rule is to prune over roughly 1/6 to 1/3 the number of epochs it took to train.
- How long to train after pruning (fine-tuning period). Generally, the model will not have fully recovered after pruning has stopped. In this case, training should continue a bit longer until the validation loss has stabilized. A good rule is to fine-tune for roughly 1/6 to 1/3 the number of epochs it took to train.
- How often to update pruning steps while in the pruning period. The general convention is to apply pruning steps once per epoch. For different setups, it may be beneficial to prune more often (e.g., once every tenth of an epoch -- 0.1). It depends on how many weight updates have happened since the last pruning step and how stable the loss function is currently.
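The sketch below shows how these hyperparameters fit together in a schedule. It assumes a simple linear ramp of sparsity over the pruning period, which may differ from the interpolation the library actually uses. The function names are hypothetical, and the numbers are the MNIST defaults that appear in the widget cell of this notebook.

```python
def target_sparsity(epoch, start_epoch, end_epoch, final_sparsity):
    """Sparsity to apply at `epoch`, assuming a linear ramp while pruning."""
    if epoch <= start_epoch:
        return 0.0  # stabilization period: nothing is pruned yet
    if epoch >= end_epoch:
        return final_sparsity  # fine-tuning period: sparsity is held fixed
    progress = (epoch - start_epoch) / (end_epoch - start_epoch)
    return final_sparsity * progress


def pruning_step_epochs(start_epoch, end_epoch, update_frequency):
    """Epochs (possibly fractional) at which a pruning step is applied."""
    num_steps = round((end_epoch - start_epoch) / update_frequency)
    return [start_epoch + i * update_frequency for i in range(num_steps + 1)]


# Defaults from the widget cell: prune from epoch 2 to 20, train for 25
start_epoch, end_epoch, total_epochs = 2, 20, 25
final_sparsity = 0.9
update_frequency = 1.0  # one pruning step per epoch

step_epochs = pruning_step_epochs(start_epoch, end_epoch, update_frequency)
fine_tuning_epochs = total_epochs - end_epoch

for epoch in (0, 2, 11, 20, 24):
    value = target_sparsity(epoch, start_epoch, end_epoch, final_sparsity)
    print(epoch, round(value, 3))
print("pruning steps:", len(step_epochs))
print("fine-tuning epochs:", fine_tuning_epochs)
```

Setting `update_frequency` to 0.1 in this sketch would produce ten times as many, smaller pruning steps over the same period.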

In support of all these different hyperparameters, a configuration file is used and then loaded at training time. A simple UI is given in the cell block below to enable easy editing of the configuration. The parameters mentioned above can all be adjusted. Soon, Neural Magic will replace this with a more advanced UI with more features to make this selection even easier! For now, we recommend using this notebook and the UI inside to generate the configuration file. You can look at the output after the next step as it saves the configuration to a file locally.

Defaults are given for the MNIST network and dataset. You may need to change these to better fit your application.


In [None]:
from neuralmagicML.utilsnb import (
    KSWidgetContainer,
    PruningEpochWidget,
    PruningLayersWidget,
)
from neuralmagicML.pytorch.utils import get_prunable_layers
from neuralmagicML.pytorch.models import MnistNet

if "loss_analysis" not in globals():
    loss_analysis = None

prune_layers = get_prunable_layers(model)
not_mnist = not isinstance(model, MnistNet)
widget_container = KSWidgetContainer(
    PruningEpochWidget(start_epoch=2, end_epoch=20, total_epochs=25, max_epochs=100),
    PruningLayersWidget(
        layer_names=[layer[0] for layer in prune_layers],
        layer_descs=[str(layer[1]) for layer in prune_layers],
        layer_enables=None if not_mnist else [False, True, True, True, True],
        layer_sparsities=None if not_mnist else [0.0, 0.8, 0.9, 0.9, 0.9],
        loss_sens_analysis=loss_analysis,
    ),
)
print("creating ui...")
display(widget_container.create())

## Step 5 - Recalibrating Using Pruning

Now that the hyperparameters are chosen, you will use them to recalibrate the given model and dataset. The library is designed to be easily plugged into nearly any training setup for PyTorch. In the cell block below is an example of how an integration looks. Note that only five lines are needed to be able to integrate fully.
- Create a `ScheduledModifierManager()`. This loads the config into PyTorch objects that modify the training process.
- Create a `ScheduledOptimizer()`. This updates the PyTorch objects that modify the training process. It wraps the original optimizer used for training and should be used in its place (`optimizer.step()` must be called on the `ScheduledOptimizer`, not on the original).
- Use `max_epochs` on the `ScheduledModifierManager` to know how many epochs are needed for training.
- Call `epoch_start()` on the `ScheduledOptimizer` before training each epoch and `epoch_end()` after it. These calls mark when an epoch has started and when it has ended, respectively.
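One way to see why `step()` has to go through a wrapper is the toy example below. It is an assumption about the general mechanism, not a description of how `ScheduledOptimizer` is implemented: the class `MaskedSGD` is hypothetical, and it simply re-applies a pruning mask after a plain gradient step so that pruned weights stay at zero.

```python
class MaskedSGD:
    """Toy optimizer wrapper: plain SGD step, then re-apply a pruning mask.

    Hypothetical class for illustration; not part of neuralmagicML.
    """

    def __init__(self, weights, mask, lr=0.1):
        self.weights = weights
        self.mask = mask  # 1 keeps a weight, 0 marks it as pruned
        self.lr = lr

    def step(self, grads):
        for i, grad in enumerate(grads):
            self.weights[i] -= self.lr * grad
            if self.mask[i] == 0:
                # Without this, the gradient update would revive the weight
                self.weights[i] = 0.0


weights = [0.5, 0.0, -0.3]
mask = [1, 0, 1]  # the middle weight has been pruned
optimizer = MaskedSGD(weights, mask, lr=0.1)

optimizer.step(grads=[0.1, 0.2, -0.1])
print([round(w, 4) for w in weights])
```

If the gradient step were applied without the mask, the middle weight would drift away from zero and the layer would lose its sparsity.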
 
Once the training objects are created (optimizer, loss function, etc.), a `ScheduledModifierManager` and `ScheduledOptimizer` are instantiated from the configuration. Almost all logging and updates are done through TensorBoard for this notebook. The use of TensorBoard is entirely optional. Finally, regular training and testing code for PyTorch is used to go through the process.

Note, for convenience a TensorBoard instance is launched in the cell below pointed at `localhost`. If you are running this notebook on a remote server, then you will need to update TensorBoard accordingly.


In [None]:
import math
from tqdm import auto
import torch
from torch.utils.data import DataLoader
from torch.optim import Adam
from neuralmagicML.utils import create_unique_dir, clean_path
from neuralmagicML.pytorch.utils import (
    CrossEntropyLossWrapper,
    TopKAccuracy,
    ModuleTrainer,
    ModuleTester,
    TensorBoardLogger,
)
from neuralmagicML.pytorch.recal import ScheduledModifierManager, ScheduledOptimizer

# save the config locally for use in this flow
config_path = clean_path(os.path.join(".", notebook_name, model_name, "config.yaml"))
print("saving config to {}".format(config_path))
widget_container.get_manager("pytorch").save(config_path)

# setup device, data loaders, loss, optimizer
device = "cuda" if torch.cuda.is_available() else "cpu"
batch_size = 1024
train_data_loader = DataLoader(train_dataset, batch_size, shuffle=True, pin_memory=True)
val_data_loader = DataLoader(val_dataset, batch_size, shuffle=False, pin_memory=True)
loss = CrossEntropyLossWrapper(extras={"top1acc": TopKAccuracy(1)})
optim = Adam(model.parameters())
print("device:{} batch_size:{} loss:{}".format(device, batch_size, loss))

tensorboard_model_path = create_unique_dir(
    os.path.join(".", "tensorboard-logs", notebook_name, model_name)
)
print("logging at {}".format(tensorboard_model_path))

#######################################################
# First lines required for recalibrating a model in PyTorch
#######################################################
loggers = [TensorBoardLogger(tensorboard_model_path)]
steps_per_epoch = math.ceil(len(train_dataset) / batch_size)
manager = ScheduledModifierManager.from_yaml(config_path)
optim = ScheduledOptimizer(optim, model, manager, steps_per_epoch=steps_per_epoch, loggers=loggers)
print("created manager and optimizer from config at {}".format(config_path))

# we use prewritten trainers and testers to make the code more concise
trainer = ModuleTrainer(model, device, loss, optim, loggers=loggers)
tester = ModuleTester(model, device, loss, loggers=loggers)
model = model.to(device)

# startup tensorboard
%load_ext tensorboard
%tensorboard --logdir ./tensorboard-logs

# run initial validation for comparison
tester.run_epoch(val_data_loader, epoch=-1, show_progress=False)

#######################################################
# Final lines required for recalibrating a model in PyTorch
#######################################################
for epoch in auto.tqdm(range(manager.max_epochs), desc="training"):
    optim.epoch_start()

    trainer.run_epoch(train_data_loader, epoch, show_progress=False)
    tester.run_epoch(val_data_loader, epoch, show_progress=False)

    optim.epoch_end()

# delete so all modifiers are cleaned up before exporting
del optim
print("training completed")

## Step 6 - Exporting to ONNX

Now that the model is fully recalibrated, you need to export it to the ONNX format, which is the format used by the Neural Magic Inference Engine. For PyTorch, exporting to ONNX is natively supported. In the cell block below, a convenience class, `ModuleExporter()`, is used to handle exporting.

Once the model is saved as an ONNX file, it is ready to be used for inference with Neural Magic.


In [None]:
from neuralmagicML.utils import clean_path
from neuralmagicML.pytorch.utils import ModuleExporter

print("exporting to onnx...")
export_path = clean_path(os.path.join(".", notebook_name, model_name))
exporter = ModuleExporter(model, export_path)
for batch in val_data_loader:
    sample_input = batch[0]
    break
exporter.export_onnx(sample_input)
print("exported onnx to {}".format(export_path))

## Next Step

Run your model (ONNX file) through the Neural Magic Inference Engine. The following is an example of code that you can run in your Python console. Be sure to enter your ONNX file path and batch size.

```
import numpy
from neuralmagic import create_model

# replace the path, batch size, and input shape with those of your model
model = create_model(onnx_file_path="some/path/to/model.onnx", batch_size=1)
inp = [numpy.random.rand(1, 3, 224, 224).astype(numpy.float32)]
out = model.forward(inp)
print(out)
```