From c67669ea251a79b03a5448e6ae1a55a5df8e958e Mon Sep 17 00:00:00 2001
From: Benjamin
Date: Mon, 22 Feb 2021 21:24:03 -0500
Subject: [PATCH 1/8] pytorch sparse quantized transfer learning notebook

---
 ...h_sparse_quantized_transfer_learning.ipynb | 467 ++++++++++++++++++
 .../pytorch/models/external/torchvision.py    |   5 +-
 src/sparseml/pytorch/models/registry.py       |   5 +-
 .../optim/quantization/quantize_qat_export.py |  12 +-
 src/sparseml/pytorch/utils/model.py           |  15 +-
 5 files changed, 499 insertions(+), 5 deletions(-)
 create mode 100644 notebooks/pytorch_sparse_quantized_transfer_learning.ipynb

diff --git a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
new file mode 100644
index 00000000000..23271c4582a
--- /dev/null
+++ b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
@@ -0,0 +1,467 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "© 2021-present Neuralmagic, Inc. // [Neural Magic Legal](https://neuralmagic.com/legal) \n",
+    "\n",
+    "# Sparse Quantized Transfer Learning in PyTorch using SparseML\n",
+    "\n",
+    "This notebook provides a step-by-step walkthrough for creating a performant sparse quantized model\n",
+    "by transfer learning the pre-optimized structure from an already sparse quantized model. Sparse quantized models take advatage of both block pruning to reduce model parameters and INT8 quantization to reduce computation cost. Using these optimizations, your model will obtain better performance at inference time using the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse).\n",
+    "\n",
+    "Sparse quantized transfer learning takes two steps: First, fine-tune a pre-trained sparse model for the\n",
+    "transfer dataset while maintaining the pre-trianed sparsity structure. Second, perform [quantization aware training](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training) to quantize the now sparse model while still holding the same sparsity structure. [SparseML](https://github.com/neuralmagic/sparseml) recipes make it easy to perform these optimizations.\n",
+    "\n",
+    "In this notebook, you will:\n",
+    "- Set up the model and dataset\n",
+    "- Define a generic PyTorch training flow\n",
+    "- Integrate the PyTorch flow with SparseML for transfer learning\n",
+    "- Perform sparse transfer learning and quantization aware training using the PyTorch+SparseML flow\n",
+    "- Export to [ONNX](https://onnx.ai/) and convert the model from a QAT graph\n",
+    "- Compare DeepSparse engine benchmarks of the final sparse quantized model to an unoptimized model\n",
+    "\n",
+    "Reading through this notebook should be reasonably quick and will give you an intuition for how to plug SparseML into your PyTorch training flow, both for transfer learning and more generally. Rough time estimates for transfer learning the default model are given. Note that training with the PyTorch CPU implementation will be much slower than a GPU:\n",
+    "- 30 minutes on a GPU\n",
+    "- 90 minutes on a laptop CPU"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 1 - Requirements\n",
+    "\n",
+    "To run this notebook, you will need the following packages already installed:\n",
+    "* SparseML, SparseZoo\n",
+    "* PyTorch (>= 1.7.0) and torchvision\n",
+    "* DeepSparse (can be installed with `pip install deepsparse` if not already)\n",
+    "\n",
+    "You can install any package that is not already present via `pip`."
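
(Editor's note: as a minimal sketch of that setup step, a cell like the following could install anything missing from inside the notebook. The package names are the standard pip distribution names; pin versions as your environment requires.)

```python
# Sketch only: install any missing requirements from inside the notebook.
# Package names are the standard pip distributions; adjust versions as needed.
import importlib
import subprocess
import sys

for package in ("sparseml", "sparsezoo", "deepsparse", "torch", "torchvision"):
    try:
        importlib.import_module(package)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
```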
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import deepsparse\n", + "import sparseml\n", + "import sparsezoo\n", + "import torch\n", + "import torchvision\n", + "\n", + "assert torch.__version__ >= \"1.7\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 2 - Setting Up the Model and Dataset\n", + "\n", + "By default, you will transfer learn from a sparse quantized [ResNet50](https://arxiv.org/abs/1512.03385) model trained on the [ImageNet dataset](http://www.image-net.org/) to the much smaller [Imagenette dataset](https://github.com/fastai/imagenette). The transfer learning weights are downloaded from the [SparseZoo](https://github.com/neuralmagic/sparsezoo) model repo. The Imagenette dataset is downloaded from its repository via a helper class from SparseML.\n", + "\n", + "When loading weights for transfer learning classification models, it is standard to override the final classifier layer to fit the output shape for the new dataset. In the example below, this is done by specifying `ignore_error_tensors` as the weights that will be initialzed for the new model. In other flows this could be accomplished by setting `model.classifier.fc = torch.nn.Linear(...)`.\n", + "\n", + "If you would like to try out your model for pruning, modify the appropriate lines for your model and dataset, specifically:\n", + "- checkpoint_path = ...\n", + "- model = ModelRegistry.create(...)\n", + "- train_dataset = ImagenetteDataset(...)\n", + "- val_dataset = ImagenetteDataset(...)\n", + "\n", + "Take care to keep the variable names the same, as the rest of the notebook is set up according to those and update any parts of the training flow as needed." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sparseml.pytorch.models import ModelRegistry\n", + "from sparseml.pytorch.datasets import ImagenetteDataset, ImagenetteSize\n", + "from sparsezoo import Zoo\n", + "\n", + "#######################################################\n", + "# Define your model below\n", + "#######################################################\n", + "print(\"loading model...\")\n", + "# SparseZoo stub to pretrained sparse quantized ResNet50 for imagenet dataset\n", + "zoo_checkpoint_path = (\n", + " \"zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate\"\n", + ")\n", + "model = ModelRegistry.create(\n", + " key=\"resnet50\",\n", + " pretrained_path=zoo_checkpoint_path,\n", + " pretrained_dataset=\"imagenette\",\n", + " num_classes=10,\n", + " ignore_error_tensors=[\"classifier.fc.weight\", \"classifier.fc.bias\"],\n", + ")\n", + "input_shape = ModelRegistry.input_shape(\"resnet50\")\n", + "input_size = input_shape[-1]\n", + "print(model)\n", + "#######################################################\n", + "# Define your train and validation datasets below\n", + "#######################################################\n", + "\n", + "print(\"\\nloading train dataset...\")\n", + "train_dataset = ImagenetteDataset(\n", + " train=True, dataset_size=ImagenetteSize.s320, image_size=input_size\n", + ")\n", + "print(train_dataset)\n", + "\n", + "print(\"\\nloading val dataset...\")\n", + "val_dataset = ImagenetteDataset(\n", + " train=False, dataset_size=ImagenetteSize.s320, image_size=input_size\n", + ")\n", + "print(val_dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 3 - Set Up a PyTorch 
+    "## Step 3 - Set Up a PyTorch Training Loop\n",
+    "SparseML can plug directly into your existing PyTorch training flow by overriding the Optimizer object. To demonstrate this, in the cell below, we define a simple PyTorch training loop adapted from [here](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html). To prune and quantize your existing models using SparseML, you can use your own training flow."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from tqdm.auto import tqdm\n",
+    "import math\n",
+    "import torch\n",
+    "\n",
+    "\n",
+    "def run_model_one_epoch(model, data_loader, criterion, device, train=False, optimizer=None):\n",
+    "    if train:\n",
+    "        model.train()\n",
+    "    else:\n",
+    "        model.eval()\n",
+    "\n",
+    "    running_loss = 0.0\n",
+    "    total_correct = 0\n",
+    "    total_predictions = 0\n",
+    "\n",
+    "    for step, (inputs, labels) in tqdm(enumerate(data_loader), total=len(data_loader)):\n",
+    "        inputs = inputs.to(device)\n",
+    "        labels = labels.to(device)\n",
+    "\n",
+    "        if train:\n",
+    "            optimizer.zero_grad()\n",
+    "\n",
+    "        outputs, _ = model(inputs)  # model returns logits and softmax as a tuple\n",
+    "        loss = criterion(outputs, labels)\n",
+    "\n",
+    "        if train:\n",
+    "            loss.backward()\n",
+    "            optimizer.step()\n",
+    "\n",
+    "        running_loss += loss.item()\n",
+    "\n",
+    "        predictions = outputs.argmax(dim=1)\n",
+    "        total_correct += torch.sum(predictions == labels).item()\n",
+    "        total_predictions += inputs.size(0)\n",
+    "\n",
+    "    loss = running_loss / (step + 1.0)\n",
+    "    accuracy = total_correct / total_predictions\n",
+    "    return loss, accuracy"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 4 - Set Up PyTorch Training Objects\n",
+    "In this step, you will select hyperparameters and a device to train your model with, then set up DataLoader objects, a loss function, and an optimizer. All of these variables and objects can be replaced to fit your training flow."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from torch.utils.data import DataLoader\n",
+    "from torch.nn import CrossEntropyLoss\n",
+    "from torch.optim import Adam\n",
+    "\n",
+    "# hyperparameters\n",
+    "BATCH_SIZE = 32\n",
+    "\n",
+    "# setup device\n",
+    "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+    "model.to(device)\n",
+    "print(f\"Using device: {device}\")\n",
+    "\n",
+    "# setup data loaders\n",
+    "train_loader = DataLoader(\n",
+    "    train_dataset, BATCH_SIZE, shuffle=True, pin_memory=True, num_workers=8\n",
+    ")\n",
+    "val_loader = DataLoader(\n",
+    "    val_dataset, BATCH_SIZE, shuffle=False, pin_memory=True, num_workers=8\n",
+    ")\n",
+    "\n",
+    "# setup loss function and optimizer, LR will be overridden by sparseml\n",
+    "criterion = CrossEntropyLoss()\n",
+    "optimizer = Adam(model.parameters(), lr=8e-3)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 5 - Apply a SparseML Recipe and Prune Model\n",
+    "\n",
+    "To run sparse quantized transfer learning with SparseML, you will download a transfer learning recipe from SparseZoo and use it to create a `ScheduledModifierManager` object. This manager will be used to wrap the optimizer object to maintain the pre-optimized model's sparsity structure while learning weights for the new dataset as well as performing quantization aware training.\n",
+    "\n",
+    "You can create SparseML recipes to perform various model pruning schedules, quantization aware training, sparse transfer learning, and more. If you are using a different model than the default, you will have to modify the recipe file to match the new target's parameters.\n",
+    "\n",
+    "Finally, using the wrapped optimizer object, you will call the training function to prune your model.\n",
+    "\n",
+    "If the kernel shuts down during training, this may be an out of memory error, to resolve this, try lowering the `batch_size` in the cell above."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Downloading a Recipe from SparseZoo\n",
+    "The [SparseZoo](https://github.com/neuralmagic/sparsezoo) API provides precofigured recipes for its optimized model. In the cell below, you will download a recipe for pruning ResNet50 on the Imagenette dataset and record it's saved path."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sparsezoo import Zoo\n",
+    "\n",
+    "recipe = Zoo.download_recipe_from_stub(f\"{zoo_checkpoint_path}?recipe_type=transfer\")\n",
+    "print(f\"Recipe downloaded to: {recipe_path}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sparseml.pytorch.optim import ScheduledModifierManager, ScheduledOptimizer\n",
+    "\n",
+    "# create ScheduledModifierManager and Optimizer wrapper\n",
+    "manager = ScheduledModifierManager.from_yaml(recipe_path)\n",
+    "optimizer = ScheduledOptimizer(\n",
+    "    optimizer,\n",
+    "    model,\n",
+    "    manager,\n",
+    "    steps_per_epoch=len(train_loader),\n",
+    "    loggers=[],\n",
+    ")\n",
+    "\n",
+    "\n",
+    "# Run model pruning\n",
+    "epoch = manager.min_epochs\n",
+    "for epoch in range(manager.max_epochs):\n",
+    "    # run training loop\n",
+    "    epoch_name = f\"{epoch + 1}/{manager.max_epochs}\"\n",
+    "    print(f\"Running Training Epoch {epoch_name}\")\n",
+    "    train_loss, train_acc = run_model_one_epoch(\n",
+    "        model, train_loader, criterion, device, train=True, optimizer=optimizer\n",
+    "    )\n",
+    "    print(\n",
+    "        f\"Training Epoch: {epoch_name}\\nTraining Loss: {train_loss}\\nTop 1 Acc: {train_acc}\\n\"\n",
+    "    )\n",
+    "\n",
+    "    # run validation loop\n",
+    "    print(f\"Running Validation Epoch {epoch_name}\")\n",
+    "    val_loss, val_acc = run_model_one_epoch(model, val_loader, criterion, device)\n",
+    "    print(\n",
+    "        f\"Validation Epoch: {epoch_name}\\nVal Loss: {val_loss}\\nTop 1 Acc: {val_acc}\\n\"\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Step 6 - View Model Sparsity\n",
+    "To see the effects of sparse quantized transfer learning, in this step, you will print out the sparsities of each Conv and FC layer in your model."
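
(Editor's note: beyond the per-layer printout in the next cell, an aggregate view can be handy. This sketch only reuses the `model` and the SparseML helpers already used in this notebook.)

```python
# Sketch: overall sparsity across all prunable (Conv/FC) layers.
from sparseml.pytorch.utils import get_prunable_layers, tensor_sparsity

total_params = 0
total_zeros = 0
for name, layer in get_prunable_layers(model):
    numel = layer.weight.numel()
    total_params += numel
    total_zeros += int(tensor_sparsity(layer.weight).item() * numel)

print(f"overall prunable-layer sparsity: {total_zeros / total_params:.4f}")
```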
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sparseml.pytorch.utils import get_prunable_layers, tensor_sparsity\n", + "\n", + "# print sparsities of each layer\n", + "for (name, layer) in get_prunable_layers(model):\n", + " print(f\"{name}.weight: {tensor_sparsity(layer.weight).item():.4f}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 7 - Exporting to ONNX\n", + "\n", + "Now that the model is fully recalibrated, you need to export it to an ONNX format, which is the format used by the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse). For PyTorch, exporting to ONNX is natively supported. In the cell block below, a convenience class, ModuleExporter(), is used to handle exporting.\n", + "\n", + "Additionally, PyTorch, exports a graph setup for quantization aware training (QAT) to ONNX. To run a fully quantized graph, you will need to convert these QAT operations to fully quantized INT8 operations. SparseML provides the `quantize_torch_qat_export` helper function to perform this conversion.\n", + "\n", + "Once the model is saved as an ONNX file, it is ready to be used for inference with the DeepSparse Engine. For saving a custom model, you can override the sample batch for ONNX graph freezing and locations to save to." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from sparseml.pytorch.utils import ModuleExporter\n", + "from sparseml.pytorch.optim.quantization import quantize_torch_qat_export\n", + "\n", + "save_dir = \"pytorch_sparse_quantized_transfer_learning\"\n", + "qat_onnx_graph_name = \"resnet50_imagenette_pruned_qat.onnx\"\n", + "quantized_onnx_path = os.path.join(save_dir, \"resnet50_imagenette_pruned_quant.onnx\")\n", + "\n", + "exporter = ModuleExporter(model, output_dir=save_dir)\n", + "exporter.export_pytorch(name=\"resnet50_imagenette_pruned_qat.pth\")\n", + "exporter.export_onnx(\n", + " torch.randn(1, 3, 224, 224), name=qat_onnx_graph_name\n", + ")\n", + "\n", + "\n", + "# convert QAT graph to fully quantized operators\n", + "quantize_torch_qat_export(os.path.join(save_dir, qat_onnx_graph_name), output_file_path=quantized_onnx_path)\n", + "\n", + "print(f\"Sparse-Quantized ONNX model saved to {quantized_onnx_path}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Step 8 - Benchmarking\n", + "\n", + "Finally, to see the total effect of these optimizations, you will benchmark an unoptimized, dense ResNet50 model from SparseZoo against your sparse quantized model using the `deepsparse` API.\n", + "\n", + "Note, in order to view speedup from quantization, your CPU must run VNNI instructions. In the cell before benchmarking is run, you will detect if these instructions are available on your CPU." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from deepsparse.cpu import cpu_architecture\n", + "\n", + "if cpu_architecture()[\"vnni\"]:\n", + " print(\"VNNI extensions detected, model will run with quantized speedups\")\n", + "else:\n", + " print(\n", + " \"WARNING: No VNNI extensions detected. 
+    "        \"WARNING: No VNNI extensions detected. Your model will not run with \"\n",
+    "        \"quantized speedups which will affect benchmarking\"\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy\n",
+    "from deepsparse import benchmark_model\n",
+    "\n",
+    "BATCH_SIZE = 64\n",
+    "NUM_CORES = None  # maximum number of cores available\n",
+    "NUM_ITERATIONS = 100\n",
+    "NUM_WARMUP_ITERATIONS = 20\n",
+    "\n",
+    "\n",
+    "def benchmark_imagenette_model(model_name, model_path):\n",
+    "    print(\n",
+    "        f\"Benchmarking {model_name} for {NUM_ITERATIONS} iterations at batch \"\n",
+    "        f\"size {BATCH_SIZE} with {NUM_CORES} CPU cores\"\n",
+    "    )\n",
+    "    sample_input = [\n",
+    "        numpy.ascontiguousarray(\n",
+    "            numpy.random.randn(BATCH_SIZE, 3, 224, 224).astype(numpy.float32)\n",
+    "        )\n",
+    "    ]\n",
+    "\n",
+    "    results = benchmark_model(\n",
+    "        model=model_path,\n",
+    "        inp=sample_input,\n",
+    "        batch_size=BATCH_SIZE,\n",
+    "        num_cores=NUM_CORES,\n",
+    "        num_iterations=NUM_ITERATIONS,\n",
+    "        num_warmup_iterations=NUM_WARMUP_ITERATIONS,\n",
+    "        show_progress=True,\n",
+    "    )\n",
+    "    print(f\"results:\\n{results}\")\n",
+    "    return results\n",
+    "\n",
+    "\n",
+    "# base ResNet50 Imagenette model downloaded from SparseZoo\n",
+    "base_results = benchmark_imagenette_model(\n",
+    "    \"ResNet50 Imagenette Base\",\n",
+    "    \"zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenette/base-none\"\n",
+    ")\n",
+    "\n",
+    "optimized_results = benchmark_imagenette_model(\n",
+    "    \"ResNet50 Imagenette pruned-quantized\", quantized_onnx_path\n",
+    ")\n",
+    "\n",
+    "speed_up = base_results.ms_per_batch / optimized_results.ms_per_batch\n",
+    "print(f\"Speed-up from sparse quantized transfer learning: {speed_up}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Next Steps\n",
+    "\n",
+    "Congratulations, you have created a sparse quantized model and exported it to ONNX for inference! Next steps you can pursue include:\n",
+    "* Transfer learning, pruning, or quantizing different models using SparseML\n",
+    "* Trying different pruning and optimization recipes\n",
+    "* Benchmarking other models on the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.9"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/src/sparseml/pytorch/models/external/torchvision.py b/src/sparseml/pytorch/models/external/torchvision.py
index 09e82748656..720430fa319 100644
--- a/src/sparseml/pytorch/models/external/torchvision.py
+++ b/src/sparseml/pytorch/models/external/torchvision.py
@@ -80,7 +80,10 @@ def wrapper(
     ):
         """
         :param pretrained_path: A path to the pretrained weights to load,
-            if provided will override the pretrained param
+            if provided will override the pretrained param. May also be
+            a SparseZoo stub path preceded by 'zoo:' with the optional
+            `?recipe_type=` argument. If given a recipe type, the base
+            model weights for that recipe will be loaded
         :param pretrained: True to load the default pretrained weights,
             a string to load a specific pretrained weight
             (ex: base, pruned-moderate),
diff --git a/src/sparseml/pytorch/models/registry.py b/src/sparseml/pytorch/models/registry.py
index c3a0d34f255..f60d86af647 100644
--- a/src/sparseml/pytorch/models/registry.py
+++ b/src/sparseml/pytorch/models/registry.py
@@ -308,7 +308,10 @@ def wrapper(
     ):
         """
         :param pretrained_path: A path to the pretrained weights to load,
-            if provided will override the pretrained param
+            if provided will override the pretrained param. May also be
+            a SparseZoo stub path preceded by 'zoo:' with the optional
+            `?recipe_type=` argument. If given a recipe type, the base
+            model weights for that recipe will be loaded
         :param pretrained: True to load the default pretrained weights,
             a string to load a specific pretrained weight
             (ex: base, optim, optim-perf),
diff --git a/src/sparseml/pytorch/optim/quantization/quantize_qat_export.py b/src/sparseml/pytorch/optim/quantization/quantize_qat_export.py
index 44e172385c3..e4b87f46dd7 100644
--- a/src/sparseml/pytorch/optim/quantization/quantize_qat_export.py
+++ b/src/sparseml/pytorch/optim/quantization/quantize_qat_export.py
@@ -569,9 +569,14 @@ def _remove_duplicate_quantize__ops(model: ModelProto):
         remove_node_and_params_from_graph(model, remove_node)
 
 
-def quantize_torch_qat_export(model: ModelProto, inplace: bool = True) -> ModelProto:
+def quantize_torch_qat_export(
+    model: Union[ModelProto, str],
+    output_file_path: Union[str, None] = None,
+    inplace: bool = True,
+) -> ModelProto:
     """
-    :param model: The model to convert
+    :param model: The model to convert, or a file path to it
+    :param output_file_path: File path to save the converted model to
     :param inplace: If true, does conversion of model in place. Default is true
     :return: Converts a model exported from a torch QAT session from a QAT graph
         with fake quantize ops surrounding operations to a quantized graph with quantized
@@ -589,4 +594,7 @@ def quantize_torch_qat_export(model: ModelProto, inplace: bool = True) -> ModelP
     quantize_resnet_identity_add_inputs(model)
     _remove_duplicate_quantize__ops(model)
 
+    if output_file_path:
+        onnx.save(model, output_file_path)
+
     return model
diff --git a/src/sparseml/pytorch/utils/model.py b/src/sparseml/pytorch/utils/model.py
index d3b87bcf396..09e654f710f 100644
--- a/src/sparseml/pytorch/utils/model.py
+++ b/src/sparseml/pytorch/utils/model.py
@@ -24,6 +24,7 @@
 from torch.optim.optimizer import Optimizer
 
 from sparseml.utils.helpers import create_parent_dirs
+from sparsezoo import Zoo
 
 
 try:
@@ -57,7 +58,10 @@ def load_model(
     """
     Load the state dict into a model from a given file.
 
-    :param path: the path to the pth file to load the state dict from
+    :param path: the path to the pth file to load the state dict from.
+        May also be a SparseZoo stub path preceded by 'zoo:' with the optional
+        `?recipe_type=` argument. If given a recipe type, the base model weights
+        for that recipe will be loaded.
     :param model: the model to load the state dict into
     :param strict: True to enforce that all tensors match between the model
         and the file; False otherwise
@@ -67,6 +71,15 @@ def load_model(
         look like they came from DataParallel type setup (start with module.).
        This removes "module." from all keys
     """
+    if path.startswith("zoo:"):
+        if "recipe_type=" in path:
+            path = Zoo.download_recipe_base_framework_files(path, extensions=[".pth"])[
+                0
+            ]
+        else:
+            path = Zoo.load_model_from_stub(path).download_framework_files(
+                extensions=[".pth"]
+            )[0]
     model_dict = torch.load(path, map_location="cpu")
     current_dict = model.state_dict()

From 7f582fbbda84282523df69260a5ec8898d1f3920 Mon Sep 17 00:00:00 2001
From: Benjamin
Date: Tue, 23 Feb 2021 11:24:03 -0500
Subject: [PATCH 2/8] responding to eng review

---
 ...h_sparse_quantized_transfer_learning.ipynb | 69 +++++++++----------
 1 file changed, 31 insertions(+), 38 deletions(-)

diff --git a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
index 23271c4582a..f5584f1f071 100644
--- a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
+++ b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
@@ -6,13 +6,15 @@
    "source": [
     "© 2021-present Neuralmagic, Inc. // [Neural Magic Legal](https://neuralmagic.com/legal) \n",
     "\n",
-    "# Sparse Quantized Transfer Learning in PyTorch using SparseML\n",
+    "# Sparse-Quantized Transfer Learning in PyTorch using SparseML\n",
     "\n",
-    "This notebook provides a step-by-step walkthrough for creating a performant sparse quantized model\n",
-    "by transfer learning the pre-optimized structure from an already sparse quantized model. Sparse quantized models take advatage of both block pruning to reduce model parameters and INT8 quantization to reduce computation cost. Using these optimizations, your model will obtain better performance at inference time using the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse).\n",
+    "This notebook provides a step-by-step walkthrough for creating a performant sparse-quantized model\n",
+    "by transfer learning the pruned structure from an already sparse-quantized model.\n",
     "\n",
-    "Sparse quantized transfer learning takes two steps: First, fine-tune a pre-trained sparse model for the\n",
-    "transfer dataset while maintaining the pre-trianed sparsity structure. Second, perform [quantization aware training](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training) to quantize the now sparse model while still holding the same sparsity structure. [SparseML](https://github.com/neuralmagic/sparseml) recipes make it easy to perform these optimizations.\n",
+    "Sparse-quantized models combine [pruning](https://neuralmagic.com/blog/pruning-overview/) and [quantization](https://arxiv.org/abs/1609.07061) to reduce both the number of parameters and the precision of the remaining parameters to significantly increase the performance of neural networks. Using these optimizations, your model will obtain significantly better (around 7x vs unoptimized) performance at inference time using the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse).\n",
+    "\n",
+    "Sparse-quantized transfer learning takes two steps: First, fine-tune a pre-trained sparse model for the\n",
+    "transfer dataset while maintaining the pre-trained sparsity structure. Second, perform [quantization aware training](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training) to quantize the now sparse model while still holding the same sparsity structure. [SparseML](https://github.com/neuralmagic/sparseml) recipes make it easy to perform these optimizations.\n",
     "\n",
     "In this notebook, you will:\n",
     "- Set up the model and dataset\n",
@@ -20,7 +22,7 @@
     "- Integrate the PyTorch flow with SparseML for transfer learning\n",
     "- Perform sparse transfer learning and quantization aware training using the PyTorch+SparseML flow\n",
     "- Export to [ONNX](https://onnx.ai/) and convert the model from a QAT graph\n",
-    "- Compare DeepSparse engine benchmarks of the final sparse quantized model to an unoptimized model\n",
+    "- Compare DeepSparse engine benchmarks of the final sparse-quantized model to an unoptimized model\n",
     "\n",
     "Reading through this notebook should be reasonably quick and will give you an intuition for how to plug SparseML into your PyTorch training flow, both for transfer learning and more generally. Rough time estimates for transfer learning the default model are given. Note that training with the PyTorch CPU implementation will be much slower than a GPU:\n",
     "- 30 minutes on a GPU\n",
@@ -62,17 +64,9 @@
    "source": [
     "## Step 2 - Setting Up the Model and Dataset\n",
     "\n",
-    "By default, you will transfer learn from a sparse quantized [ResNet50](https://arxiv.org/abs/1512.03385) model trained on the [ImageNet dataset](http://www.image-net.org/) to the much smaller [Imagenette dataset](https://github.com/fastai/imagenette). The transfer learning weights are downloaded from the [SparseZoo](https://github.com/neuralmagic/sparsezoo) model repo. The Imagenette dataset is downloaded from its repository via a helper class from SparseML.\n",
-    "\n",
-    "When loading weights for transfer learning classification models, it is standard to override the final classifier layer to fit the output shape for the new dataset. In the example below, this is done by specifying `ignore_error_tensors` as the weights that will be initialized for the new model. In other flows this could be accomplished by setting `model.classifier.fc = torch.nn.Linear(...)`.\n",
-    "\n",
-    "If you would like to try out your model for pruning, modify the appropriate lines for your model and dataset, specifically:\n",
-    "- checkpoint_path = ...\n",
-    "- model = ModelRegistry.create(...)\n",
-    "- train_dataset = ImagenetteDataset(...)\n",
-    "- val_dataset = ImagenetteDataset(...)\n",
+    "By default, you will transfer learn from a sparse-quantized [ResNet50](https://arxiv.org/abs/1512.03385) model trained on the [ImageNet dataset](http://www.image-net.org/) to the much smaller [Imagenette dataset](https://github.com/fastai/imagenette). The transfer learning weights are downloaded from the [SparseZoo](https://github.com/neuralmagic/sparsezoo) model repo. The Imagenette dataset is downloaded from its repository via a helper class from SparseML.\n",
     "\n",
-    "Take care to keep the variable names the same, as the rest of the notebook is set up according to those and update any parts of the training flow as needed."
+    "When loading weights for transfer learning classification models, it is standard to override the final classifier layer to fit the output shape for the new dataset. In the example below, this is done by specifying `ignore_error_tensors` as the weights that will be initialized for the new model. In other flows this could be accomplished by setting `model.classifier.fc = torch.nn.Linear(...)`."
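
(Editor's note: for reference, the manual override mentioned above might look like the sketch below. It assumes SparseML's ResNet-50 module layout, where the head lives at `model.classifier.fc`, and the 10 Imagenette classes.)

```python
# Sketch of the manual head replacement; layer names assume SparseML's
# ResNet-50 layout and num_classes assumes the Imagenette dataset.
import torch

num_classes = 10
in_features = model.classifier.fc.in_features
model.classifier.fc = torch.nn.Linear(in_features, num_classes)
```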
    ]
   },
   {
@@ -89,7 +83,7 @@
     "# Define your model below\n",
     "#######################################################\n",
     "print(\"loading model...\")\n",
-    "# SparseZoo stub to pretrained sparse quantized ResNet50 for imagenet dataset\n",
+    "# SparseZoo stub to pretrained sparse-quantized ResNet50 for imagenet dataset\n",
     "zoo_checkpoint_path = (\n",
     "    \"zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate\"\n",
     ")\n",
@@ -219,7 +213,7 @@
    "source": [
     "## Step 5 - Apply a SparseML Recipe and Prune Model\n",
     "\n",
-    "To run sparse quantized transfer learning with SparseML, you will download a transfer learning recipe from SparseZoo and use it to create a `ScheduledModifierManager` object. This manager will be used to wrap the optimizer object to maintain the pre-optimized model's sparsity structure while learning weights for the new dataset as well as performing quantization aware training.\n",
+    "To run sparse-quantized transfer learning with SparseML, you will download a transfer learning recipe from SparseZoo and use it to create a `ScheduledModifierManager` object. This manager will be used to wrap the optimizer object to maintain the pre-optimized model's sparsity structure while learning weights for the new dataset as well as performing quantization aware training.\n",
     "\n",
     "You can create SparseML recipes to perform various model pruning schedules, quantization aware training, sparse transfer learning, and more. If you are using a different model than the default, you will have to modify the recipe file to match the new target's parameters.\n",
     "\n",
@@ -293,7 +287,7 @@
    "metadata": {},
    "source": [
     "## Step 6 - View Model Sparsity\n",
-    "To see the effects of sparse quantized transfer learning, in this step, you will print out the sparsities of each Conv and FC layer in your model."
+    "To see the effects of sparse-quantized transfer learning, in this step, you will print out the sparsities of each Conv and FC layer in your model."
    ]
   },
@@ -315,11 +309,15 @@
    "source": [
     "## Step 7 - Exporting to ONNX\n",
     "\n",
-    "Now that the model is fully recalibrated, you need to export it to an ONNX format, which is the format used by the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse). For PyTorch, exporting to ONNX is natively supported. In the cell block below, a convenience class, ModuleExporter(), is used to handle exporting.\n",
+    "Now that the sparse-quantized transfer learning is complete, the model should be prepped for inference. A common next step for inference is exporting the model to ONNX. This is also the format used by the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse) to achieve the sparse-quantized speedups.\n",
+    "\n",
+    "For PyTorch, exporting to ONNX is natively supported. In the cell block below, a convenience class, ModuleExporter(), is used to handle exporting.\n",
     "\n",
     "Additionally, PyTorch, exports a graph setup for quantization aware training (QAT) to ONNX. To run a fully quantized graph, you will need to convert these QAT operations to fully quantized INT8 operations. SparseML provides the `quantize_torch_qat_export` helper function to perform this conversion.\n",
     "\n",
-    "Once the model is saved as an ONNX file, it is ready to be used for inference with the DeepSparse Engine. For saving a custom model, you can override the sample batch for ONNX graph freezing and locations to save to."
+    "Once the model is saved as an ONNX file, it is ready to be used for inference with the DeepSparse Engine. For saving a custom model, you can override the sample batch for ONNX graph freezing and locations to save to.\n",
+    "\n",
+    "If exporting the model only to PyTorch for inference, the graph can be converted to fully quantized in PyTorch only using `torch.quantization.convert`; however, the resulting model will not be compatible with ONNX conversion."
    ]
   },
@@ -355,9 +353,9 @@
    "source": [
     "## Step 8 - Benchmarking\n",
     "\n",
-    "Finally, to see the total effect of these optimizations, you will benchmark an unoptimized, dense ResNet50 model from SparseZoo against your sparse quantized model using the `deepsparse` API.\n",
+    "Finally, to see the total effect of these optimizations, you will benchmark an unoptimized, dense ResNet50 model from SparseZoo against your sparse-quantized model using the `deepsparse` API.\n",
     "\n",
-    "Note, in order to view speedup from quantization, your CPU must run VNNI instructions. In the cell before benchmarking is run, you will detect if these instructions are available on your CPU."
+    "Note: in order to see a speedup from quantization, your CPU must support VNNI instructions. The benchmarking cell below contains a check for VNNI support and will log a warning if it is not detected. You can learn more about DeepSparse hardware compatibility [here](https://docs.neuralmagic.com/deepsparse/hardware.html)."
    ]
   },
@@ -366,25 +364,20 @@
    "source": [
+    "import numpy\n",
+    "from deepsparse import benchmark_model\n",
     "from deepsparse.cpu import cpu_architecture\n",
     "\n",
+    "\n",
+    "# check VNNI\n",
     "if cpu_architecture()[\"vnni\"]:\n",
-    "    print(\"VNNI extensions detected, model will run with quantized speedups\")\n",
+    "    print(\"VNNI extensions detected, model will run with quantized speedups\\n\")\n",
     "else:\n",
     "    print(\n",
     "        \"WARNING: No VNNI extensions detected. Your model will not run with \"\n",
-    "        \"quantized speedups which will affect benchmarking\"\n",
-    "    )"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import numpy\n",
-    "from deepsparse import benchmark_model\n",
+    "        \"quantized speedups which will affect benchmarking\\n\"\n",
+    "    )\n",
+    "\n",
+    "\n",
     "BATCH_SIZE = 64\n",
     "NUM_CORES = None  # maximum number of cores available\n",
     "NUM_ITERATIONS = 100\n",
     "NUM_WARMUP_ITERATIONS = 20\n",
@@ -427,7 +420,7 @@
     ")\n",
     "\n",
     "speed_up = base_results.ms_per_batch / optimized_results.ms_per_batch\n",
-    "print(f\"Speed-up from sparse quantized transfer learning: {speed_up}\")"
+    "print(f\"Speed-up from sparse-quantized transfer learning: {speed_up}\")"
    ]
   },
@@ -436,7 +429,7 @@
    "source": [
     "## Next Steps\n",
     "\n",
-    "Congratulations, you have created a sparse quantized model and exported it to ONNX for inference! Next steps you can pursue include:\n",
+    "Congratulations, you have created a sparse-quantized model and exported it to ONNX for inference! Next steps you can pursue include:\n",
     "* Transfer learning, pruning, or quantizing different models using SparseML\n",
     "* Trying different pruning and optimization recipes\n",
     "* Benchmarking other models on the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse)"

From a33d3ca0ef0a55696051179475cccf62e77b87c0 Mon Sep 17 00:00:00 2001
From: Benjamin
Date: Tue, 23 Feb 2021 12:49:43 -0500
Subject: [PATCH 3/8] update recipe type

---
 notebooks/pytorch_sparse_quantized_transfer_learning.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
index f5584f1f071..bc8e2fa1ba5 100644
--- a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
+++ b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
@@ -238,7 +238,7 @@
    "source": [
     "from sparsezoo import Zoo\n",
     "\n",
-    "recipe = Zoo.download_recipe_from_stub(f\"{zoo_checkpoint_path}?recipe_type=transfer\")\n",
+    "recipe = Zoo.download_recipe_from_stub(f\"{zoo_checkpoint_path}?recipe_type=transfer_learn\")\n",
     "print(f\"Recipe downloaded to: {recipe_path}\")"
    ]
   },

From d30834b11ef79db960f8a4e735b8d6d46fce00ff Mon Sep 17 00:00:00 2001
From: Benjamin
Date: Tue, 23 Feb 2021 13:21:44 -0500
Subject: [PATCH 4/8] updating zoo stub path

---
 .../pytorch_sparse_quantized_transfer_learning.ipynb | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
index bc8e2fa1ba5..6c8c79f125e 100644
--- a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
+++ b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
@@ -14,7 +14,7 @@
     "Sparse-quantized models combine [pruning](https://neuralmagic.com/blog/pruning-overview/) and [quantization](https://arxiv.org/abs/1609.07061) to reduce both the number of parameters and the precision of the remaining parameters to significantly increase the performance of neural networks. Using these optimizations, your model will obtain significantly better (around 7x vs unoptimized) performance at inference time using the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse).\n",
     "\n",
     "Sparse-quantized transfer learning takes two steps: First, fine-tune a pre-trained sparse model for the\n",
-    "transfer dataset while maintaining the pre-trained sparsity structure. Second, perform [quantization aware training](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training) to quantize the now sparse model while still holding the same sparsity structure. [SparseML](https://github.com/neuralmagic/sparseml) recipes make it easy to perform these optimizations.\n",
+    "transfer dataset while maintaining the pre-trianed sparsity structure. Second, perform [quantization aware training](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training) to quantize the now sparse model while still holding the same sparsity structure. [SparseML](https://github.com/neuralmagic/sparseml) recipes make it easy to perform these optimizations.\n",
     "\n",
     "In this notebook, you will:\n",
     "- Set up the model and dataset\n",
@@ -84,12 +84,13 @@
     "#######################################################\n",
     "print(\"loading model...\")\n",
     "# SparseZoo stub to pretrained sparse-quantized ResNet50 for imagenet dataset\n",
-    "zoo_checkpoint_path = (\n",
+    "zoo_stub_path = (\n",
     "    \"zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate\"\n",
+    "    \"?recipe_type=transfer_learn\"\n",
     ")\n",
     "model = ModelRegistry.create(\n",
     "    key=\"resnet50\",\n",
-    "    pretrained_path=zoo_checkpoint_path,\n",
+    "    pretrained_path=zoo_stub_path,\n",
     "    pretrained_dataset=\"imagenette\",\n",
     "    num_classes=10,\n",
     "    ignore_error_tensors=[\"classifier.fc.weight\", \"classifier.fc.bias\"],\n",
@@ -238,7 +239,7 @@
    "source": [
     "from sparsezoo import Zoo\n",
     "\n",
-    "recipe = Zoo.download_recipe_from_stub(f\"{zoo_checkpoint_path}?recipe_type=transfer_learn\")\n",
+    "recipe = Zoo.download_recipe_from_stub(zoo_stub_path)\n",
     "print(f\"Recipe downloaded to: {recipe_path}\")"
    ]
   },

From 2656d52dd16820d27b48a68d3668391deab1bac0 Mon Sep 17 00:00:00 2001
From: Benjamin
Date: Tue, 23 Feb 2021 18:37:23 -0500
Subject: [PATCH 5/8] doc review updates

---
 ...h_sparse_quantized_transfer_learning.ipynb | 43 ++++++++++---------
 1 file changed, 22 insertions(+), 21 deletions(-)

diff --git a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
index 6c8c79f125e..38e99882a1c 100644
--- a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
+++ b/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
@@ -11,18 +11,19 @@
     "This notebook provides a step-by-step walkthrough for creating a performant sparse-quantized model\n",
     "by transfer learning the pruned structure from an already sparse-quantized model.\n",
     "\n",
-    "Sparse-quantized models combine [pruning](https://neuralmagic.com/blog/pruning-overview/) and [quantization](https://arxiv.org/abs/1609.07061) to reduce both the number of parameters and the precision of the remaining parameters to significantly increase the performance of neural networks. Using these optimizations, your model will obtain significantly better (around 7x vs unoptimized) performance at inference time using the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse).\n",
+    "Sparse-quantized models combine [pruning](https://neuralmagic.com/blog/pruning-overview/) and [quantization](https://arxiv.org/abs/1609.07061) to reduce both the number of parameters and the precision of the remaining parameters to significantly increase the performance of neural networks. Using these optimizations, your model will obtain significantly better (around 7x vs. unoptimized) performance at inference time using the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse).\n",
     "\n",
-    "Sparse-quantized transfer learning takes two steps: First, fine-tune a pre-trained sparse model for the\n",
-    "transfer dataset while maintaining the pre-trianed sparsity structure. Second, perform [quantization aware training](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training) to quantize the now sparse model while still holding the same sparsity structure. [SparseML](https://github.com/neuralmagic/sparseml) recipes make it easy to perform these optimizations.\n",
+    "Sparse-quantized transfer learning takes two steps. [SparseML](https://github.com/neuralmagic/sparseml) recipes make it easy to perform these optimizations:\n",
+    "- First, fine-tune a pre-trained sparse model for the transfer dataset while maintaining the pre-trained sparsity structure.\n",
+    "- Second, perform [quantization-aware training (QAT)](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training) to quantize the now sparse model while still holding the same sparsity structure.\n",
     "\n",
     "In this notebook, you will:\n",
     "- Set up the model and dataset\n",
     "- Define a generic PyTorch training flow\n",
     "- Integrate the PyTorch flow with SparseML for transfer learning\n",
-    "- Perform sparse transfer learning and quantization aware training using the PyTorch+SparseML flow\n",
+    "- Perform sparse transfer learning and quantization-aware training using the PyTorch and SparseML flow\n",
     "- Export to [ONNX](https://onnx.ai/) and convert the model from a QAT graph\n",
-    "- Compare DeepSparse engine benchmarks of the final sparse-quantized model to an unoptimized model\n",
+    "- Compare DeepSparse Engine benchmarks of the final sparse-quantized model to an unoptimized model\n",
     "\n",
     "Reading through this notebook should be reasonably quick and will give you an intuition for how to plug SparseML into your PyTorch training flow, both for transfer learning and more generally. Rough time estimates for transfer learning the default model are given. Note that training with the PyTorch CPU implementation will be much slower than a GPU:\n",
     "- 30 minutes on a GPU\n",
@@ -64,7 +65,7 @@
    "source": [
     "## Step 2 - Setting Up the Model and Dataset\n",
     "\n",
-    "By default, you will transfer learn from a sparse-quantized [ResNet50](https://arxiv.org/abs/1512.03385) model trained on the [ImageNet dataset](http://www.image-net.org/) to the much smaller [Imagenette dataset](https://github.com/fastai/imagenette). The transfer learning weights are downloaded from the [SparseZoo](https://github.com/neuralmagic/sparsezoo) model repo. The Imagenette dataset is downloaded from its repository via a helper class from SparseML.\n",
+    "By default, you will transfer learn from a sparse-quantized [ResNet-50](https://arxiv.org/abs/1512.03385) model trained on the [ImageNet dataset](http://www.image-net.org/) to the much smaller [Imagenette dataset](https://github.com/fastai/imagenette). The transfer learning weights are downloaded from the [SparseZoo](https://github.com/neuralmagic/sparsezoo) model repository. The Imagenette dataset is downloaded from its repository via a helper class from SparseML.\n",
     "\n",
     "When loading weights for transfer learning classification models, it is standard to override the final classifier layer to fit the output shape for the new dataset. In the example below, this is done by specifying `ignore_error_tensors` as the weights that will be initialized for the new model. In other flows this could be accomplished by setting `model.classifier.fc = torch.nn.Linear(...)`."
] @@ -83,7 +84,7 @@ "# Define your model below\n", "#######################################################\n", "print(\"loading model...\")\n", - "# SparseZoo stub to pretrained sparse-quantized ResNet50 for imagenet dataset\n", + "# SparseZoo stub to pre-trained sparse-quantized ResNet-50 for imagenet dataset\n", "zoo_stub_path = (\n", " \"zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate\"\n", " \"?recipe_type=transfer_learn\"\n", @@ -119,7 +120,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Step 3 - Set Up a PyTorch Training Loop\n", + "## Step 3 - Creating a PyTorch Training Loop\n", "SparseML can plug directly into your existing PyTorch training flow by overriding the Optimizer object. To demonstrate this, in the cell below, we define a simple PyTorch training loop adapted from [here](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html). To prune and quantize your existing models using SparseML, you can use your own training flow." ] }, @@ -173,7 +174,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Step 4 - Set Up PyTorch Training Objects\n", + "## Step 4 - Building PyTorch Training Objects\n", "In this step, you will select hyperparameters, a device to train your model with, set up DataLoader objects, a loss function, and optimizer. All of these variables and objects can be replaced to fit your training flow." ] }, @@ -212,15 +213,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Step 5 - Apply a SparseML Recipe and Prune Model\n", + "## Step 5 - Running Sparse-Quantized Transfer Learning with a SparseML Recipe\n", "\n", - "To run sparse-quantized transfer learning with SparseML, you will download a transfer learning recipe from SparseZoo and use it to create a `ScheduledModifierManager` object. This manager will be used to wrap the optimizer object to maintain the pre-optimized model's sparsity structure while learning weights for the new dataset as well as performing quantization aware training.\n", + "To run sparse-quantized transfer learning with SparseML, you will download a transfer learning recipe from SparseZoo and use it to create a `ScheduledModifierManager` object. This manager will be used to wrap the optimizer object to maintain the pre-optimized model's sparsity structure while learning weights for the new dataset as well as performing quantization-aware training (QAT).\n", "\n", - "You can create SparseML recipes to perform various model pruning schedules, quantization aware training, sparse transfer learning, and more. If you are using a different model than the default, you will have to modify the recipe file to match the new target's parameters.\n", + "You can create SparseML recipes to perform various model pruning schedules, QAT, sparse transfer learning, and more. If you are using a different model than the default, you will have to modify the recipe file to match the new target's parameters.\n", "\n", "Finally, using the wrapped optimizer object, you will call the training function to prune your model.\n", "\n", - "If the kernel shuts down during training, this may be an out of memory error, to resolve this, try lowering the `batch_size` in the cell above." + "If the kernel shuts down during training, this may be an out of memory error; to resolve this, try lowering the `batch_size` in the cell above." 
    ]
   },
@@ -229,7 +230,7 @@
    "metadata": {},
    "source": [
     "#### Downloading a Recipe from SparseZoo\n",
-    "The [SparseZoo](https://github.com/neuralmagic/sparsezoo) API provides precofigured recipes for its optimized model. In the cell below, you will download a recipe for pruning ResNet50 on the Imagenette dataset and record it's saved path."
+    "The [SparseZoo](https://github.com/neuralmagic/sparsezoo) API provides preconfigured recipes for its optimized models. In the cell below, you will download a sparse transfer learning recipe for ResNet-50 on the Imagenette dataset and record its saved path."
    ]
   },
@@ -288,7 +289,7 @@
    "metadata": {},
    "source": [
-    "## Step 6 - View Model Sparsity\n",
+    "## Step 6 - Viewing Model Sparsity\n",
     "To see the effects of sparse-quantized transfer learning, in this step, you will print out the sparsities of each Conv and FC layer in your model."
    ]
   },
@@ -311,9 +312,9 @@
     "For PyTorch, exporting to ONNX is natively supported. In the cell block below, a convenience class, ModuleExporter(), is used to handle exporting.\n",
     "\n",
-    "Additionally, PyTorch, exports a graph setup for quantization aware training (QAT) to ONNX. To run a fully quantized graph, you will need to convert these QAT operations to fully quantized INT8 operations. SparseML provides the `quantize_torch_qat_export` helper function to perform this conversion.\n",
+    "Additionally, PyTorch exports a graph setup for quantization-aware training (QAT) to ONNX. To run a fully quantized graph, you will need to convert these QAT operations to fully quantized INT8 operations. SparseML provides the `quantize_torch_qat_export` helper function to perform this conversion.\n",
     "\n",
-    "Once the model is saved as an ONNX file, it is ready to be used for inference with the DeepSparse Engine. For saving a custom model, you can override the sample batch for ONNX graph freezing and locations to save to.\n",
+    "Once the model is saved as an ONNX file, it is ready to be used for inference with the DeepSparse Engine.\n",
     "\n",
     "If exporting the model only to PyTorch for inference, the graph can be converted to fully quantized in PyTorch only using `torch.quantization.convert`; however, the resulting model will not be compatible with ONNX conversion."
    ]
   },
@@ -354,7 +355,7 @@
    "source": [
     "## Step 8 - Benchmarking\n",
     "\n",
-    "Finally, to see the total effect of these optimizations, you will benchmark an unoptimized, dense ResNet50 model from SparseZoo against your sparse-quantized model using the `deepsparse` API.\n",
+    "Finally, to see the total effect of these optimizations, you will benchmark an unoptimized, dense ResNet-50 model from SparseZoo against your sparse-quantized model using the `deepsparse` API.\n",
     "\n",
     "Note: in order to see a speedup from quantization, your CPU must support VNNI instructions. The benchmarking cell below contains a check for VNNI support and will log a warning if it is not detected. You can learn more about DeepSparse hardware compatibility [here](https://docs.neuralmagic.com/deepsparse/hardware.html)."
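
(Editor's note: when interpreting the `ms_per_batch` values reported by `benchmark_model` in the cell below, it can help to convert per-batch latency into throughput. A small helper might look like this sketch; the example numbers are hypothetical.)

```python
# Sketch: convert the engine's ms-per-batch latency into items/second.
def throughput_items_per_sec(ms_per_batch: float, batch_size: int) -> float:
    return batch_size / (ms_per_batch / 1000.0)

# e.g., 50 ms per batch of 64 images -> 1280 items/second
print(throughput_items_per_sec(50.0, 64))
```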
] @@ -410,14 +411,14 @@ " return results\n", "\n", "\n", - "# base ResNet50 Imagenette model downloaded from SparseZoo\n", + "# base ResNet-50 Imagenette model downloaded from SparseZoo\n", "base_results = benchmark_imagenette_model(\n", - " \"ResNet50 Imagenette Base\",\n", + " \"ResNet-50 Imagenette Base\",\n", " \"zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenette/base-none\"\n", ")\n", "\n", "optimized_results = benchmark_imagenette_model(\n", - " \"ResNet50 Imagenette pruned-quantized\", quantized_onnx_path\n", + " \"ResNet-50 Imagenette pruned-quantized\", quantized_onnx_path\n", ")\n", "\n", "speed_up = base_results.ms_per_batch / optimized_results.ms_per_batch\n", From 1cdb6fe730e7c8e84d7082bacad739ca96986d13 Mon Sep 17 00:00:00 2001 From: Benjamin Date: Tue, 23 Feb 2021 19:55:40 -0500 Subject: [PATCH 6/8] blog style readme, moving to examples directory --- .../README.md | 67 +++++++++++++++++++ ...h_sparse_quantized_transfer_learning.ipynb | 0 2 files changed, 67 insertions(+) create mode 100644 examples/pytorch_sparse_quantized_transfer_learning/README.md rename {notebooks => examples/pytorch_sparse_quantized_transfer_learning}/pytorch_sparse_quantized_transfer_learning.ipynb (100%) diff --git a/examples/pytorch_sparse_quantized_transfer_learning/README.md b/examples/pytorch_sparse_quantized_transfer_learning/README.md new file mode 100644 index 00000000000..5281c43497b --- /dev/null +++ b/examples/pytorch_sparse_quantized_transfer_learning/README.md @@ -0,0 +1,67 @@ +# PyTorch Sparse-Quantized Transfer Learning with SparseML + +[Pruning](https://neuralmagic.com/blog/pruning-overview/) and +[quantization](https://arxiv.org/abs/1609.07061) are well established methods for accelerating +neural networks. Individually, both methods yield significant speedups for CPU inference +(a theoretical maximum of 4x for INT8 quantization) and can make CPU deployments an attractive +option for real time model inference. + +Sparse-quantized models leverage both techniques and can achieve speedups upwards of 6-7x when using +the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse) with +[compatible hardware](https://docs.neuralmagic.com/deepsparse/hardware.html). + +Using powerful [SparseML](https://github.com/neuralmagic/sparseml) recipes, it is easy to create sparse-quantized models. +Additionally, the SparseML team is actively creating pre-trained sparse-quantized models that maintain accuracy +targets and achieve high CPU speedups - and it is easy to leverage these models for speedups with your own datasets +using sparse-quantized transfer learning. + +Sparse-quantized transfer learning takes place in two phases: +1. Sparse transfer learning \- fine tuning the pre-trained model with the new dataset +while maintaining the existing pre-optimized sparsity structure. This creates a model +that learns to predict a new task, while preserving the predetermined optimized structure +from pruning. +2. [Quantization-aware training](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training) +\- emulating the effects of INT8 quantization while training the model to overcome the loss of precision + + +## ResNet-50 Imagenette Example + +The [SparseZoo](https://github.com/neuralmagic/sparseml) hosts a sparse-quantized ResNet-50 model trained +on the ImageNet dataset. It maintains 99% of the baseline accuracy and can achieve over 6.5x +speedup using the DeepSparse Engine. 
+speedup using the DeepSparse Engine. There are multiple paths to explore sparse-quantized
+transfer learning with this model.
+
+### Notebook
+`sparseml/examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb`
+is a Jupyter notebook that provides a step-by-step walk-through for
+ - setting up sparse-quantized transfer learning
+ - integrating SparseML with any PyTorch training flow
+ - ONNX export
+ - benchmarking with the DeepSparse Engine
+
+Run `jupyter notebook` and navigate to this notebook file to run the example.
+
+### Script
+`sparseml/scripts/pytorch_vision.py` is a script for running tasks related to pruning and
+quantization with SparseML for image classification and object detection use cases.
+Using the following example command, you can run sparse-quantized transfer learning on a custom
+[ImageFolder](https://pytorch.org/vision/0.8/datasets.html#imagefolder) based
+classification dataset.
+
+Note that for datasets other than Imagenette, you may need to edit
+the recipe to better train for the dataset, following instructions in the downloaded recipe card.
+
+```
+python scripts/pytorch_vision.py train \
+    --recipe-path zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate?recipe_type=transfer_learn \
+    --checkpoint-path zoo \
+    --arch-key resnet50 \
+    --model-kwargs '{"ignore_error_tensors": ["classifier.fc.weight", "classifier.fc.bias"]}' \
+    --dataset imagefolder \
+    --dataset-path /PATH/TO/IMAGEFOLDER/DATASET \
+    --train-batch-size 32 --test-batch-size 64 \
+    --loader-num-workers 8 \
+    --optim Adam \
+    --optim-args '{}' \
+    --model-tag resnet50-imagenette-pruned_quant-transfer_learned
+```
\ No newline at end of file
diff --git a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb b/examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb
similarity index 100%
rename from notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
rename to examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb

From b937b8c534db7d28cc6c6795fcabf8c4a0aed120 Mon Sep 17 00:00:00 2001
From: Benjamin
Date: Wed, 24 Feb 2021 12:43:26 -0500
Subject: [PATCH 7/8] adding examples/ to Makefile tasks, copyrighting,
 examples README

---
 Makefile                                      |  6 ++---
 examples/README.md                            | 22 +++++++++++++++++++
 .../README.md                                 | 16 ++++++++++++++
 3 files changed, 41 insertions(+), 3 deletions(-)
 create mode 100644 examples/README.md

diff --git a/Makefile b/Makefile
index a1ae518cb52..9035094a1c6 100644
--- a/Makefile
+++ b/Makefile
@@ -1,10 +1,10 @@
 .PHONY: build docs test
 
 BUILDDIR := $(PWD)
-CHECKDIRS := integrations notebooks scripts src tests utils setup.py
-CHECKGLOBS := 'integrations/**/*.py' 'scripts/**/*.py' 'src/**/*.py' 'tests/**/*.py' 'utils/**/*.py' setup.py
+CHECKDIRS := examples integrations notebooks scripts src tests utils setup.py
+CHECKGLOBS := 'examples/**/*.py' 'integrations/**/*.py' 'scripts/**/*.py' 'src/**/*.py' 'tests/**/*.py' 'utils/**/*.py' setup.py
 DOCDIR := docs
-MDCHECKGLOBS := 'docs/**/*.md' 'docs/**/*.rst' 'integrations/**/*.md' 'notebooks/**/*.md' 'scripts/**/*.md'
+MDCHECKGLOBS := 'docs/**/*.md' 'docs/**/*.rst' 'examples/**/*.md' 'integrations/**/*.md' 'notebooks/**/*.md' 'scripts/**/*.md'
 MDCHECKFILES := CODE_OF_CONDUCT.md CONTRIBUTING.md DEVELOPING.md README.md
 
 BUILD_ARGS :=  # set nightly to build nightly release
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 00000000000..cabf0db25c2
--- /dev/null
+++ b/examples/README.md
+
+### Script
+`sparseml/scripts/pytorch_vision.py` is a script for running tasks related to pruning and
+quantization with SparseML for image classification and object detection use cases.
+Using the example command at the end of this section, you can run sparse-quantized transfer
+learning on a custom [ImageFolder](https://pytorch.org/vision/0.8/datasets.html#imagefolder)-based
+classification dataset.
+
+Note that for datasets other than Imagenette, you may need to edit the recipe to better fit
+your dataset, following the instructions in the downloaded recipe card. A quick check of the
+dataset layout is sketched below.
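+
+Before launching a full run, it can help to confirm that your dataset matches the layout
+`ImageFolder` expects. This is a minimal sanity-check sketch; the dataset path is a placeholder:
+
+```python
+from torchvision import datasets
+
+# ImageFolder expects one subdirectory per class: root/<class_name>/<image>.jpg
+dataset = datasets.ImageFolder("/PATH/TO/IMAGEFOLDER/DATASET")
+print(f"found {len(dataset)} images across {len(dataset.classes)} classes")
+```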
+
+The full transfer learning command:
+
+```
+python scripts/pytorch_vision.py train \
+    --recipe-path zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate?recipe_type=transfer_learn \
+    --checkpoint-path zoo \
+    --arch-key resnet50 \
+    --model-kwargs '{"ignore_error_tensors": ["classifier.fc.weight", "classifier.fc.bias"]}' \
+    --dataset imagefolder \
+    --dataset-path /PATH/TO/IMAGEFOLDER/DATASET \
+    --train-batch-size 32 --test-batch-size 64 \
+    --loader-num-workers 8 \
+    --optim Adam \
+    --optim-args '{}' \
+    --model-tag resnet50-imagenette-pruned_quant-transfer_learned
+```
\ No newline at end of file
diff --git a/notebooks/pytorch_sparse_quantized_transfer_learning.ipynb b/examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb
similarity index 100%
rename from notebooks/pytorch_sparse_quantized_transfer_learning.ipynb
rename to examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb

From b937b8c534db7d28cc6c6795fcabf8c4a0aed120 Mon Sep 17 00:00:00 2001
From: Benjamin
Date: Wed, 24 Feb 2021 12:43:26 -0500
Subject: [PATCH 7/8] adding examples/ to Makefile tasks, copyrighting, examples README

---
 Makefile           |  6 ++---
 examples/README.md | 22 +++++++++++++++++++
 .../README.md      | 16 ++++++++++++++
 3 files changed, 41 insertions(+), 3 deletions(-)
 create mode 100644 examples/README.md

diff --git a/Makefile b/Makefile
index a1ae518cb52..9035094a1c6 100644
--- a/Makefile
+++ b/Makefile
@@ -1,10 +1,10 @@
 .PHONY: build docs test

 BUILDDIR := $(PWD)
-CHECKDIRS := integrations notebooks scripts src tests utils setup.py
-CHECKGLOBS := 'integrations/**/*.py' 'scripts/**/*.py' 'src/**/*.py' 'tests/**/*.py' 'utils/**/*.py' setup.py
+CHECKDIRS := examples integrations notebooks scripts src tests utils setup.py
+CHECKGLOBS := 'examples/**/*.py' 'integrations/**/*.py' 'scripts/**/*.py' 'src/**/*.py' 'tests/**/*.py' 'utils/**/*.py' setup.py
 DOCDIR := docs
-MDCHECKGLOBS := 'docs/**/*.md' 'docs/**/*.rst' 'integrations/**/*.md' 'notebooks/**/*.md' 'scripts/**/*.md'
+MDCHECKGLOBS := 'docs/**/*.md' 'docs/**/*.rst' 'examples/**/*.md' 'integrations/**/*.md' 'notebooks/**/*.md' 'scripts/**/*.md'
 MDCHECKFILES := CODE_OF_CONDUCT.md CONTRIBUTING.md DEVELOPING.md README.md

 BUILD_ARGS :=  # set nightly to build nightly release
diff --git a/examples/README.md b/examples/README.md
new file mode 100644
index 00000000000..cabf0db25c2
--- /dev/null
+++ b/examples/README.md
@@ -0,0 +1,22 @@
+
+
+# Examples
+
+This directory contains self-documented examples of end-to-end workflows using SparseML
+and its companion libraries. Open a Pull Request to
+[contribute](https://github.com/neuralmagic/sparseml/blob/main/CONTRIBUTING.md)
+your own.
diff --git a/examples/pytorch_sparse_quantized_transfer_learning/README.md b/examples/pytorch_sparse_quantized_transfer_learning/README.md
index 5281c43497b..9d67c28f2c3 100644
--- a/examples/pytorch_sparse_quantized_transfer_learning/README.md
+++ b/examples/pytorch_sparse_quantized_transfer_learning/README.md
@@ -1,3 +1,19 @@
+
+
 # PyTorch Sparse-Quantized Transfer Learning with SparseML

 [Pruning](https://neuralmagic.com/blog/pruning-overview/) and

From 0d7b96d51f9844fab8deefffd2dc5d647c93b2b0 Mon Sep 17 00:00:00 2001
From: Benjamin
Date: Wed, 24 Feb 2021 17:38:51 -0500
Subject: [PATCH 8/8] removing README for future update

---
 .../README.md                                 | 83 ------------------
 ...h_sparse_quantized_transfer_learning.ipynb |  2 +-
 2 files changed, 1 insertion(+), 84 deletions(-)
 delete mode 100644 examples/pytorch_sparse_quantized_transfer_learning/README.md

diff --git a/examples/pytorch_sparse_quantized_transfer_learning/README.md b/examples/pytorch_sparse_quantized_transfer_learning/README.md
deleted file mode 100644
index 9d67c28f2c3..00000000000
--- a/examples/pytorch_sparse_quantized_transfer_learning/README.md
+++ /dev/null
@@ -1,83 +0,0 @@
-
-
-# PyTorch Sparse-Quantized Transfer Learning with SparseML
-
-[Pruning](https://neuralmagic.com/blog/pruning-overview/) and
-[quantization](https://arxiv.org/abs/1609.07061) are well-established methods for accelerating
-neural networks. Individually, both methods yield significant speedups for CPU inference
-(a theoretical maximum of 4x for INT8 quantization) and can make CPU deployments an attractive
-option for real-time model inference.
-
-Sparse-quantized models leverage both techniques and can achieve speedups upwards of 6-7x when using
-the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse) with
-[compatible hardware](https://docs.neuralmagic.com/deepsparse/hardware.html).
-
-Using powerful [SparseML](https://github.com/neuralmagic/sparseml) recipes, it is easy to create sparse-quantized models.
-Additionally, the SparseML team is actively creating pre-trained sparse-quantized models that maintain accuracy
-targets and achieve high CPU speedups, and it is easy to leverage these models for speedups with your own datasets
-using sparse-quantized transfer learning.
-
-Sparse-quantized transfer learning takes place in two phases:
-1. Sparse transfer learning \- fine-tuning the pre-trained model on the new dataset
-while maintaining the existing pre-optimized sparsity structure. This creates a model
-that learns to predict a new task while preserving the predetermined optimized structure
-from pruning.
-2. [Quantization-aware training](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training)
-\- emulating the effects of INT8 quantization while training so the model learns to overcome the loss of precision.
-
-
-## ResNet-50 Imagenette Example
-
-The [SparseZoo](https://github.com/neuralmagic/sparsezoo) hosts a sparse-quantized ResNet-50 model trained
-on the ImageNet dataset. It maintains 99% of the baseline accuracy and can achieve over 6.5x
-speedup using the DeepSparse Engine. There are multiple paths to explore sparse-quantized
-transfer learning with this model.
-
-### Notebook
-`sparseml/examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb`
-is a Jupyter notebook that provides a step-by-step walk-through for
- - setting up sparse-quantized transfer learning
- - integrating SparseML with any PyTorch training flow
- - ONNX export
- - benchmarking with the DeepSparse Engine
-
-Run `jupyter notebook` and navigate to this notebook file to run the example.
-
-### Script
-`sparseml/scripts/pytorch_vision.py` is a script for running tasks related to pruning and
-quantization with SparseML for image classification and object detection use cases.
-Using the following example command, you can run sparse-quantized transfer learning on a custom
-[ImageFolder](https://pytorch.org/vision/0.8/datasets.html#imagefolder)-based
-classification dataset.
-
-Note that for datasets other than Imagenette, you may need to edit the recipe to better fit
-your dataset, following the instructions in the downloaded recipe card.
-
-```
-python scripts/pytorch_vision.py train \
-    --recipe-path zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate?recipe_type=transfer_learn \
-    --checkpoint-path zoo \
-    --arch-key resnet50 \
-    --model-kwargs '{"ignore_error_tensors": ["classifier.fc.weight", "classifier.fc.bias"]}' \
-    --dataset imagefolder \
-    --dataset-path /PATH/TO/IMAGEFOLDER/DATASET \
-    --train-batch-size 32 --test-batch-size 64 \
-    --loader-num-workers 8 \
-    --optim Adam \
-    --optim-args '{}' \
-    --model-tag resnet50-imagenette-pruned_quant-transfer_learned
-```
\ No newline at end of file
diff --git a/examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb b/examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb
index 38e99882a1c..d748e0936a7 100644
--- a/examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb
+++ b/examples/pytorch_sparse_quantized_transfer_learning/pytorch_sparse_quantized_transfer_learning.ipynb
@@ -240,7 +240,7 @@
     "source": [
     "from sparsezoo import Zoo\n",
     "\n",
-    "recipe = Zoo.download_recipe_from_stub(zoo_stub_path)\n",
+    "recipe_path = Zoo.download_recipe_from_stub(zoo_stub_path)\n",
     "print(f\"Recipe downloaded to: {recipe_path}\")"
   ]
 },