# Load the model

In this unit we will look at how to load a model along with its persisted parameter state, and how to run inference with it to make predictions.

```python
import torch
import onnxruntime
from torch import nn
import torch.onnx as onnx
from torchvision import datasets
from torchvision.transforms import ToTensor
```

To load the model, we first define the model class, which describes the structure of the neural network that was trained and saved.

```python
# The layers must match the architecture whose weights were saved,
# otherwise load_state_dict() cannot map the stored tensors onto them.
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
```

When loading model weights, we need to instantiate the model class first, because the class defines the structure of the network. Next, we load the parameters using the `load_state_dict()` method.

```python
model = NeuralNetwork()
model.load_state_dict(torch.load('data/model.pth'))
model.eval()
```

```
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)
```
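
The save/load cycle can be checked end to end without a file on disk. This is a minimal sketch, assuming an in-memory `io.BytesIO` buffer stands in for `data/model.pth`; the names `source`, `restored`, `buffer`, `keys`, and `same` are illustrative. It repeats the `NeuralNetwork` class so it runs on its own and prints the state dictionary keys, which shows why the class definition must match the saved architecture: `load_state_dict()` matches tensors by key and shape.

```python
import io

import torch
from torch import nn


class NeuralNetwork(nn.Module):
    # Same layers as the class above; load_state_dict() matches tensors by key and shape.
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.linear_relu_stack(self.flatten(x))


# Save one instance's parameters to an in-memory buffer (stand-in for data/model.pth).
source = NeuralNetwork()
buffer = io.BytesIO()
torch.save(source.state_dict(), buffer)
buffer.seek(0)

# A fresh instance starts with its own random weights until the saved state is loaded.
restored = NeuralNetwork()
restored.load_state_dict(torch.load(buffer))
restored.eval()

# Only the Linear layers (indices 0, 2, 4 in the Sequential) hold parameters.
keys = list(restored.state_dict().keys())
print(keys)

# Every restored tensor should equal the corresponding saved tensor.
same = all(
    torch.equal(a, b)
    for a, b in zip(source.state_dict().values(), restored.state_dict().values())
)
print(same)
```

`Flatten` and `ReLU` have no parameters, so they contribute no keys; renaming `linear_relu_stack` or reordering its layers would make `load_state_dict()` fail with missing and unexpected keys.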

> **Note:** Be sure to call the `model.eval()` method before running inference to set the dropout and batch normalization layers to evaluation mode. Failing to do this will yield inconsistent inference results.
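
The `NeuralNetwork` above has no dropout or batch normalization layers, so `eval()` does not change its output, but it matters for models that have them. The minimal sketch below uses a hypothetical two-layer model with `nn.Dropout`: in training mode, repeated calls on the same input give different results, while in evaluation mode they are identical. It also wraps inference in `torch.no_grad()`, a common companion to `eval()` that skips gradient tracking.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical model with a dropout layer (unlike NeuralNetwork above).
model = nn.Sequential(nn.Linear(4, 16), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

with torch.no_grad():  # no gradients are needed for inference
    model.train()  # training mode: dropout zeroes random activations and rescales the rest
    train_outputs = [model(x) for _ in range(5)]

    model.eval()  # evaluation mode: dropout passes activations through unchanged
    eval_outputs = [model(x) for _ in range(5)]
    linear_only = model[0](x)

train_consistent = all(torch.equal(train_outputs[0], out) for out in train_outputs)
eval_consistent = all(torch.equal(eval_outputs[0], out) for out in eval_outputs)
print(f"train mode consistent: {train_consistent}")
print(f"eval mode consistent: {eval_consistent}")
print(f"eval output equals linear output: {torch.equal(eval_outputs[0], linear_only)}")
```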

## Model Inference

Optimizing a model to run on a variety of platforms and programming languages is difficult, and maximizing performance across all the different combinations of frameworks and hardware is very time-consuming. The **Open Neural Network Exchange (ONNX)** Runtime provides a solution: train once, then accelerate inference on whatever hardware you need, in the cloud or on edge devices.

ONNX is a common format, supported by a number of vendors, for sharing neural networks and other machine learning models. With a model in ONNX format, you can run inference in other programming languages and frameworks, such as Java, JavaScript, C#, and ML.NET.

## Exporting the model to ONNX

PyTorch also has native ONNX export support. Because the PyTorch execution graph is dynamic, however, the export process must traverse the execution graph to produce a persisted ONNX model. For this reason, a test input of the appropriate size must be passed to the export routine. In our case, we create a dummy tensor of zeros with the correct size. You can get the size from the `shape` attribute of a sample from your training dataset (e.g. `tensor.shape`):
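
To see where the size comes from, here is a sketch that uses a hypothetical random `TensorDataset` in place of FashionMNIST. A single sample has shape `(1, 28, 28)` (channel, height, width), which is the shape of the dummy tensor below and of the `x` fed to ONNX Runtime later; a `DataLoader` batch adds a leading batch dimension. Either shape flattens to 784 features for this kind of network, but by default the exported graph keeps the shape of the example input, so it is safest to export with the shape you plan to pass at inference time.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for FashionMNIST: 100 random single-channel 28x28 images.
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# One sample, as returned by dataset[i]: (channels, height, width)
sample, _ = dataset[0]
print(sample.shape)

# One batch, as yielded by a DataLoader: (batch, channels, height, width)
batch, _ = next(iter(DataLoader(dataset, batch_size=64)))
print(batch.shape)

# Dummy inputs shaped like a single sample and like a batch of one
dummy_sample = torch.zeros(sample.shape)
dummy_batch = torch.zeros((1, *batch.shape[1:]))

# Flatten(start_dim=1) turns both into (1, 784), so both fit a Linear(784, ...) layer.
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
print(net(dummy_sample).shape, net(dummy_batch).shape)
```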

```python
input_image = torch.zeros((1,28,28))
onnx_model = 'data/model.onnx'
onnx.export(model, input_image, onnx_model)
```

We will use our test dataset as sample data for inference from the ONNX model to make predictions.

```python
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]
# x is a (1, 28, 28) image tensor, y is its integer class label
x, y = test_data[8][0], test_data[8][1]
```

We need to create an inference session with `onnxruntime.InferenceSession`. To run inference with the ONNX model, call `run` and pass in the list of outputs you want returned (pass `None` if you want all of them) and a dictionary mapping input names to input values. The result is a list of the outputs.

```python
# Select the CPU execution provider explicitly
session = onnxruntime.InferenceSession(onnx_model, providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name

# result holds one array per requested output; result[0] has shape (1, 10)
result = session.run([output_name], {input_name: x.numpy()})
predicted, actual = classes[result[0][0].argmax(0)], classes[y]
print(f'Predicted: "{predicted}", Actual: "{actual}"')
```

```
Predicted: "Sneaker", Actual: "Sandal"
```
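
Because `session.run` returns a list holding one NumPy array per requested output, post-processing is plain NumPy. The sketch below uses made-up scores in place of a real `session.run` result (the values are hypothetical, not output from this model), picks the highest-scoring class as above, and adds a softmax to rank the top three classes. The softmax and top-3 steps are an illustrative addition, not part of the original workflow.

```python
import numpy as np

classes = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
           "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Hypothetical stand-in for session.run(...): a list with one output array
# of shape (batch_size, 10). These scores are made up for illustration.
result = [np.array([[0.0, 0.0, 0.1, 0.0, 0.0, 2.3, 0.0, 3.1, 0.4, 1.2]], dtype=np.float32)]

scores = result[0]  # first (and only) requested output, shape (1, 10)
predicted_index = int(scores[0].argmax())
print(classes[predicted_index])

# Softmax (shifted by the row max for numerical stability) turns scores into probabilities.
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# Top-3 classes by probability, highest first.
top3 = np.argsort(probs[0])[::-1][:3]
for i in top3:
    print(f"{classes[i]}: {probs[0, i]:.3f}")
```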


There are a lot of things you can do with an ONNX model, including running inference on different platforms and in different programming languages.