## Wrap Your Model for Inference
Exporting a model for production means packaging it in a standalone format that can be transferred to a production environment, such as an API or a website, and used there to perform inference.



## Production-Ready Preprocessing

Remember that the images need some preprocessing before being fed to the CNN. For example, typically you need to resize, center crop, and normalize the image with a transform pipeline similar to this:

In [None]:
import torchvision.transforms as T 

testval_transforms = T.Compose(
    [
        # Resize so that the shorter side is 256 pixels (the aspect ratio is
        # preserved). The right size depends on your application.
        T.Resize(256),
        # Let's take the central 224x224 part of the image
        T.CenterCrop(224),
        T.ToTensor(),
        T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ]
)
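
To make the geometry of the resize and crop steps concrete, the sketch below computes in plain Python the image size after `T.Resize(256)` and the window kept by `T.CenterCrop(224)`. The helper names `resized_size` and `center_crop_box` and the 600x900 input are hypothetical, and the rounding is assumed to mirror torchvision's behavior; treat it as an illustration rather than the library's implementation.

```python
from typing import Tuple


def resized_size(height: int, width: int, short_side: int = 256) -> Tuple[int, int]:
    """Scale so the shorter side equals `short_side`, preserving aspect ratio."""
    if height <= width:
        return short_side, int(short_side * width / height)
    return int(short_side * height / width), short_side


def center_crop_box(height: int, width: int, crop: int = 224) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of a centered crop-by-crop window."""
    top = int(round((height - crop) / 2.0))
    left = int(round((width - crop) / 2.0))
    return top, left, top + crop, left + crop


# Hypothetical input image of 600x900 pixels (height x width)
h, w = resized_size(600, 900)
box = center_crop_box(h, w)
print((h, w), box)  # (256, 384) (16, 80, 240, 304)
```

Note that only the shorter side becomes 256 pixels; the image is square only after the center crop.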

If you skip these operations in production, the inputs will no longer match what the model saw during training, and the performance of your model is going to suffer greatly.

The best course of action is to make these transformations part of your standalone package instead of re-implementing them in the production environment. Let's see how.

We need to wrap our model in a class that applies the transformations and then runs the transformed image through the CNN.

If we trained with the **nn.CrossEntropyLoss** as the loss function, we also need to apply a softmax function to the output of the model so that the output of the wrapper will be probabilities and not merely scores.
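
As a minimal illustration of that last step, the pure-Python sketch below turns a vector of raw scores (logits) into probabilities. The logits and the helper name `softmax` are made up for the example; in the wrapper itself PyTorch's softmax does this work for each row of the batch.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the max does not change the result but avoids overflow in exp
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(logits)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The ranking of the classes is unchanged by softmax; what changes is that the outputs are non-negative and sum to 1, so they can be read as probabilities.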

Let's see an example of such a wrapper class:

In [None]:
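from __future__ import annotations  # must be the first statement in the cell

import torch
import torch.nn.functional as F  # provides F.softmax, used in forward()
from torch import nn
from torchvision import datasets
import torchvision.transforms as T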


class Predictor(nn.Module):

    def __init__(
        self,
        model: nn.Module,
        class_names: list[str],
        mean: torch.Tensor,  # per-channel mean of the dataset, e.g. (mean_R, mean_G, mean_B) for RGB images
        std: torch.Tensor,  # per-channel std of the dataset, e.g. (std_R, std_G, std_B) for RGB images
    ):

        super().__init__()

        self.model = model.eval()
        self.class_names = class_names

        self.transforms = nn.Sequential(
            T.Resize([256, ]),
            T.CenterCrop(224),
            T.ConvertImageDtype(torch.float),
            T.Normalize(mean.tolist(), std.tolist())
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            # 1. Apply the transforms
            x = self.transforms(x)
            # 2. Get the logits
            x = self.model(x)
            # 3. Apply softmax across dim=1 (the class dimension)
            x = F.softmax(x, dim=1)

            return x

The `self.transforms` attribute defines the transformations we want to apply. It looks very similar to the validation transform pipeline, with a few important differences:

* We do not use T.Compose but nn.Sequential, because the former is not supported by torch.jit.script (the export functionality of PyTorch).
* In Resize, the size must be specified as a tuple or a list, and not as a scalar as we were able to do during training.
* There is no ToTensor. Instead, we use T.ConvertImageDtype, because in this context the input to the forward method is already a Tensor.
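
The value ranges involved can be checked by hand. The sketch below mimics, in plain Python and for a single channel value, what `T.ConvertImageDtype(torch.float)` followed by `T.Normalize` does to a `uint8` pixel. The helper name `preprocess_pixel` is hypothetical, and the mean and std of 0.5 are assumptions borrowed from the validation pipeline shown earlier, not values fixed by the wrapper.

```python
def preprocess_pixel(value: int, mean: float = 0.5, std: float = 0.5) -> float:
    """Map a uint8 pixel to a float in [0, 1], then standardize it."""
    scaled = value / 255.0        # uint8 -> float conversion rescales to [0, 1]
    return (scaled - mean) / std  # per-channel normalization


for pixel in (0, 128, 255):
    print(pixel, round(preprocess_pixel(pixel), 3))
```

With these values the full `uint8` range 0..255 ends up in [-1, 1]. A float tensor that is already in [0, 1] passes through the dtype conversion unchanged, so only the normalization step affects it.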

## Export Using TorchScript
We can now create an instance of our Predictor wrapper and save it to a file using torch.jit.script:

In [None]:
model = MyModel(*args, **kwargs)  # placeholder for your trained model
predictor = Predictor(model, class_names, mean, std).cpu()

# Export using torch.jit.script
scripted_predictor = torch.jit.script(predictor)
scripted_predictor.save("standalone_model.pt")

Note that we move the Predictor instance to the CPU before exporting it. A reloaded model is placed on the device it was saved from, so if we want to run inference on the CPU, we need to move the model there first. In many cases CPUs are sufficient for inference, and they are much cheaper than GPUs.
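
If an exported file was saved from a GPU, the device can also be chosen at load time. The sketch below relies on the `map_location` argument of `torch.jit.load` and uses a tiny stand-in module (`nn.Linear`, not the actual Predictor) saved to an in-memory buffer so that it runs on any machine; it is an illustration under those assumptions, not part of the original workflow.

```python
import io

import torch
from torch import nn

# Stand-in for the exported Predictor (hypothetical, for illustration only)
scripted = torch.jit.script(nn.Linear(4, 2).eval())

# Save to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)

# map_location remaps all tensors to the requested device while loading
reloaded = torch.jit.load(buffer, map_location="cpu")
device = next(reloaded.parameters()).device
print(device)
```

Moving the model to the CPU before saving, as done above, has the advantage that the consumer of the file does not need to know about this argument.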

We then use torch.jit.script, which converts our wrapped model into an intermediate representation (TorchScript) that can be saved to disk (which we do immediately after).

Now, in a different process or a different computer altogether, we can do:

In [None]:
import torch

predictor_reloaded = torch.jit.load("standalone_model.pt")

This will recreate our wrapped model. We can then use it as follows:

In [None]:
from PIL import Image
import torch
import torchvision
import torchvision.transforms as T

# Reload the model
learn_inf = torch.jit.load("checkpoints/transfer_exported.pt")

# Read an image and transform it to tensor to simulate what would
# happen in production
img = Image.open("static_images/test/09.Golden_Gate_Bridge/190f3bae17c32c37.jpg")
# We use .unsqueeze because the model expects a batch, so this
# creates a batch of 1 element
t = T.ToTensor()(img).unsqueeze_(0)

# Perform inference and get the softmax vector
softmax = learn_inf(t).squeeze()
# Get index of the winning label
max_idx = softmax.argmax().item()  # convert to a plain Python int
# Print winning label using the class_names attribute of the 
# model wrapper
print(f"Prediction: {learn_inf.class_names[max_idx]}")

Note that there are two different ways to export a model to TorchScript: scripting (torch.jit.script) and tracing (torch.jit.trace). Scripting is more general, but in some cases you do have to use tracing.
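
The difference between the two is easiest to see on a module with data-dependent control flow. The sketch below uses a hypothetical toy module, `Gate`, that is not part of the model above: `torch.jit.script` compiles both branches of the `if`, while `torch.jit.trace` records only the operations executed for the example input (and emits a `TracerWarning` saying so).

```python
import torch
from torch import nn


class Gate(nn.Module):
    """Hypothetical toy module whose output depends on the sign of the input sum."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.sum() > 0:
            return x * 2
        return torch.zeros_like(x)


module = Gate().eval()
positive = torch.ones(3)
negative = -torch.ones(3)

# Scripting compiles the Python source, so both branches are preserved
scripted = torch.jit.script(module)
# Tracing runs the module once on `positive` and records only the `x * 2` path
traced = torch.jit.trace(module, positive)

scripted_out = scripted(negative)  # takes the zeros branch
traced_out = traced(negative)      # replays x * 2 regardless of the input
print(scripted_out, traced_out)
```

Tracing remains useful for models that contain Python constructs TorchScript cannot compile, provided their control flow does not depend on the input.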