# Exporting models using TorchScript

## Table of contents

1. [Understanding TorchScript and model exporting](#understanding-torchscript-and-model-exporting)
2. [Setting up the environment](#setting-up-the-environment)
3. [Building a simple PyTorch model](#building-a-simple-pytorch-model)
4. [Tracing a model with TorchScript](#tracing-a-model-with-torchscript)
5. [Scripting a model with TorchScript](#scripting-a-model-with-torchscript)
6. [Saving and loading TorchScript models](#saving-and-loading-torchscript-models)
7. [Running TorchScript models in C++](#running-torchscript-models-in-c)
8. [Comparing performance: TorchScript vs. native PyTorch](#comparing-performance-torchscript-vs-native-pytorch)
9. [Experimenting with optimizations](#experimenting-with-optimizations)

## Understanding TorchScript and model exporting

### **Key concepts**
TorchScript is a tool in PyTorch that allows models to be serialized and exported for use in production environments. By converting PyTorch models into an intermediate representation, TorchScript enables deployment in environments without a Python runtime, such as mobile devices or edge systems. It supports the same functionality as PyTorch while providing flexibility for optimized inference.

Key features of TorchScript include:
- **Tracing**: Converts a model into TorchScript by recording operations during a forward pass.
- **Scripting**: Directly converts a model into TorchScript by analyzing its code, including control flows like loops and conditionals.
- **Serialization**: Saves the model as a `.pt` file, enabling portability and reuse.
- **Integration**: TorchScript models can run in C++ environments using the PyTorch C++ API, making them ideal for production deployment.

TorchScript combines the dynamic nature of PyTorch with the static benefits required for efficient inference in production.

### **Applications**
Exporting models using TorchScript is essential for:
- **Mobile deployment**: Running models on Android and iOS devices with PyTorch Mobile.
- **Edge computing**: Deploying models on low-power devices for real-time applications.
- **Cross-platform compatibility**: Using TorchScript models in C++ applications without a Python dependency.
- **Optimized inference**: Improving inference speed and memory efficiency for production systems.

### **Advantages**
- **Portability**: Enables seamless deployment across various platforms and environments.
- **Performance optimization**: Static graphs allow for optimizations that improve inference speed and reduce memory usage.
- **Flexibility**: Supports dynamic models with scripting while enabling production-ready deployment.
- **Ease of integration**: Simplifies using PyTorch models in non-Python environments.

### **Challenges**
- **Debugging**: Errors in TorchScript conversion can be challenging to trace and resolve.
- **Limited Python support**: Some Python constructs and third-party libraries are not supported in TorchScript.
- **Model compatibility**: Custom layers or operations may require additional adaptation for TorchScript compatibility.
- **Static limitations**: Dynamic PyTorch functionalities may need to be rewritten or adjusted for TorchScript.

## Setting up the environment


##### **Q1: How do you install the necessary libraries, such as PyTorch, for exporting models using TorchScript?**


In [1]:
# !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

##### **Q2: How do you import the required PyTorch modules for exporting and working with TorchScript models?**


In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

##### **Q3: How do you configure the environment to use GPU acceleration with TorchScript?**

In [3]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use GPU if available
print(device)

cuda


## Building a simple PyTorch model


##### **Q4: How do you define a simple neural network in PyTorch?**


In [4]:
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)  # fully connected layer 1
        self.fc2 = nn.Linear(128, 64)     # fully connected layer 2
        self.fc3 = nn.Linear(64, 10)      # output layer for 10 classes

In [5]:
model = SimpleNet().to(device)

##### **Q5: How do you implement the forward pass for the PyTorch model?**


In [6]:
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)            # flatten the input
        x = F.relu(self.fc1(x))          # apply ReLU after fc1
        x = F.relu(self.fc2(x))          # apply ReLU after fc2
        x = self.fc3(x)                  # output layer
        return x

In [7]:
model = SimpleNet().to(device)

##### **Q6: How do you train a simple PyTorch model on a small dataset or a synthetic dataset?**

In [8]:
transform = transforms.Compose([
    transforms.ToTensor(),                     # convert image to tensor
    transforms.Normalize((0.1307,), (0.3081,)) # normalize with MNIST mean and std
])

train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
test_dataset  = torchvision.datasets.MNIST(root='./data', train=False, transform=transform, download=True)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader  = torch.utils.data.DataLoader(test_dataset, batch_size=1000, shuffle=False)


100%|██████████| 9.91M/9.91M [00:01<00:00, 8.87MB/s]
100%|██████████| 28.9k/28.9k [00:00<00:00, 305kB/s]
100%|██████████| 1.65M/1.65M [00:00<00:00, 2.65MB/s]
100%|██████████| 4.54k/4.54k [00:00<00:00, 454kB/s]


In [9]:
model = SimpleNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

In [10]:
for epoch in range(3):
    model.train()
    total_loss = 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)                         # forward pass
        loss = criterion(outputs, labels)               # compute loss
        optimizer.zero_grad()                           # reset gradients
        loss.backward()                                 # backpropagation
        optimizer.step()                                # update weights

        total_loss += loss.item()

    print(f'Epoch {epoch+1} - Training Loss: {total_loss / len(train_loader):.4f}')

Epoch 1 - Training Loss: 0.2762
Epoch 2 - Training Loss: 0.1155
Epoch 3 - Training Loss: 0.0797


## Tracing a model with TorchScript


##### **Q7: How do you use `torch.jit.trace` to trace a PyTorch model and convert it into TorchScript?**


In [11]:
model.eval()  # set model to evaluation mode
example_input = torch.randn(1, 1, 28, 28).to(device)  # dummy input for tracing
traced_model = torch.jit.trace(model, example_input)  # perform tracing

##### **Q8: How do you feed example inputs into the model during tracing to capture its computation graph?**


In [12]:
print(traced_model.graph)

graph(%self.1 : __torch__.SimpleNet,
      %x : Float(1, 1, 28, 28, strides=[784, 784, 28, 1], requires_grad=0, device=cuda:0)):
  %fc3 : __torch__.torch.nn.modules.linear.___torch_mangle_1.Linear = prim::GetAttr[name="fc3"](%self.1)
  %fc2 : __torch__.torch.nn.modules.linear.___torch_mangle_0.Linear = prim::GetAttr[name="fc2"](%self.1)
  %fc1 : __torch__.torch.nn.modules.linear.Linear = prim::GetAttr[name="fc1"](%self.1)
  %19 : int = prim::Constant[value=-1]() # C:\Users\Fellipe\AppData\Local\Temp\ipykernel_26344\2950993803.py:9:0
  %20 : int = prim::Constant[value=784]() # C:\Users\Fellipe\AppData\Local\Temp\ipykernel_26344\2950993803.py:9:0
  %21 : int[] = prim::ListConstruct(%19, %20)
  %input.1 : Float(1, 784, strides=[784, 1], requires_grad=0, device=cuda:0) = aten::view(%x, %21) # C:\Users\Fellipe\AppData\Local\Temp\ipykernel_26344\2950993803.py:9:0
  %43 : Tensor = prim::CallMethod[name="forward"](%fc1, %input.1)
  %input.5 : Float(1, 128, strides=[128, 1], requires_grad=1, de

##### **Q9: How do you run inference with the traced TorchScript model to verify that it works correctly?**

In [13]:
traced_model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = traced_model(images)
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)

In [14]:
print(f'Traced Model Accuracy: {100 * correct / total:.2f}%')

Traced Model Accuracy: 96.95%


## Scripting a model with TorchScript


##### **Q10: How do you use `torch.jit.script` to script a PyTorch model and convert it into TorchScript?**


In [15]:
scripted_model = torch.jit.script(model)  # convert model using scripting

##### **Q11: How do you handle control flow in your PyTorch model when using scripting?**


In [16]:
class ControlFlowNet(nn.Module):
    def __init__(self):
        super(ControlFlowNet, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        for _ in range(2):               # example of loop
            x = F.relu(self.fc2(x))
        if x.sum() > 0:                  # example of conditional
            x = self.fc3(x)
        else:
            x = torch.zeros_like(self.fc3(x))
        return x

In [17]:
cf_model = ControlFlowNet().to(device)
scripted_cf_model = torch.jit.script(cf_model)

##### **Q12: How do you compare the scripted model’s behavior to the original PyTorch model to ensure consistency?**

In [20]:
class ControlFlowNet(nn.Module):
    def __init__(self):
        super(ControlFlowNet, self).__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.fc2a = nn.Linear(128, 64)
        self.fc2b = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2a(x))
        x = F.relu(self.fc2b(x))
        if x.sum() > 0:
            x = self.fc3(x)
        else:
            x = torch.zeros_like(self.fc3(x))
        return x

In [21]:
cf_model = ControlFlowNet().to(device)
scripted_cf_model = torch.jit.script(cf_model)

In [22]:
cf_model.eval()
scripted_cf_model.eval()

RecursiveScriptModule(
  original_name=ControlFlowNet
  (fc1): RecursiveScriptModule(original_name=Linear)
  (fc2a): RecursiveScriptModule(original_name=Linear)
  (fc2b): RecursiveScriptModule(original_name=Linear)
  (fc3): RecursiveScriptModule(original_name=Linear)
)

In [23]:
input_sample = torch.randn(1, 1, 28, 28).to(device)
with torch.no_grad():
    original_output = cf_model(input_sample)
    scripted_output = scripted_cf_model(input_sample)

In [24]:
print(torch.allclose(original_output, scripted_output, atol=1e-6))

True


## Saving and loading TorchScript models


##### **Q13: How do you save a traced or scripted TorchScript model using `model.save()`?**


In [25]:
traced_model.save('traced_mnist_model.pt')   # save traced model
scripted_cf_model.save('scripted_cf_model.pt')  # save scripted model

##### **Q14: How do you load a saved TorchScript model using `torch.jit.load()` for inference?**


In [26]:
loaded_traced = torch.jit.load('traced_mnist_model.pt').to(device)
loaded_scripted = torch.jit.load('scripted_cf_model.pt').to(device)

##### **Q15: How do you verify that the saved and loaded TorchScript model produces the same results as the original model?**

In [27]:
loaded_traced.eval()
with torch.no_grad():
    sample = torch.randn(1, 1, 28, 28).to(device)
    original_out = traced_model(sample)
    loaded_out = loaded_traced(sample)

In [28]:
print(torch.allclose(original_out, loaded_out, atol=1e-6))

True


## Running TorchScript models in C++


##### **Q16: How do you export a TorchScript model to run it in a C++ environment?**


In [None]:
# see Q13

##### **Q17: How do you set up a simple C++ project using LibTorch to load and run the TorchScript model?**


In [None]:
# save the following as a main.cpp file:

# #include <torch/script.h>  // One-stop header.
# #include <iostream>
# #include <memory>

# int main() {
#     torch::jit::script::Module module;
#     try {
#         // Load the TorchScript model
#         module = torch::jit::load("traced_mnist_model.pt");

#         // Prepare example input
#         torch::Tensor input = torch::randn({1, 1, 28, 28});

#         // Run inference
#         std::vector<torch::jit::IValue> inputs;
#         inputs.push_back(input);
#         at::Tensor output = module.forward(inputs).toTensor();

#         std::cout << "Output Tensor: " << output << std::endl;
#     }
#     catch (const c10::Error& e) {
#         std::cerr << "Error loading the model.\n";
#         return -1;
#     }

#     return 0;
# }

##### **Q18: How do you pass input data to the TorchScript model in C++ for inference?**


In [None]:
# compile and run the main.cpp code. e.g.,

# c++ -std=c++17 main.cpp -I/path/to/libtorch/include -I/path/to/libtorch/include/torch/csrc/api/include \
#    -L/path/to/libtorch/lib -ltorch -lc10 -Wl,-rpath=/path/to/libtorch/lib -o run_model

# ./run_model

##### **Q19: How do you verify the outputs of the TorchScript model in C++ and compare them to the Python version?**

In [None]:
# you can manually log output values in C++ and cross-check with Python predictions, as included in main.cpp: std::cout << "Output Tensor: " << output << std::endl;

## Comparing performance: TorchScript vs. native PyTorch


##### **Q20: How do you compare the inference speed of the TorchScript model to the original PyTorch model on the same dataset?**


In [32]:
import time

def benchmark_inference(model, data_loader, label):
    model.eval()
    total_time = 0
    with torch.no_grad():
        for images, _ in data_loader:
            images = images.to(device)
            start = time.time()
            _ = model(images)
            end = time.time()
            total_time += (end - start)
    print(f'{label} total inference time: {total_time:.4f} seconds')

In [33]:
benchmark_inference(model, test_loader, 'Original PyTorch')
benchmark_inference(traced_model, test_loader, 'Traced TorchScript')

Original PyTorch total inference time: 0.0020 seconds
Traced TorchScript total inference time: 0.0017 seconds


##### **Q21: How do you measure memory usage during inference for both the TorchScript model and the native PyTorch model?**


In [34]:
def report_memory(label):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    _ = torch.randn(1, 1, 28, 28).to(device)  # warm-up
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated(device)

    with torch.no_grad():
        for _ in range(10):
            _ = model(torch.randn(64, 1, 28, 28).to(device))  # replace `model` as needed

    torch.cuda.synchronize()
    after = torch.cuda.max_memory_allocated(device)
    print(f'{label} peak memory usage: {(after - before)/1e6:.2f} MB')

In [35]:
if device.type == 'cuda':
    model.eval()
    report_memory('Original PyTorch')

    traced_model.eval()
    model = traced_model  # reuse function without redefining
    report_memory('Traced TorchScript')

Original PyTorch peak memory usage: 1.29 MB
Traced TorchScript peak memory usage: 1.29 MB


##### **Q22: How do you benchmark the performance of both models (TorchScript and PyTorch) in terms of latency and throughput?**

In [36]:
def latency_throughput(model, batch_size=64, runs=100):
    model.eval()
    input_sample = torch.randn(batch_size, 1, 28, 28).to(device)
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        for _ in range(runs):
            _ = model(input_sample)
    torch.cuda.synchronize()
    end = time.time()

    total_time = end - start
    latency = total_time / runs
    throughput = batch_size * runs / total_time
    return latency, throughput

In [37]:
if device.type == 'cuda':
    l1, t1 = latency_throughput(model)
    print(f'Original PyTorch — Latency: {l1*1000:.2f} ms, Throughput: {t1:.2f} samples/s')

    l2, t2 = latency_throughput(traced_model)
    print(f'Traced TorchScript — Latency: {l2*1000:.2f} ms, Throughput: {t2:.2f} samples/s')

Original PyTorch — Latency: 1.04 ms, Throughput: 61312.81 samples/s
Traced TorchScript — Latency: 0.84 ms, Throughput: 76078.74 samples/s


## Experimenting with optimizations


##### **Q23: How do you reduce the model’s precision to optimize performance when exporting with TorchScript?**


In [38]:
traced_fp16 = torch.jit.trace(model.half(), torch.randn(1, 1, 28, 28).half().to(device))
traced_fp16.save('traced_mnist_model_fp16.pt')  # save reduced-precision model



##### **Q24: How do you apply other optimizations, such as pruning or quantization, to improve the efficiency of the TorchScript model?**


In [40]:
from torch.nn.utils import prune

model = SimpleNet().to(device)  # make sure model is fully constructed first
model.eval()

prune.random_unstructured(model.fc1, name='weight', amount=0.3)  # apply pruning
prune.remove(model.fc1, 'weight')  # remove reparam hooks and finalize

Linear(in_features=784, out_features=128, bias=True)

In [41]:
traced_pruned = torch.jit.trace(model, torch.randn(1, 1, 28, 28).to(device))
traced_pruned.save('traced_mnist_model_pruned.pt')

##### **Q25: How do you experiment with different TorchScript backends to analyze performance changes?**


In [45]:
def backend_inference(model, backend):
    model = model.to(backend)
    model.eval()

    input_dtype = next(model.parameters()).dtype
    input_tensor = torch.randn(64, 1, 28, 28).to(backend).to(input_dtype)  # match dtype

    if backend.type == 'cuda':
        torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        _ = model(input_tensor)
    if backend.type == 'cuda':
        torch.cuda.synchronize()
    end = time.time()

    print(f'Backend: {backend}  |  Inference time: {(end - start)*1000:.2f} ms')

In [46]:
backend_inference(traced_model, torch.device('cpu'))
if torch.cuda.is_available():
    backend_inference(traced_model, torch.device('cuda'))

Backend: cpu  |  Inference time: 0.26 ms
Backend: cuda  |  Inference time: 139.42 ms


##### **Q26: How do you combine TorchScript with other optimization techniques to enhance inference speed?**

In [47]:
import torch.quantization

fused_model = torch.nn.Sequential(
    nn.Sequential(
        nn.Flatten(),
        nn.Linear(28*28, 128),
        nn.ReLU()
    ),
    nn.Sequential(
        nn.Linear(128, 64),
        nn.ReLU()
    ),
    nn.Linear(64, 10)
).to(device)

In [48]:
fused_model.eval()
example_input = torch.randn(1, 1, 28, 28).to(device)
traced_fused = torch.jit.trace(fused_model, example_input)
traced_fused.save('traced_fused_model.pt')

In [49]:
import shutil
import os
from pathlib import Path

shutil.rmtree('data', ignore_errors=True)

for pt_file in Path('.').glob('*.pt'):
    pt_file.unlink()