Exporting a FastAI ResNet model to ONNX is a two-step process, because FastAI has no native ONNX export. FastAI is a high-level API built on top of PyTorch, so we first extract the underlying PyTorch model and then convert that model to ONNX. A couple of things need to be considered along the way, which I'll discuss and show later.

Steps to be taken:

1. Extract PyTorch model from FastAI
2. Convert PyTorch model to ONNX

In [2]:
from fastai.vision.all import *

In [3]:
# load FastAI learner
learn = load_learner('birds-res34-fast.pkl')

## Test inference with FastAI learner

In [4]:
# cockatoo
print(learn.predict('test/COCKATOO/1.jpg'))

# roadrunner
print(learn.predict('test/ROADRUNNER/3.jpg'))

('COCKATOO', tensor(153), tensor([4.1982e-09, 7.5033e-07, 5.0929e-09, 1.5086e-08, 2.7239e-07, 3.0972e-09,
        3.0564e-09, 2.1395e-09, 2.6006e-08, 2.0480e-06, 2.0302e-09, 7.2165e-05,
        1.0100e-07, 5.6200e-10, 6.4807e-08, 4.5042e-10, 6.0963e-09, 3.6655e-08,
        1.7121e-09, 2.8703e-08, 2.5718e-08, 6.7171e-09, 2.6483e-08, 1.2829e-07,
        2.0626e-08, 1.0889e-09, 2.8666e-07, 2.3959e-09, 2.1662e-08, 1.7993e-10,
        1.6995e-07, 4.9092e-08, 1.8571e-09, 2.1351e-09, 1.0439e-08, 2.6217e-10,
        2.7309e-07, 5.9627e-08, 1.2785e-09, 7.2161e-07, 2.2780e-11, 3.2073e-09,
        1.5691e-08, 2.3097e-09, 3.3332e-10, 1.1445e-09, 4.2474e-10, 7.1732e-10,
        1.2535e-11, 4.4392e-09, 5.8820e-07, 1.7659e-07, 8.9970e-10, 3.6871e-07,
        3.2198e-10, 1.5383e-04, 1.4266e-08, 2.8286e-11, 7.0295e-10, 8.3036e-09,
        1.7798e-10, 2.1028e-07, 4.2871e-10, 1.4762e-05, 6.3007e-09, 1.2159e-09,
        2.0997e-07, 6.0095e-09, 3.8880e-09, 9.8153e-09, 1.6129e-07, 8.2425e-11,
        2.6071

('ROADRUNNER', tensor(422), tensor([4.4763e-07, 3.1679e-09, 1.1058e-08, 5.2037e-09, 7.1648e-09, 1.0560e-10,
        1.6886e-09, 4.3852e-07, 3.6378e-10, 1.1667e-09, 7.4717e-07, 1.3858e-09,
        7.2033e-09, 1.8232e-08, 7.2110e-08, 1.9992e-06, 5.5346e-09, 5.6703e-10,
        2.0271e-10, 8.5286e-09, 4.0974e-09, 1.4701e-07, 4.3112e-08, 6.0880e-10,
        2.0841e-08, 6.1344e-09, 1.7666e-09, 6.1014e-08, 4.3754e-08, 7.3994e-07,
        5.1411e-08, 1.2294e-08, 1.1537e-06, 3.0249e-10, 9.8658e-09, 6.6940e-07,
        7.1454e-09, 1.4618e-09, 1.0383e-07, 7.6908e-09, 1.2885e-09, 1.7874e-08,
        3.7115e-09, 2.0936e-07, 7.1194e-05, 4.3644e-08, 1.2604e-08, 6.2225e-09,
        2.2992e-08, 5.1324e-10, 1.1535e-09, 7.4416e-09, 4.4712e-09, 1.1186e-08,
        1.8156e-07, 2.7481e-09, 7.4994e-09, 8.8192e-10, 2.9684e-07, 4.4173e-09,
        6.8604e-09, 6.0816e-09, 4.3659e-06, 2.6606e-10, 2.0127e-08, 7.0674e-08,
        9.2133e-09, 5.5016e-08, 5.0181e-09, 5.3931e-08, 9.5037e-09, 6.6529e-07,
        7.61

## 1. Getting the PyTorch model

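The `.model` attribute of the FastAI learner gives us the 'pure' PyTorch model. Calling `eval()` puts the model into inference mode: layers such as dropout and batch normalization switch to their evaluation behavior. Note that `eval()` does not disable gradient tracking by itself; that is the job of `torch.no_grad()`, which we use when running inference.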
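
To make the difference between `eval()` and `torch.no_grad()` concrete, here is a minimal sketch. The toy model and its layer sizes are hypothetical and only serve as an illustration; it is not the bird classifier.

```python
import torch

# Hypothetical toy model, used only to illustrate the two switches.
toy_model = torch.nn.Sequential(
    torch.nn.Linear(4, 2),
    torch.nn.Dropout(p=0.5),
)
toy_model.eval()  # dropout (and batch norm, if present) switch to inference behavior

x = torch.ones(1, 4)

out_eval = toy_model(x)            # eval() alone still records the autograd graph
with torch.no_grad():
    out_no_grad = toy_model(x)     # no graph is recorded inside this block

print(toy_model.training)          # training flag after eval()
print(out_eval.requires_grad)      # gradient tracking without no_grad()
print(out_no_grad.requires_grad)   # gradient tracking inside no_grad()
```

Both calls return the same values, since dropout is inactive in evaluation mode; only the gradient bookkeeping differs.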

#### Note:
The FastAI learner wraps the PyTorch model with additional operations defined in the `DataBlock`; in this case we defined a `Resize` operation. By default, the FastAI learner also applies a normalization step before the model and a softmax after it.

### What does this mean?
If we run `learn.predict('path/to/image')`, the FastAI learner resizes the image to the size defined in our `DataBlock`, normalizes the color channels, passes the image through the neural net, and scales the outputs to the 0-1 range.

If we tried to run inference on the PyTorch model extracted from the FastAI wrapper, it would most likely fail or give poor results, because the input image has not been resized and normalized the way the model expects. That's why we need to add these transformations to the PyTorch model before inference.

In [5]:
# transformations performed on data loaders
learn.dls.transform

(#2) [[noop:
encodes: (object,object) -> noopdecodes: , PILBase.create:
encodes: (Path,object) -> create
(str,object) -> create
(Tensor,object) -> create
(ndarray,object) -> create
(bytes,object) -> create
(Image,object) -> createdecodes: ],parent_label:
encodes: (object,object) -> parent_labeldecodes: ]

In [6]:
# extract labels from learner
labels = learn.dls.vocab
labels

['ABBOTTS BABBLER', 'ABBOTTS BOOBY', 'ABYSSINIAN GROUND HORNBILL', 'AFRICAN CROWNED CRANE', 'AFRICAN EMERALD CUCKOO', 'AFRICAN FIREFINCH', 'AFRICAN OYSTER CATCHER', 'AFRICAN PIED HORNBILL', 'AFRICAN PYGMY GOOSE', 'ALBATROSS', 'ALBERTS TOWHEE', 'ALEXANDRINE PARAKEET', 'ALPINE CHOUGH', 'ALTAMIRA YELLOWTHROAT', 'AMERICAN AVOCET', 'AMERICAN BITTERN', 'AMERICAN COOT', 'AMERICAN DIPPER', 'AMERICAN FLAMINGO', 'AMERICAN GOLDFINCH', 'AMERICAN KESTREL', 'AMERICAN PIPIT', 'AMERICAN REDSTART', 'AMERICAN ROBIN', 'AMERICAN WIGEON', 'AMETHYST WOODSTAR', 'ANDEAN GOOSE', 'ANDEAN LAPWING', 'ANDEAN SISKIN', 'ANHINGA', 'ANIANIAU', 'ANNAS HUMMINGBIRD', 'ANTBIRD', 'ANTILLEAN EUPHONIA', 'APAPANE', 'APOSTLEBIRD', 'ARARIPE MANAKIN', 'ASHY STORM PETREL', 'ASHY THRUSHBIRD', 'ASIAN CRESTED IBIS', 'ASIAN DOLLARD BIRD', 'ASIAN GREEN BEE EATER', 'ASIAN OPENBILL STORK', 'AUCKLAND SHAQ', 'AUSTRAL CANASTERO', 'AUSTRALASIAN FIGBIRD', 'AVADAVAT', 'AZARAS SPINETAIL', 'AZURE BREASTED PITTA', 'AZURE JAY', 'AZURE TANAGER', '

### Let's add the missing parts to our PyTorch model

As mentioned previously, the FastAI learner wraps our PyTorch model. We add 2 additional layers to the model to achieve the same results as the FastAI learner.

### Why can't we just export the learner to ONNX?
AFAIK FastAI currently doesn't support ONNX exports. We need to extract the PyTorch model as an intermediary step before we can convert it to ONNX. 

--------

### Additional layers:

#### Normalization layer
First we need to add a normalization layer. Pre-trained PyTorch models are trained on the ImageNet dataset, for which the following normalization values are suggested: `mean=[0.485, 0.456, 0.406]`, `std=[0.229, 0.224, 0.225]`. Source: https://pytorch.org/vision/stable/models.html. Normalization is supposed to improve model performance; more importantly, inputs at inference time must be normalized the same way as the training inputs.
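
As a sketch of what the normalization layer computes, the same operation can be written in plain NumPy: subtract the channel mean and divide by the channel standard deviation. The helper name `normalize_chw` and the dummy input are assumptions made for illustration.

```python
import numpy as np

# ImageNet statistics quoted above, one value per RGB channel.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_chw(image: np.ndarray) -> np.ndarray:
    """Normalize a CHW float image with values in [0, 1], channel by channel."""
    # Reshape to (3, 1, 1) so each statistic broadcasts over its own channel.
    return (image - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]

# Hypothetical input: every pixel equals its channel mean, so the result is all zeros.
dummy = np.broadcast_to(IMAGENET_MEAN[:, None, None], (3, 4, 4)).copy()
normalized = normalize_chw(dummy)
print(normalized.shape)
print(np.abs(normalized).max())
```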


#### Softmax layer
By adding a softmax layer at the end of our final model definition we make our results 'human readable'. As an example, here's an inference result without the softmax layer: `('not_hot_dog', array([[-3.0275817,  1.2424631]], dtype=float32))`. *Helpful? Not really imo.*

Here's the inference with the added softmax layer: `('not_hot_dog', array([[0.01378838, 0.98621166]], dtype=float32))`.
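
To see where those numbers come from, here is a small NumPy sketch of softmax applied to the raw outputs quoted above. The function is a generic, numerically stable softmax written for illustration, not the PyTorch layer itself; the result should closely match the probabilities shown in the text.

```python
import numpy as np

def softmax(logits: np.ndarray, axis: int = 1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)  # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

# Raw outputs quoted above (batch of 1, two classes).
logits = np.array([[-3.0275817, 1.2424631]], dtype=np.float32)
probs = softmax(logits)
print(probs)              # probabilities per class
print(probs.sum(axis=1))  # each row sums to 1
```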

In [9]:
import torch
import torchvision
import torchvision.transforms as transforms

# https://pytorch.org/vision/stable/models.html

pytorch_model = learn.model.eval() # gets the PyTorch model
softmax_layer = torch.nn.Softmax(dim=1) # define softmax
normalization_layer = torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) # normalization layer

# assembling the final model
final_model = nn.Sequential(
    normalization_layer,
    pytorch_model,
    softmax_layer
)

# Note: Image resizing will be handled separately


In [10]:
# loading an image and converting to tensor
from PIL import Image

def image_transform(path: str, size: int) -> torch.Tensor:
    '''Helper function to transform image.'''
    # convert to RGB so grayscale or RGBA files also yield 3 channels
    image = Image.open(path).convert('RGB')

    # transformation pipeline
    transformation = transforms.Compose([
                transforms.Resize([size,size]), # resizes image
                transforms.ToTensor() # converts image to a float tensor scaled to 0.0-1.0
            ])

    image_tensor = transformation(image).unsqueeze(0)
    print('Tensor shape: ', image_tensor.shape)

    return image_tensor

In [11]:
# test image paths (the hot_dog / not_hot_dog variable names are carried over from an earlier example)
hot_dog_test = 'test/COCKATOO/1.jpg'
not_hot_dog_test = 'test/ROADRUNNER/3.jpg'

In [12]:
# get image tensors
hot_dog_tensor = image_transform(hot_dog_test, 256)
not_hot_dog_tensor = image_transform(not_hot_dog_test, 256)

Tensor shape:  torch.Size([1, 3, 256, 256])
Tensor shape:  torch.Size([1, 3, 256, 256])


In [13]:
# run inference on test images

with torch.no_grad():
    results = final_model(hot_dog_tensor)
labels[np.argmax(results.detach().numpy())], results.detach().numpy().astype(float)

('COCKATOO',
 array([[3.01397378e-08, 4.04880893e-06, 1.96300789e-08, 7.04394836e-08,
         3.92995304e-07, 1.36720999e-08, 2.44684060e-08, 1.22050405e-08,
         2.79061567e-07, 3.23763538e-06, 1.49686805e-08, 1.56375754e-04,
         7.13960958e-07, 4.87535567e-09, 5.51419078e-07, 2.02684580e-09,
         1.72113737e-08, 9.45411216e-08, 2.76054468e-08, 1.07042325e-07,
         1.68884341e-07, 2.19816680e-08, 1.44551308e-07, 4.96786470e-07,
         1.80395759e-07, 1.60860285e-08, 1.55613907e-06, 1.43558117e-08,
         7.48111546e-08, 1.62756064e-09, 1.23222878e-06, 3.22883068e-07,
         5.73042325e-09, 1.24934498e-08, 9.66928724e-08, 2.76543366e-09,
         1.05631568e-06, 6.93971401e-07, 3.53235796e-09, 2.54751694e-06,
         9.62988578e-11, 2.02615880e-08, 7.28539504e-08, 2.18257945e-08,
         2.47345433e-09, 1.37052258e-09, 2.43031373e-09, 4.32706049e-09,
         4.93186984e-11, 3.44192195e-08, 2.46775858e-06, 1.57767784e-06,
         6.51657217e-09, 1.07011044e-0

In [14]:
# run inference on test images

with torch.no_grad():
    results = final_model(not_hot_dog_tensor)
labels[np.argmax(results.detach().numpy())], results.detach().numpy()

('ROADRUNNER',
 array([[3.39912822e-06, 1.43256447e-08, 3.31093588e-08, 3.70598663e-08,
         3.54585445e-08, 7.42300443e-10, 3.32185834e-09, 8.50267611e-07,
         1.44287993e-09, 7.18258397e-10, 1.39974759e-06, 4.69306727e-09,
         2.13622542e-08, 1.60234322e-07, 3.69171289e-07, 1.18748085e-05,
         1.91432683e-08, 1.29190492e-09, 1.66858394e-09, 2.69956146e-08,
         6.54862076e-09, 5.29290162e-07, 1.80254091e-07, 1.51070112e-09,
         1.50414436e-07, 8.75272050e-08, 1.36786493e-08, 3.08783768e-07,
         1.62586701e-07, 1.46050593e-06, 1.25604430e-07, 3.01507832e-08,
         5.45711919e-06, 5.66591829e-10, 3.82348873e-08, 3.97587883e-06,
         3.30651844e-08, 4.63302419e-09, 3.20599668e-07, 3.47670728e-08,
         1.83582749e-09, 4.30640554e-08, 2.26385612e-08, 5.21142454e-07,
         2.12006853e-04, 1.31605461e-07, 4.46509176e-08, 2.07857269e-08,
         1.17345827e-07, 2.41685316e-09, 4.19461132e-09, 2.16956799e-08,
         2.00545873e-08, 3.09614272e
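
Printing the full probability array is hard to read with this many classes. A minimal sketch of a top-k helper is shown below; the function name `top_k`, the four-class vocabulary, and the probabilities are hypothetical and only illustrate the idea.

```python
import numpy as np

def top_k(probs: np.ndarray, labels: list, k: int = 3) -> list:
    """Return the k most likely (label, probability) pairs for a 1 x N output."""
    flat = probs.reshape(-1)
    best = np.argsort(flat)[::-1][:k]  # indices sorted by descending probability
    return [(labels[i], float(flat[i])) for i in best]

# Hypothetical four-class output and vocabulary, for illustration only.
demo_labels = ['ALBATROSS', 'ANHINGA', 'COCKATOO', 'ROADRUNNER']
demo_probs = np.array([[0.02, 0.03, 0.90, 0.05]], dtype=np.float32)
for label, p in top_k(demo_probs, demo_labels):
    print(f'{label}: {p:.2f}')
```

With the notebook's variables, the equivalent call would be `top_k(results.numpy(), labels)`.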

## 2. Export PyTorch model to ONNX

It's REALLY important to define the input shape of an ONNX model correctly. We trained the FastAI/PyTorch model on 256 x 256 images, so we need to use the same size for the export. The input tensor must be in BCHW format (Batch x Channels x Height x Width), here 1x3x256x256.

PyTorch documentation: https://pytorch.org/docs/master/onnx.html

In [16]:
torch.onnx.export(
    final_model, 
    torch.randn(1, 3, 256, 256),
    "models/bird_model_resnet34_256_256.onnx",
    do_constant_folding=True,
    export_params=True, # if set to False exports untrained model
    input_names=["image_1_3_256_256"],
    output_names=["bird"],
    opset_version=11
)

verbose: False, log level: Level.ERROR



### (Optional) Validate ONNX model

In [17]:
import onnx

# Load the ONNX model
model = onnx.load('models/bird_model_resnet34_256_256.onnx')

# Check that the IR is well formed
onnx.checker.check_model(model)

# Print a human readable representation of the graph
# onnx.helper.printable_graph(model.graph)

We got our ONNX model. Let's compare the results.

In [19]:
import numpy as np
import onnxruntime as rt

np.set_printoptions(suppress=True)

In [20]:
from PIL import Image

def image_transform_onnx(path: str, size: int) -> np.ndarray:
    '''Image transform helper for onnx runtime inference.'''

    # convert to RGB so grayscale or RGBA files also yield 3 channels
    image = Image.open(path).convert('RGB')
    image = image.resize((size,size))
    # print(image.size, image.mode)

    # now our image is represented by 3 layers - Red, Green, Blue
    # each layer holds size x size values (height x width)
    image = np.array(image)
    # print('Conversion to array: ', image.shape)

    # reorder HWC -> CHW to match the dummy input used at export - torch.randn(1, 3, 256, 256)
    image = image.transpose(2,0,1).astype(np.float32)
    # print('Transposing the tensor: ', image.shape)

    # our image is currently represented by values ranging between 0-255
    # we need to convert these values to 0.0-1.0 - those are the values that are expected by our model

    # print('Integer value: ', image[0][0][40])
    image /= 255
    # print('Float value: ', image[0][0][40])

    # expanding the already existing tensor with a leading dimension (similar to unsqueeze(0))
    # currently our tensor only has a rank of 3, which needs to be expanded to 4 - (1, 3, size, size)
    # 1 can be considered the batch size

    image = image[None, ...]
    # print('Final shape of our tensor', image.shape, '\n')
    return image


In [21]:
hot_dog_tensor_onnx = image_transform_onnx('test/COCKATOO/1.jpg', 256)
not_hot_dog_tensor_onnx = image_transform_onnx('test/ROADRUNNER/3.jpg', 256)

In [22]:
# initialize onnx runtime inference session
sess = rt.InferenceSession('models/bird_model_resnet34_256_256.onnx')

# input & output names
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

# input dimensions
input_dims = sess.get_inputs()[0].shape

input_name, output_name, input_dims

('image_1_3_256_256', 'bird', [1, 3, 256, 256])

In [23]:
results = sess.run([output_name], {input_name: hot_dog_tensor_onnx})[0]
labels[np.argmax(results)], results, labels

('COCKATOO',
 array([[0.00000002, 0.00000285, 0.00000002, 0.00000007, 0.00000034,
         0.00000001, 0.00000002, 0.00000001, 0.00000017, 0.00000268,
         0.00000002, 0.00013275, 0.00000061, 0.        , 0.00000033,
         0.        , 0.00000001, 0.00000008, 0.00000002, 0.00000007,
         0.00000013, 0.00000001, 0.00000009, 0.00000033, 0.00000013,
         0.00000001, 0.00000111, 0.00000001, 0.00000006, 0.        ,
         0.0000009 , 0.00000022, 0.        , 0.00000001, 0.00000006,
         0.        , 0.00000079, 0.00000042, 0.        , 0.00000235,
         0.        , 0.00000002, 0.00000007, 0.00000002, 0.        ,
         0.        , 0.        , 0.        , 0.        , 0.00000003,
         0.00000181, 0.0000011 , 0.        , 0.00000087, 0.        ,
         0.00044059, 0.00000004, 0.        , 0.        , 0.00000002,
         0.        , 0.00000119, 0.        , 0.00004179, 0.00000004,
         0.        , 0.00000048, 0.00000001, 0.00000001, 0.00000002,
         0.00000118, 

In [24]:
results = sess.run([output_name], {input_name: not_hot_dog_tensor_onnx})[0]
labels[np.argmax(results)], results, labels

('ROADRUNNER',
 array([[0.00000239, 0.00000001, 0.00000003, 0.00000005, 0.00000003,
         0.        , 0.        , 0.00000064, 0.        , 0.        ,
         0.00000157, 0.00000001, 0.00000002, 0.00000013, 0.00000031,
         0.00000697, 0.00000002, 0.        , 0.        , 0.00000003,
         0.00000001, 0.00000042, 0.00000012, 0.        , 0.00000017,
         0.00000006, 0.00000001, 0.00000034, 0.00000014, 0.00000125,
         0.00000014, 0.00000003, 0.00000304, 0.        , 0.00000003,
         0.00000304, 0.00000003, 0.        , 0.00000022, 0.00000003,
         0.        , 0.00000004, 0.00000002, 0.00000058, 0.00018795,
         0.0000001 , 0.00000004, 0.00000002, 0.00000009, 0.        ,
         0.        , 0.00000002, 0.00000002, 0.00000002, 0.00000041,
         0.00000001, 0.00000002, 0.        , 0.00000064, 0.00000001,
         0.00000001, 0.00000002, 0.00000561, 0.        , 0.00000005,
         0.00000014, 0.00000004, 0.00000019, 0.00000001, 0.00000008,
         0.00000003

**If everything is correct, all three model versions (FastAI, PyTorch, ONNX) return the same prediction, with only minor differences in the probabilities.**
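
Instead of comparing the arrays by eye, the check can be automated. The sketch below assumes two probability arrays of the same shape; the helper name `compare_outputs`, the tolerance, and the sample values are hypothetical. Small differences may come from the preprocessing paths, for example different resize implementations.

```python
import numpy as np

def compare_outputs(a: np.ndarray, b: np.ndarray, atol: float = 1e-3) -> dict:
    """Compare two probability arrays of the same shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return {
        'same_top1': int(a.argmax()) == int(b.argmax()),  # same predicted class?
        'max_abs_diff': float(np.abs(a - b).max()),       # largest per-class gap
        'close': bool(np.allclose(a, b, atol=atol)),      # within tolerance?
    }

# Hypothetical outputs from two runtimes for the same image.
pytorch_out = np.array([[0.0002, 0.9990, 0.0008]])
onnx_out = np.array([[0.0003, 0.9988, 0.0009]])
report = compare_outputs(pytorch_out, onnx_out)
print(report)
```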

## Inference time

One of the advantages of ONNX Runtime is that in most cases it's faster than the model in its original format. Let's see if that holds up.

### FastAI

In [25]:
%time
learn.predict('test/COCKATOO/1.jpg')

CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 5.96 µs


('COCKATOO',
 tensor(153),
 tensor([4.1982e-09, 7.5033e-07, 5.0929e-09, 1.5086e-08, 2.7239e-07, 3.0972e-09,
         3.0564e-09, 2.1395e-09, 2.6006e-08, 2.0480e-06, 2.0302e-09, 7.2165e-05,
         1.0100e-07, 5.6200e-10, 6.4807e-08, 4.5042e-10, 6.0963e-09, 3.6655e-08,
         1.7121e-09, 2.8703e-08, 2.5718e-08, 6.7171e-09, 2.6483e-08, 1.2829e-07,
         2.0626e-08, 1.0889e-09, 2.8666e-07, 2.3959e-09, 2.1662e-08, 1.7993e-10,
         1.6995e-07, 4.9092e-08, 1.8571e-09, 2.1351e-09, 1.0439e-08, 2.6217e-10,
         2.7309e-07, 5.9627e-08, 1.2785e-09, 7.2161e-07, 2.2780e-11, 3.2073e-09,
         1.5691e-08, 2.3097e-09, 3.3332e-10, 1.1445e-09, 4.2474e-10, 7.1732e-10,
         1.2535e-11, 4.4392e-09, 5.8820e-07, 1.7659e-07, 8.9970e-10, 3.6871e-07,
         3.2198e-10, 1.5383e-04, 1.4266e-08, 2.8286e-11, 7.0295e-10, 8.3036e-09,
         1.7798e-10, 2.1028e-07, 4.2871e-10, 1.4762e-05, 6.3007e-09, 1.2159e-09,
         2.0997e-07, 6.0095e-09, 3.8880e-09, 9.8153e-09, 1.6129e-07, 8.2425e-11,
 

### PyTorch

In [26]:
%time

hot_dog_tensor = image_transform('test/COCKATOO/1.jpg', 256)

with torch.no_grad():
    results = final_model(hot_dog_tensor)
labels[np.argmax(results.detach().numpy())], results.detach().numpy().astype(float)

CPU times: user 1 µs, sys: 0 ns, total: 1 µs
Wall time: 4.05 µs
Tensor shape:  torch.Size([1, 3, 256, 256])


('COCKATOO',
 array([[0.00000003, 0.00000405, 0.00000002, 0.00000007, 0.00000039,
         0.00000001, 0.00000002, 0.00000001, 0.00000028, 0.00000324,
         0.00000001, 0.00015638, 0.00000071, 0.        , 0.00000055,
         0.        , 0.00000002, 0.00000009, 0.00000003, 0.00000011,
         0.00000017, 0.00000002, 0.00000014, 0.0000005 , 0.00000018,
         0.00000002, 0.00000156, 0.00000001, 0.00000007, 0.        ,
         0.00000123, 0.00000032, 0.00000001, 0.00000001, 0.0000001 ,
         0.        , 0.00000106, 0.00000069, 0.        , 0.00000255,
         0.        , 0.00000002, 0.00000007, 0.00000002, 0.        ,
         0.        , 0.        , 0.        , 0.        , 0.00000003,
         0.00000247, 0.00000158, 0.00000001, 0.00000107, 0.        ,
         0.00050981, 0.00000007, 0.        , 0.        , 0.00000002,
         0.        , 0.00000207, 0.        , 0.00006181, 0.00000004,
         0.        , 0.00000062, 0.00000001, 0.00000001, 0.00000002,
         0.00000137, 

### ONNX

In [31]:
%time

hot_dog_tensor_onnx = image_transform_onnx('test_images/hot_dog_114.jpg', 256)
results = sess.run([output_name], {input_name: hot_dog_tensor_onnx})[0]
labels[np.argmax(results)], results, labels

CPU times: user 3 µs, sys: 1 µs, total: 4 µs
Wall time: 15.3 µs


('hot_dog',
 array([[0.9986059 , 0.00139411]], dtype=float32),
 ['hot_dog', 'not_hot_dog'])

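No real difference between the three in these measurements. Keep in mind, though, that `%time` on a line of its own times an empty statement, which is why all three readings are in the microsecond range. To time the whole cell, use the cell magic `%%time` as the first line of the cell.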
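
For a comparison that does not depend on notebook magics, a plain-Python timer is a useful cross-check. This is a minimal sketch using only the standard library; `time_call` and `fake_inference` are hypothetical names, and the stand-in workload should be replaced with a real inference call.

```python
import time
from statistics import mean

def time_call(fn, *args, warmup: int = 1, repeats: int = 10) -> float:
    """Return the mean wall-clock time of fn(*args) in milliseconds."""
    for _ in range(warmup):          # warm-up runs are not measured
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)

# Hypothetical stand-in workload; replace with a real inference call.
def fake_inference(n: int) -> int:
    return sum(i * i for i in range(n))

elapsed_ms = time_call(fake_inference, 100_000)
print(f'{elapsed_ms:.3f} ms per call')
```

In the notebook this could be used as `time_call(learn.predict, 'test/COCKATOO/1.jpg')` or `time_call(lambda: sess.run([output_name], {input_name: hot_dog_tensor_onnx}))`. Averaging over several runs after a warm-up gives a steadier number than a single measurement.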

## Gotchas

It's really important to use the expected input shape with ONNX. Let's check what happens when we pass a 1x3x224x224 tensor to a model with a defined input of 1x3x256x256. I struggled to figure out the correct ONNX settings, so I hope this helps some of you.

Let's see what happens:

In [33]:
hot_dog_tensor_onnx = image_transform_onnx('test_images/hot_dog_114.jpg', 224)
not_hot_dog_tensor_onnx = image_transform_onnx('test_images/not_hot_dog_160.jpg', 224)

In [35]:
# This will throw an error because of the incorrect input size.

results = sess.run([output_name], {input_name: hot_dog_tensor_onnx})[0]
labels[np.argmax(results)], results, labels

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: image_1_3_256_256 for the following indices
 index: 2 Got: 224 Expected: 256
 index: 3 Got: 224 Expected: 256
 Please fix either the inputs or the model.
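
One way to catch this earlier is to validate the array against the declared input shape before calling the session. The sketch below is an illustration; `check_input_shape` is a hypothetical helper, and it assumes that non-integer entries in the declared shape (such as `None` or a symbolic name) mark dynamic dimensions.

```python
import numpy as np

def check_input_shape(array: np.ndarray, expected: list) -> None:
    """Raise ValueError if array.shape does not match the declared input shape.

    Entries of `expected` that are not integers (assumed to mark dynamic
    dimensions) accept any size.
    """
    if array.ndim != len(expected):
        raise ValueError(f'expected rank {len(expected)}, got rank {array.ndim}')
    for axis, (got, want) in enumerate(zip(array.shape, expected)):
        if isinstance(want, int) and got != want:
            raise ValueError(f'axis {axis}: got {got}, expected {want}')

expected_shape = [1, 3, 256, 256]  # the shape reported for the exported model
check_input_shape(np.zeros((1, 3, 256, 256), dtype=np.float32), expected_shape)  # passes

try:
    check_input_shape(np.zeros((1, 3, 224, 224), dtype=np.float32), expected_shape)
except ValueError as err:
    message = str(err)
    print(message)
```

In the notebook, `expected_shape` would come from `sess.get_inputs()[0].shape`.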

## How to Debug

To check the inputs of a model you can use a tool like Netron to visualize it: https://netron.app. A desktop version is available here: https://github.com/lutzroeder/netron

Or you can read the expected dimensions with the following line:


In [36]:
# shows the required model input
sess.get_inputs()[0].shape

[1, 3, 256, 256]

## You got your own ONNX model.