# Torchcelerate - the easy way to boost your PyTorch performance

This example shows how Torchcelerate helps you accelerate your PyTorch models.

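We're going to use torchvision's ResNet-18 for this tutorial. Note that resnet18() called with no arguments has randomly initialized weights, which is fine for measuring speed but not for meaningful predictions.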

In [1]:
import torch
import torchvision
import torchcelerate
import time

model = torchvision.models.resnet18()
model = model.eval() # set the model to inference mode

We use torchcelerate.optimize to prepare the model for fast, cross-platform inference. To do its optimization magic, it needs a sample input, names for the model's inputs and outputs, and the axes whose size can vary (here, the batch dimension).

In [2]:
test_input = torch.randn(1, 3, 224, 224, requires_grad=True)
optimized_model = torchcelerate.optimize(model, test_input,
                                    input_names = ['input'],
                                    output_names = ['output'],
                                    dynamic_axes={'input' : {0 : 'batch_size'},
                                                  'output' : {0 : 'batch_size'}})
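
The dynamic_axes argument above maps each named tensor to a dictionary of {axis index: label}. The sketch below builds that structure with a small helper. It assumes the argument means what the identically named argument of torch.onnx.export means (the labelled axis may vary in size at run time); the helper name batch_dynamic_axes is hypothetical and not part of Torchcelerate.

```python
def batch_dynamic_axes(tensor_names, axis=0, label="batch_size"):
    """Build a dynamic_axes mapping that marks one axis of each tensor as variable.

    Hypothetical helper for illustration only; not part of Torchcelerate.
    """
    return {name: {axis: label} for name in tensor_names}


# Same structure as the literal dictionary passed to torchcelerate.optimize above.
dynamic_axes = batch_dynamic_axes(["input", "output"])
print(dynamic_axes)
```

With more inputs or outputs, the same helper keeps the names and the axis label consistent across all of them.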

That's it! Now we are ready to try it out. We're going to run both the original model and the Torchcelerate optimized model to check performance.

In [3]:
orig_starttime = time.time()
original_outputs = model.forward(test_input)
orig_endtime = time.time()
orig_time = orig_endtime - orig_starttime

opt_starttime = time.time()
outputs = optimized_model.forward(test_input)
opt_endtime = time.time()
opt_time = opt_endtime - opt_starttime

print("Original model took", orig_time, "seconds")
print("Optimized model took", opt_time, "seconds")
print("That's", int(100*(orig_time-opt_time)/orig_time), "% acceleration!")

Original model took 0.18813419342041016 seconds
Optimized model took 0.04770064353942871 seconds
That's 74 % acceleration!
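
A single timed call can be noisy, because the first run often pays one-time setup costs. The sketch below is one way to get a steadier number: a few warm-up calls, then the median of repeated runs. The helper names benchmark and summarize are made up for this example, and it uses only the Python standard library, so it works with any callable.

```python
import statistics
import time


def benchmark(fn, warmup=3, repeats=20):
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time costs such as lazy initialization
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def summarize(orig_time, opt_time):
    """Express the gain as percent of time saved and as a speed-up factor."""
    percent_saved = 100 * (orig_time - opt_time) / orig_time
    speedup = orig_time / opt_time
    return percent_saved, speedup


# The two timings printed above: about 74% less time, or roughly a 3.9x speed-up.
percent_saved, speedup = summarize(0.18813419342041016, 0.04770064353942871)
print(f"{percent_saved:.1f}% less time, {speedup:.2f}x faster")
```

In this notebook you could call benchmark(lambda: model.forward(test_input)) and benchmark(lambda: optimized_model.forward(test_input)), then pass both results to summarize.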


We can also compare the outputs to make sure the optimized model's results still match the original closely enough.

In [4]:
print(original_outputs)
print(outputs)

tensor([[ 0.5409, -1.7889,  0.4953,  1.4966,  1.5532, -0.6748,  0.4967,  0.0080,
         -0.7749,  0.1430,  1.7342,  0.5113,  3.1367, -1.0850, -2.6150, -0.2505,
          1.1643,  0.7908, -0.5415,  0.8689,  1.2530, -3.0874, -2.0024,  0.8584,
         -2.3748, -1.4146, -1.5917,  0.9275,  1.4231, -0.1743,  2.0658,  3.3379,
          3.0613,  0.5718,  0.4866, -1.5528, -1.0676, -1.1098, -0.9922, -1.2844,
         -0.1427, -0.1648,  1.5590, -1.1996,  2.3850,  0.5008,  0.2289,  0.7929,
         -1.8735, -2.0238, -0.0350,  0.1827, -0.5313, -0.3525, -1.4313, -0.7938,
          0.0187, -1.6973,  1.1340, -0.6501, -0.3290, -1.8808,  1.0377, -0.2515,
         -1.9111, -1.2934,  0.0679, -1.2730, -0.8414, -2.8211,  1.6330,  1.3422,
          2.5195, -0.0696, -0.2981,  0.1135, -2.3061,  0.5882,  0.9333,  0.9191,
         -0.2194, -1.1296, -0.3638,  1.3422,  0.6586,  1.1029, -1.6921,  0.0595,
          0.2097, -1.5352,  0.2369, -0.4392, -1.2429, -1.1682, -0.9856, -1.4423,
         -0.2836,  1.2888, -
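
Comparing a thousand printed numbers by eye is error-prone, so a numeric check is usually easier. The sketch below reports the largest absolute difference and whether every element is within a tolerance, using the same style of test as torch.allclose (|a - b| <= atol + rtol * |b|). The function name compare_outputs and the tolerance defaults are assumptions for this example, and it takes flat lists of floats so that it runs without PyTorch.

```python
def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-5):
    """Return (max absolute difference, True if all elements are within tolerance)."""
    if len(reference) != len(candidate):
        raise ValueError("outputs differ in length")
    max_abs_diff = 0.0
    all_close = True
    for ref, cand in zip(reference, candidate):
        diff = abs(ref - cand)
        max_abs_diff = max(max_abs_diff, diff)
        # Tolerance scales with the magnitude of the reference value.
        if diff > atol + rtol * abs(ref):
            all_close = False
    return max_abs_diff, all_close


# Made-up values for illustration, not taken from the run above.
max_diff, ok = compare_outputs([1.0, -2.0], [1.0004, -2.0])
print(max_diff, ok)
```

For the tensors in this notebook, original_outputs.detach().flatten().tolist() and the same expression on outputs give suitable inputs, assuming the optimized model returns a tensor of the same shape. The right tolerance depends on your application.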

Once you're happy with the optimized results, you can deploy your model by packaging it as a Docker container.

If you are deploying the model to an edge device or embedding it in an application where a container doesn't make sense, Torchcelerate makes it easy to save the model in a portable format.

In [5]:
optimized_model.serialize("model.onnx")

Now you can run it in Node.js, Java, C++, C#, Windows ML, or even Python with a much smaller footprint.
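
As a rough sketch of the Python case, the function below loads the saved file with ONNX Runtime and runs one batch through it. It assumes that model.onnx is a standard ONNX file, as the extension suggests, that the onnxruntime package is installed, and that the tensor names are the 'input' and 'output' chosen earlier. The function name run_onnx_model is made up for this example.

```python
def run_onnx_model(model_path, batch, input_name="input", output_name="output"):
    """Run one batch through an ONNX file and return the named output array.

    batch is expected to be a float32 NumPy array shaped (N, 3, 224, 224).
    """
    # Imported here so that only this function depends on the third-party package.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # run() takes the requested output names and a {input name: array} feed.
    (result,) = session.run([output_name], {input_name: batch})
    return result
```

For example, run_onnx_model("model.onnx", test_input.detach().numpy()) would feed the same random input used earlier, and the result can be compared against the original outputs.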