# Torchcelerate - the easy way to boost your PyTorch performance

This example shows how Torchcelerate helps you accelerate your PyTorch models.

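We're going to use torchvision's ResNet-18 for this tutorial. Called without a weights argument, as below, resnet18() builds the architecture with randomly initialized weights rather than pre-trained ones, which is enough for a timing comparison.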

In [9]:
import torch
import torchvision
import torchcelerate
import time

model = torchvision.models.resnet18()
model = model.eval() # set the model to inference mode

We use torchcelerate.optimize to prepare the model for fast, cross-platform inference. To let it do its optimization magic, we need to give it some information: a sample input, names for the inputs and outputs, and which axes can vary in size (here, the batch dimension).

In [10]:
test_input = torch.randn(1, 3, 224, 224, requires_grad=True)
optimized_model = torchcelerate.optimize(model, test_input,
                                         input_names = ['input'],
                                         output_names = ['output'],
                                         dynamic_axes={'input' : {0 : 'batch_size'},
                                                       'output' : {0 : 'batch_size'}})

That's it! Now we are ready to try it out. We're going to run both the original model and the Torchcelerate-optimized model on the same input and compare how long each takes.

In [11]:
orig_starttime = time.time()
original_outputs = model.forward(test_input)
orig_endtime = time.time()
orig_time = orig_endtime - orig_starttime

opt_starttime = time.time()
outputs = optimized_model.forward(test_input)
opt_endtime = time.time()
opt_time = opt_endtime - opt_starttime

print("Original model took", orig_time, "seconds")
print("Optimized model took", opt_time, "seconds")
print("That's", int(100*(orig_time-opt_time)/orig_time), "% acceleration!")

Original model took 0.1371173858642578 seconds
Optimized model took 0.11565089225769043 seconds
That's 15 % acceleration!
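
A single timed call like the one above is sensitive to warm-up effects and system noise, so the measured difference can vary from run to run. The following is a minimal sketch of a steadier measurement, assuming you are happy to run each model several times: it discards a few warm-up calls and reports the median of the rest. The helper names benchmark and percent_faster are hypothetical (they are not part of Torchcelerate), and a plain Python function stands in for the model so the sketch runs without PyTorch.

```python
import statistics
import time


def benchmark(fn, *args, warmup=3, runs=20):
    """Return the median wall-clock time in seconds of calling fn(*args)."""
    for _ in range(warmup):
        fn(*args)  # warm-up calls are executed but not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def percent_faster(orig_time, opt_time):
    """Percentage reduction in run time, same formula as the cell above."""
    return 100 * (orig_time - opt_time) / orig_time


# Stand-in workload (hypothetical); it only exists to make the sketch runnable.
def slow_sum(n):
    return sum(i * i for i in range(n))


baseline = benchmark(slow_sum, 20_000)
print(f"median time: {baseline:.6f} seconds")
```

In the notebook, the equivalent calls would be benchmark(model.forward, test_input) and benchmark(optimized_model.forward, test_input), with the two medians passed to percent_faster.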


We can also compare the outputs to make sure the results are acceptable after optimization.

In [12]:
print(original_outputs)
print(outputs)

tensor([[-3.0301e+00,  1.5017e+00,  1.7229e+00,  5.9493e-01,  7.3447e-01,
          4.7310e-01, -4.8493e-01,  4.1464e-01,  1.2711e-01, -8.9616e-01,
         -2.0839e+00, -3.8615e-01,  1.0806e+00,  3.1364e-01, -9.9639e-02,
          8.7608e-01,  1.9981e+00, -9.7835e-01,  1.8209e-01,  1.9579e-01,
         -9.1019e-01,  1.1318e+00, -1.3006e+00, -1.4981e+00, -1.2248e+00,
          1.6257e+00,  1.0572e+00,  1.1614e+00,  1.3524e+00, -7.5889e-01,
          7.7599e-01, -6.4452e-01,  9.3460e-01, -2.8979e+00, -1.0220e-01,
         -3.6070e-01,  2.4523e-01,  9.3848e-01,  1.4830e+00,  6.6916e-01,
         -3.0603e-01,  2.0074e+00, -1.1785e+00, -1.4629e+00, -7.1329e-01,
         -5.9399e-01,  9.9804e-01, -1.3742e-01, -1.6901e+00,  6.1053e-01,
          2.5910e+00, -1.8619e+00, -1.1156e+00, -1.3316e+00, -6.2659e-01,
         -2.1345e+00, -2.3102e+00, -3.6217e+00, -5.1365e-01, -4.1392e-01,
         -1.8277e-01, -1.2753e+00,  1.0947e+00,  1.9954e+00,  5.4220e-01,
         -1.6181e+00,  9.8495e-01, -1.
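
Comparing two printed tensors by eye is error-prone for a 1000-value output, so a numeric check is usually easier to trust. Here is a minimal sketch in plain Python, assuming both outputs have been flattened to lists of floats (for a PyTorch tensor, something like original_outputs.detach().flatten().tolist()). The helper names, the tolerances, and the sample values are illustrative assumptions, not results from the run above.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    return max(abs(x - y) for x, y in zip(a, b))


def all_close(a, b, rel_tol=1e-3, abs_tol=1e-5):
    """True if every pair of elements agrees within the given tolerances."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


# Made-up values standing in for the two models' flattened outputs.
reference = [-3.0301, 1.5017, 1.7229]
candidate = [-3.0300, 1.5018, 1.7229]

print("max abs difference:", max_abs_diff(reference, candidate))
print("within tolerance:", all_close(reference, candidate))
```

If both outputs are PyTorch tensors, torch.allclose(original_outputs, outputs, rtol=1e-3, atol=1e-5) performs the same kind of check directly. Which tolerances count as acceptable depends on your application.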

Once you're happy with the optimized results, you can deploy your model by packaging it as a Docker container.

If you are deploying the model to an edge device or embedding it in an application where a container doesn't make sense, Torchcelerate makes it easy to save the model in a portable format.

In [13]:
optimized_model.serialize("model.onnx")

Now you can run it in Node.js, Java, C++, C#, Windows ML, or even Python, with a much smaller footprint.