❓ [Question] Runtimes for timm + TensorRT #1860
Comments
Hello - thanks for the detailed results - to answer the questions:
# (assumes model, inputs and n_runs are already defined)
import torch
from time import time

total_runtime = 0
for _ in range(n_runs):
    with torch.no_grad():
        start = time()
        model(inputs)
        torch.cuda.synchronize()  # wait for the GPU to finish before stopping the timer
        end = time()
    total_runtime += end - start
avg_runtime = total_runtime / n_runs
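The same measurement can also be taken with torch.cuda.Event, which times the GPU work directly and excludes some host-side overhead. A sketch, assuming the same model, inputs and n_runs as above:

import torch

start_evt = torch.cuda.Event(enable_timing=True)
end_evt = torch.cuda.Event(enable_timing=True)

total_ms = 0.0
with torch.no_grad():
    for _ in range(n_runs):
        start_evt.record()
        model(inputs)
        end_evt.record()
        torch.cuda.synchronize()  # make sure both events have completed
        total_ms += start_evt.elapsed_time(end_evt)  # milliseconds between events
avg_runtime = total_ms / n_runs / 1000  # seconds, matching the loop above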
I will evaluate the key models you have mentioned on our current
Hi @gs-olive, thanks for your detailed answer. I updated the script accordingly; I will try ONNX + TensorRT and give you feedback.
Following this blog post, I created a new script to use ONNX export + TensorRT (via trtexec).
The results are the ones reported directly by trtexec, and I checked that I get similar results with my own measurements.
"""
Script to benchmark a model using ONNX export + TensorRT
To run this script using the latest pytorch docker image, save it into a directory (DIR) and run:
docker run --gpus all --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --volume $DIR:/app nvcr.io/nvidia/pytorch:23.04-py3 /bin/bash -c "pip install --pre timm onnx onnxruntime onnx-simplifier && python /app/benchmark.py"
"""
import os
import warnings
from time import time  # time() is called directly in the benchmark loop below
from tempfile import TemporaryDirectory
import torch
import torch_tensorrt # import required to load the .ts model
import timm
import onnx
from onnxsim import simplify
# Parameters
n_warmups = 5
n_runs = 100
opset_version = 18
# Load model
model_name = 'resnet50'
shape = (16, 3, 224, 224)
model = timm.create_model(model_name, exportable=True)
model.eval().cuda().half()
with TemporaryDirectory() as tmpdir:
    name = lambda ext: f'{tmpdir}/{model_name}.{ext}'

    # 1. Compile model using ONNX export + TensorRT
    # Export to ONNX
    with torch.inference_mode(), torch.autocast("cuda"):
        inputs = torch.randn(*shape, dtype=torch.half, device='cuda')
        torch.onnx.export(model, inputs, name('onnx'), export_params=True,
                          opset_version=opset_version, do_constant_folding=True,
                          input_names=['input_0'], output_names=['output_0'])

    # Simplify using onnx-simplifier
    model = onnx.load(name('onnx'))
    simplified_model, check = simplify(model)
    if not check:
        warnings.warn('Simplified ONNX model could not be validated, using original ONNX model')
    else:
        onnx.save(simplified_model, name('onnx'))
    # Convert to TensorRT using default settings; trtexec also reports its own latency statistics
    # (engine is saved next to the ONNX file so the steps below can find it)
    os.system(f'trtexec --onnx={name("onnx")} --saveEngine={name("trt")} --fp16')

    exit()  # The command below is not working yet
    os.system(f'torchtrtc {name("trt")} {name("ts")} --embed-engine --device-type=gpu')
    # 2. Get runtime
    model = torch.jit.load(name('ts'))
    model.eval().half().cuda()
    inputs = torch.randn(*shape, dtype=torch.half, device='cuda')

    # Warmup
    for _ in range(n_warmups):
        with torch.no_grad():
            model(inputs)
            torch.cuda.synchronize()

    # Benchmark
    runtimes = []
    for _ in range(n_runs):
        with torch.no_grad():
            start = time()
            model(inputs)
            torch.cuda.synchronize()
            runtimes.append(time() - start)

    # Print result
    print('*' * 80)
    print(f"Average: {1000 * sum(runtimes) / n_runs:.2f}ms")
    print('*' * 80)
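As a possible workaround for the failing torchtrtc command, the engine could also be embedded from Python: Torch-TensorRT exposes torch_tensorrt.ts.embed_engine_in_new_module, which wraps a prebuilt engine in a TorchScript module. A rough, untested sketch (engine_path and ts_path are placeholder names):

import torch
import torch_tensorrt

engine_path = 'resnet50.trt'  # placeholder: the engine file written by trtexec above
ts_path = 'resnet50.ts'       # placeholder: where to save the TorchScript wrapper

with open(engine_path, 'rb') as f:
    serialized_engine = f.read()

# Wrap the prebuilt engine in a torch.jit.ScriptModule
trt_module = torch_tensorrt.ts.embed_engine_in_new_module(serialized_engine)
trt_module.save(ts_path)  # can then be loaded with torch.jit.load(ts_path)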
This issue has not seen activity for 90 days. Remove stale label or comment, or this will be closed in 10 days.
@SimJeg hi, did you manage to figure out the reason for the slower runtime for convnext when using TRT? |
❓ Question
I created a script to compare inference runtimes with torch, torch.compile and torch_tensorrt.compile for any timm model, input shape and dtype, and some runtimes are worse using TensorRT. Why?

What you have already tried
I used the latest NVIDIA PyTorch container (nvcr.io/nvidia/pytorch:23.04-py3, released today) on a g5.2xlarge AWS instance (A10G GPU). You can find the script (benchmark.py) at the end of this issue and the command used to run it below, with $DIR the path to the directory where I saved the script. Here are a few results:

(error°: Expected input tensors to have type Half, found type float; maybe some forcing on LayerNorm layers is applied and I should enable mixed precision somehow? See the sketch below.)

Everything goes well for the resnet50 model, but for the convnext_large and vit models the torch_tensorrt.compile option gets lower throughput and even fails in one case. And of course these models are the ones I am interested in 😅

Several questions:
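Regarding the Half/float error above, here is a minimal sketch of how half precision could be declared explicitly at compile time; whether this is the right knob for the LayerNorm issue is only a guess:

import torch
import torch_tensorrt
import timm

# Assumption: explicitly declaring fp16 inputs and enabling fp16 kernels
# may avoid the Half/float mismatch reported above (not a confirmed fix).
model = timm.create_model('convnext_large', exportable=True).eval().cuda().half()
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((16, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch.half},
)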
I can provide more details if needed (e.g. stack trace),
Thanks for your help and support,
Simon