
Crash on poolings with kernel volume >= 100 000 #937

Open
drproktor opened this issue Sep 11, 2023 · 0 comments
drproktor commented Sep 11, 2023

Description

Crash on poolings whose kernel volume reaches 100 000 (e.g. a square 2-D pool of size 317 × 317).

We are seeing a hard crash (SEGFAULT) when the user increases the input image beyond a certain size. Even if there is a limit on the kernel size, I would expect an exception to be thrown rather than a hard crash. The problem stems from the unchecked return value here:

nvinfer1::IPoolingLayer* poolingLayer = ctx->network()->addPoolingNd(*tensorPtr, type, kernel_size);
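The failure mode can be illustrated with a minimal Python analogue (the names `add_pooling_nd`, `import_pooling_unchecked`, and `import_pooling_checked` are illustrative, not the actual TensorRT C++ API): `addPoolingNd` returns a null pointer when its parameter check fails, and dereferencing that pointer unchecked is what turns an API usage error into a segfault.

```python
# Assumed 2-D limit, based on the error message and the issue title.
MAX_KERNEL_DIMS_PRODUCT = 100_000

def add_pooling_nd(kernel_size):
    """Stand-in for INetworkDefinition::addPoolingNd: returns None
    (the analogue of nullptr) when the kernel volume is too large."""
    volume = 1
    for k in kernel_size:
        volume *= k
    if volume >= MAX_KERNEL_DIMS_PRODUCT:
        return None  # TensorRT logs an error and returns nullptr here
    return {"kernel_size": tuple(kernel_size)}

def import_pooling_unchecked(kernel_size):
    # Current behavior: the return value is used without a check,
    # which crashes on None, like dereferencing a null pointer.
    layer = add_pooling_nd(kernel_size)
    return layer["kernel_size"]

def import_pooling_checked(kernel_size):
    # Proposed behavior: detect the failure and raise a clear error.
    layer = add_pooling_nd(kernel_size)
    if layer is None:
        raise RuntimeError(f"addPoolingNd failed for kernel_size={kernel_size}")
    return layer["kernel_size"]
```

With this sketch, `import_pooling_unchecked((317, 317))` fails with a `TypeError` on the `None` value (the analogue of the segfault), while `import_pooling_checked((317, 317))` raises a descriptive `RuntimeError` instead.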

A minimal script to reproduce the error is given below.

This is a duplicate of NVIDIA/TensorRT#2094, filed here since the bug occurs in this repository's source code.

Environment

TensorRT Version: 8.6.1
ONNX-TensorRT Version / Branch: main
GPU Type: Any
Nvidia Driver Version: 535.86.05
CUDA Version: 11.7.99
CUDNN Version: 8.5.0.96
Operating System + Version: Ubuntu 22.04.3 LTS
Python Version (if applicable):
TensorFlow + TF2ONNX Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Steps To Reproduce

# Build network and export to ONNX (opset 14)
import torch

ksize = (317, 317)

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.pool = torch.nn.MaxPool2d(kernel_size=ksize)

    def forward(self, x):
        return self.pool(x)


x = torch.rand((1, 3, *ksize), dtype=torch.float32)
torch.onnx.export(Net().eval(), x, "output.onnx", opset_version=14)

# Check that the model is strictly valid
# (onnx.checker.check_model returns None, so there is nothing to assign)
import onnx
onnx_model = onnx.load("output.onnx")
onnx.checker.check_model(onnx_model, full_check=True)

# Compile with TensorRT; this crashes.
import tensorrt as trt
builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 * 1 << 30)
parser = trt.OnnxParser(network, trt.Logger(trt.Logger.WARNING))
assert parser.parse(onnx._serialize(onnx_model))
builder.build_engine(network, config)

Output

============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

[09/11/2023-14:53:15] [TRT] [E] [network.cpp::addPoolingNd::1093] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addPoolingNd::1093, condition: allDimsGtEq(windowSize, 1) && volume(windowSize) < MAX_KERNEL_DIMS_PRODUCT(nbSpatialDims)
)
Segmentation fault (core dumped)
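The failing condition `volume(windowSize) < MAX_KERNEL_DIMS_PRODUCT(nbSpatialDims)` is consistent with a limit of 100 000 for two spatial dimensions (an assumption based on the issue title, since TensorRT does not document the constant): 316 × 316 = 99 856 still passes the check, while 317 × 317 = 100 489 does not.

```python
import math

# Assumed limit for 2 spatial dims, per the error message and issue title.
MAX_KERNEL_DIMS_PRODUCT_2D = 100_000

for side in (316, 317):
    volume = math.prod((side, side))
    passes = volume < MAX_KERNEL_DIMS_PRODUCT_2D
    print(f"{side}x{side}: volume={volume}, passes check: {passes}")
# 316x316: volume=99856, passes check: True
# 317x317: volume=100489, passes check: False
```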

Origin

drproktor added a commit to drproktor/onnx-tensorrt that referenced this issue Sep 12, 2023:
Raise an exception in case an unsupported pooling operation occurs.

Signed-off-by: Max Huber <maxh@mailbox.org>