
Crash on poolings with kernel volume >= 100 000 #937

Open
drproktor opened this issue Sep 11, 2023 · 0 comments
drproktor commented Sep 11, 2023

Description

Crash on poolings whose kernel volume reaches 100 000 (e.g. a square 2-D pool of size 317 × 317).

We are seeing a hard crash (SEGFAULT) when the user increases the input image beyond a certain size. Even if there is a limit on the kernel size, I would expect an exception to be thrown rather than a hard crash. The problem stems from the unchecked return value here:

nvinfer1::IPoolingLayer* poolingLayer = ctx->network()->addPoolingNd(*tensorPtr, type, kernel_size);
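The failure mode can be illustrated with a minimal Python analogue (the names `add_pooling_nd`, `import_pooling_unchecked`, and `import_pooling_checked` are illustrative, not the actual TensorRT C++ API): `addPoolingNd` returns a null pointer when its parameter check fails, and dereferencing that pointer unchecked is what turns an API usage error into a segfault.

```python
# Assumed 2-D limit, based on the error message and the issue title.
MAX_KERNEL_DIMS_PRODUCT = 100_000

def add_pooling_nd(kernel_size):
    """Stand-in for INetworkDefinition::addPoolingNd: returns None
    (the analogue of nullptr) when the kernel volume is too large."""
    volume = 1
    for k in kernel_size:
        volume *= k
    if volume >= MAX_KERNEL_DIMS_PRODUCT:
        return None  # TensorRT logs an error and returns nullptr here
    return {"kernel_size": tuple(kernel_size)}

def import_pooling_unchecked(kernel_size):
    # Current behavior: the return value is used without a check,
    # which crashes on None, like dereferencing a null pointer.
    layer = add_pooling_nd(kernel_size)
    return layer["kernel_size"]

def import_pooling_checked(kernel_size):
    # Proposed behavior: detect the failure and raise a clear error.
    layer = add_pooling_nd(kernel_size)
    if layer is None:
        raise RuntimeError(f"addPoolingNd failed for kernel_size={kernel_size}")
    return layer["kernel_size"]
```

With this sketch, `import_pooling_unchecked((317, 317))` fails with a `TypeError` on the `None` value (the analogue of the segfault), while `import_pooling_checked((317, 317))` raises a descriptive `RuntimeError` instead.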

A minimal script to reproduce the error is given below.

This is a duplicate of NVIDIA/TensorRT#2094, filed here since the bug occurs in this repository's source code.

Environment

TensorRT Version: 8.6.1
ONNX-TensorRT Version / Branch: main
GPU Type: Any
Nvidia Driver Version: 535.86.05
CUDA Version: 11.7.99
CUDNN Version: 8.5.0.96
Operating System + Version: Ubuntu 22.04.3 LTS
Python Version (if applicable):
TensorFlow + TF2ONNX Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Steps To Reproduce

# Build network and export to ONNX (opset 14)
import torch

ksize = (317, 317)

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.pool = torch.nn.MaxPool2d(kernel_size=ksize)

    def forward(self, x):
        return self.pool(x)


x = torch.rand((1, 3, *ksize), dtype=torch.float32)
torch.onnx.export(Net().eval(), x, "output.onnx", opset_version=14)

# Check that the model is strictly valid
# (onnx.checker.check_model returns None, so there is nothing to assign)
import onnx
onnx_model = onnx.load("output.onnx")
onnx.checker.check_model(onnx_model, full_check=True)

# Compile with TensorRT; this crashes.
import tensorrt as trt
builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 * 1 << 30)
parser = trt.OnnxParser(network, trt.Logger(trt.Logger.WARNING))
assert parser.parse(onnx._serialize(onnx_model))
builder.build_engine(network, config)

Output

============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

[09/11/2023-14:53:15] [TRT] [E] [network.cpp::addPoolingNd::1093] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/network.cpp::addPoolingNd::1093, condition: allDimsGtEq(windowSize, 1) && volume(windowSize) < MAX_KERNEL_DIMS_PRODUCT(nbSpatialDims)
)
Segmentation fault (core dumped)
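The failing condition `volume(windowSize) < MAX_KERNEL_DIMS_PRODUCT(nbSpatialDims)` is consistent with a limit of 100 000 for two spatial dimensions (an assumption based on the issue title, since TensorRT does not document the constant): 316 × 316 = 99 856 still passes the check, while 317 × 317 = 100 489 does not.

```python
import math

# Assumed limit for 2 spatial dims, per the error message and issue title.
MAX_KERNEL_DIMS_PRODUCT_2D = 100_000

for side in (316, 317):
    volume = math.prod((side, side))
    passes = volume < MAX_KERNEL_DIMS_PRODUCT_2D
    print(f"{side}x{side}: volume={volume}, passes check: {passes}")
# 316x316: volume=99856, passes check: True
# 317x317: volume=100489, passes check: False
```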

Origin

drproktor added a commit to drproktor/onnx-tensorrt that referenced this issue Sep 12, 2023:
Raise an exception in case an unsupported pooling operation occurs.

Signed-off-by: Max Huber <maxh@mailbox.org>