Check duplicate issues.
Description
SOFIE generates broken code for Conv whenever the dilation attribute is greater than 1. The generated code calls Im2col with arguments that produce a wrong (negative) intermediate output dimension, and running the generated inference crashes with a segfault caused by an out-of-bounds write.
The cause is that dilation gets applied twice. In Initialize/DoShapeInference of ROperator_Conv, fAttrKernelShape is overwritten with the dilation-expanded kernel size, k + (dilation - 1) * (k - 1). That expanded value is then passed to UTILITY::Im2col in Generate() as the kernel_h / kernel_w argument, while fAttrDilations is also passed as the dilation argument.
However, Im2col already applies dilation itself (it samples at kernel_row * dilation_h and computes the receptive field internally as dilation * (kernel_h - 1) + 1), so passing the already-expanded kernel together with the dilation double-counts it.
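The double-counting can be illustrated with a small Python model of the sampling rule quoted above. This is only a sketch: `sampled_offsets` and `receptive_field` are hypothetical helper names, not SOFIE functions, and they model a single axis.

```python
def sampled_offsets(kernel, dilation):
    """Input offsets touched along one axis when dilation is applied at sampling time."""
    return [k * dilation for k in range(kernel)]


def receptive_field(kernel, dilation):
    """Span of input covered along one axis: dilation * (kernel - 1) + 1."""
    return dilation * (kernel - 1) + 1


# Intended call: original 3-wide kernel, dilation 2.
intended = sampled_offsets(3, 2)
# Double-counted call: kernel already expanded to 5, dilation 2 applied again.
double_counted = sampled_offsets(5, 2)

print(intended, receptive_field(3, 2))
print(double_counted, receptive_field(5, 2))
```

With the original kernel the offsets span 5 input elements, which fits in a 7-wide input. With the expanded kernel they span 9, which no longer fits.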
For example, for a 3x3 kernel with dilation 2 the generated Im2col call becomes:
Im2col(..., 1, 7, 7, 5, 5, 0, 0, 1, 1, 2, 2, ...)
so kernel_h = kernel_w = 5 (already expanded) and dilation = 2. Im2col then computes output_h = (7 + 0 - (2 * (5 - 1) + 1)) / 1 + 1 = -1, a negative dimension. Its loop for (output_rows = output_h; output_rows; output_rows--) never reaches 0, so it writes far past the _xcol buffer (allocated for the correct 25 x 9 = 225 floats) and segfaults.
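The arithmetic above can be checked with a short Python sketch of the output-dimension formula. The function name `im2col_output_dim` and the `pad_total` parameter (total padding along the axis, 0 in this example) are assumptions made for illustration; only the formula itself comes from the description above.

```python
def im2col_output_dim(size, kernel, pad_total, stride, dilation):
    """Output extent along one axis, following the formula quoted in this issue."""
    receptive_field = dilation * (kernel - 1) + 1
    # int() truncates toward zero, mirroring C++ integer division.
    return int((size + pad_total - receptive_field) / stride) + 1


# Kernel passed as the original 3: the dimension expected by the model (Y is 1x1x3x3).
intended_dim = im2col_output_dim(7, 3, 0, 1, 2)
# Kernel passed as the dilation-expanded 5: what the generated call computes.
double_counted_dim = im2col_output_dim(7, 5, 0, 1, 2)

print(intended_dim, double_counted_dim)
```

Passing the unexpanded kernel size gives 3, matching the declared output shape, while the expanded kernel gives the negative value reported above.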
The bug only shows up with dilation > 1. Every Conv model in the test suite uses dilation 1, where the expansion k + (1 - 1) * (k - 1) = k is a no-op, so the wrong path is never exercised.
Expected behavior: generated Conv code should match the ONNX reference output for any valid dilation, the same way it already does for padding and strides.
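For comparison, a pure-Python reference of the expected result can be written directly from the ONNX Conv definition (cross-correlation, no kernel flip). This is a minimal sketch for a single channel with no padding and unit stride; the function name and the ramp input `x_ramp` are hypothetical choices, while the weights are the ones used in the reproducer.

```python
def dilated_conv2d_reference(x, w, height, width, kernel, dilation):
    """Single-channel dilated cross-correlation, no padding, unit stride.

    x and w are flat row-major lists; the result is a flat row-major list.
    """
    span = dilation * (kernel - 1) + 1
    out_h = height - span + 1
    out_w = width - span + 1
    y = []
    for r in range(out_h):
        for c in range(out_w):
            acc = 0.0
            for kr in range(kernel):
                for kc in range(kernel):
                    row = r + kr * dilation
                    col = c + kc * dilation
                    acc += w[kr * kernel + kc] * x[row * width + col]
            y.append(acc)
    return y


weights = [0.1 * i for i in range(1, 10)]   # same weights as the reproducer
x_ramp = [float(i) for i in range(49)]      # hypothetical 7x7 input: 0, 1, ..., 48
y_ref = dilated_conv2d_reference(x_ramp, weights, 7, 7, 3, 2)
print(len(y_ref), y_ref)
```

The nine values in `y_ref` are what a correct generated Conv should produce for this input, up to floating-point rounding.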
Reproducer
Only ROOT (with SOFIE) and the onnx Python package are needed.
Build a Conv model with dilation 2, 3x3 kernel, input 1x1x7x7 (no padding, unit stride):
# make_model.py
import onnx
from onnx import helper, TensorProto
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 1, 7, 7])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 1, 3, 3])
W = helper.make_tensor("W", TensorProto.FLOAT, [1, 1, 3, 3],
                       [0.1 * i for i in range(1, 10)])
node = helper.make_node("Conv", ["X", "W"], ["Y"],
                        kernel_shape=[3, 3], strides=[1, 1],
                        pads=[0, 0, 0, 0], dilations=[2, 2])
m = helper.make_model(helper.make_graph([node], "conv_dilation", [X], [Y], [W]),
                      opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(m)
onnx.save(m, "conv_dilation.onnx")
python3 make_model.py
Generate the SOFIE code:
// gen.C
#include "TMVA/RModelParser_ONNX.hxx"
#include "TMVA/RModel.hxx"
void gen() {
   using namespace TMVA::Experimental::SOFIE;
   RModel model = RModelParser_ONNX().Parse("conv_dilation.onnx");
   model.Generate();
   model.OutputGenerated("conv_dilation_generated.hxx");
}
root -l -b -q gen.C
Look at the generated im2col call:
grep -n "Im2col" conv_dilation_generated.hxx
Observed:
Im2col<float>(tensor_X + x_offset, 1, 7, 7, 5, 5, 0, 0, 1, 1, 2, 2, tensor_X_xcol);
The kernel size passed is 5, 5 (the dilation-expanded value) while dilation is also passed as 2, 2, so the dilation is effectively applied twice. With those arguments Im2col computes a negative output dimension, and running inference on the generated model segfaults.
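The check on the generated call could also be scripted. The sketch below parses the observed line and compares the kernel arguments against the ONNX kernel_shape. The argument names in `ARG_NAMES` are an assumption: the kernel and dilation positions follow the description above, and the remaining names are inferred from the reproducer's values (1 channel, 7x7 input, zero padding, unit stride). `parse_im2col_call` is a hypothetical helper, not part of ROOT.

```python
import re

# Assumed order of the integer arguments between the input pointer and the output buffer.
ARG_NAMES = ["channels", "height", "width", "kernel_h", "kernel_w",
             "pad_h", "pad_w", "stride_h", "stride_w", "dilation_h", "dilation_w"]


def parse_im2col_call(line):
    """Return the integer arguments of a generated Im2col<...>(...) call as a dict."""
    inner = re.search(r"Im2col<\w+>\((.*)\)", line).group(1)
    args = [a.strip() for a in inner.split(",")]
    # The first argument is the input pointer expression, the last is the output buffer.
    return dict(zip(ARG_NAMES, (int(a) for a in args[1:-1])))


observed = ("Im2col<float>(tensor_X + x_offset, 1, 7, 7, 5, 5, 0, 0, 1, 1, 2, 2, "
            "tensor_X_xcol);")
call = parse_im2col_call(observed)

onnx_kernel_shape = (3, 3)  # kernel_shape attribute from make_model.py
kernel_matches_model = (call["kernel_h"], call["kernel_w"]) == onnx_kernel_shape
print(call, kernel_matches_model)
```

For the observed line `kernel_matches_model` is false, since the call carries the expanded 5x5 kernel alongside dilation 2.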
ROOT version
ROOT 6.41.01
Installation method
Built from source
Operating system
Ubuntu 22.04.2 LTS
Additional context
No response