[LocalFunction] Shape mismatch attempting to re-use buffer #17061
Most likely it is an ONNX shape inference bug.
cc @jcwchen
To clarify, will this issue be resolved by ONNX 1.14.1 (and also if ORT consumes the ONNX 1.14.1 commit)? Or is it a new issue that needs to be fixed in ONNX? BTW, ONNX 1.14.1 is coming out soon and we should be able to consume ONNX 1.14.1 before the next ORT release.
This is a new issue; the repro was from a local build of ORT with ONNX 1.14.1.
The model does pass the shape inference checks below without error; this is part of the provided repro.

```python
onnx.shape_inference.infer_shapes(
    onnx_model, check_type=True, strict_mode=True, data_prop=True
)
onnx.checker.check_model(onnx_model, full_check=True)
```
That doesn't mean the shape inference functions generate correct results. They didn't crash, but the results could still be wrong. For example, suppose a MatMul has two inputs: matrix A with shape [m, n] and matrix B with shape [n, k]. The result should have shape [m, k]. But if shape inference wrongly infers the output shape as [m, m], the ONNX checker would still pass, while ONNX Runtime would allocate a wrongly sized buffer for the output.
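The MatMul shape rule in that comment can be sketched as a small standalone function. This is only an illustration of the rule, not ORT's or ONNX's actual shape-inference code:

```python
# Shape rule for 2-D MatMul: [m, n] x [n, k] -> [m, k].
# A standalone sketch illustrating the comment above, not real ORT code.
def matmul_output_shape(a_shape, b_shape):
    """Return the output shape of a 2-D MatMul, or raise on a mismatch."""
    m, n = a_shape
    n2, k = b_shape
    if n != n2:
        raise ValueError(f"inner dimensions must match: {n} vs {n2}")
    return [m, k]

# Correct inference: [3, 4] x [4, 5] -> [3, 5].
# A buggy pass that instead returned [3, 3] would still satisfy the
# checker, but the runtime would allocate a wrongly sized output buffer.
print(matmul_output_shape([3, 4], [4, 5]))
```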
I will take a look at it |
ONNX shape inference might overlook some issues coming from a subgraph or local function due to onnx/onnx#5463. The fresh PR onnx/onnx#5488 should fix that, and we can run shape inference again with that fix (and enable strict_mode) to check whether ONNX can catch it.
@jcwchen Is this going to be in 1.14.1? |
No, because onnx/onnx#5488 hasn't been merged yet and ONNX 1.14.1 will probably be out this week. As a workaround, if you want ONNX shape inference to be able to catch local-function-related issues, you can try https://pypi.org/project/onnx-weekly/ instead once onnx/onnx#5488 has been merged.
I am actually debugging in C++ with ORT built with ONNX 1.14.1 |
IIUC, this ONNX issue (check_type and strict_mode not being honored for local functions) should be fine for ORT, because ORT does explicitly set check_type and strict_mode when using shape inference for local functions:
The fix onnx/onnx#5488 is only solving the issue that ONNX's shape inference API suppresses local function related shape inference error. |
The inlined version of the model runs without any errors. Most of the time is spent on optimizations when they are enabled.
Sounds like it should throw some errors, but it doesn't? If yes, there might be other unidentified issues/limitations for ONNX's function shape inference and onnx/onnx#5488 won't fix that I think. |
The non-inlined version does repro the error in the description. The inlining is done using onnx. The script also performs shape inference; I will try to run it without that step and see what happens.
The root cause of the problem has been identified. We are going to issue a temporary fix. Note: the debugging and testing was done against the ONNX rel-1.14.1 branch. We are assuming this is what we are going to ship.
### Description
Temporarily disable symbol tables.

### Motivation and Context
Local symbol tables mark unrelated shapes for re-use and cause inference to error out. #17061
@yuslepukhin , can you comment on the fix for this issue? Has it been included in release 1.16.0? Thanks |
I believe so. It was approved for that. |
…#17267)

### Description
Temporarily disable symbol tables.

### Motivation and Context
Local symbol tables mark unrelated shapes for re-use and cause inference to error out. microsoft#17061
Similar to #16813 (unsure if related), this issue occurs when running a model produced by the dynamo exporter with local functions. The fully inlined model runs successfully.
NOTE that ORT must be built with ONNX v1.14.1 or above; otherwise a segfault may occur during ONNX shape inference.
An example model and repro script can be found here: torchbench_hf_Bart.