
While I was doing inference with onnxruntime, I got this error: return self._sess.run(output_names, input_feed, run_options) onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Slice node. Name:'Slice_1214' Status Message: slice.cc:260 FillVectorsFromInput Starts must be a 1-D array #8735

Closed
SkylerZheng opened this issue Aug 13, 2021 · 21 comments
Labels
contributions welcome: lower priority issues for the core ORT teams
more info needed: issues that cannot be triaged until more information is submitted by the original user

Comments

@SkylerZheng

Describe the bug
A clear and concise description of what the bug is. To avoid repetition please make sure this is not one of the known issues mentioned on the respective release page.

Urgency
If there are particular important use cases blocked by this or strict project-related timelines, please share more information and dates. If there are no hard deadlines, please specify none.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • ONNX Runtime installed from (source or binary):
  • ONNX Runtime version:
  • Python version:
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

To Reproduce

  • Describe steps/code to reproduce the behavior.
  • Attach the ONNX model to the issue (where applicable) to expedite investigation.

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context
Add any other context about the problem here. If the issue is about a particular model, please share the model details as well to facilitate debugging.

@SkylerZheng
Author

Can anyone help me figure out how to fix this bug? The input to my PyTorch model contains an ndarray of shape (1,).

@ytaous added the "more info needed" label Aug 13, 2021
@ytaous
Contributor

ytaous commented Aug 13, 2021

Can you please provide more details?

  1. reproducible steps
  2. sample code/inputs
  3. system information above, etc.

@SkylerZheng
Author

I first convert the PyTorch model to an ONNX model.
Then I run ONNX optimization with onnxoptimizer.optimize.
Then I run inference on the optimized ONNX model. I also ran inference with the ONNX model before optimization; both gave me the aforementioned error. I guess it's an input shape problem?

This is the input to my PyTorch model:
[image: PyTorch model inputs]

And this is the ort_inputs:
[image: ort_inputs]

@SkylerZheng
Author

I tried using num_bbs and txt_lens with shape (1,) and ended up with the same error.

@ytaous
Contributor

ytaous commented Aug 13, 2021

Hi, it's still unclear to me which input is causing the issue.
Per https://github.com/onnx/onnx/blob/master/docs/Operators.md#Slice - starts has to be 1-D.
You can save the ONNX model and check (using the Netron tool) where the input for Slice_1214 comes from in the graph.
To move further, it would be best if you could provide more details on system info, plus sample code/inputs for us to reproduce locally.
Thanks.
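As a minimal sketch of the 1-D requirement mentioned above, plain NumPy shapes show the distinction (the values 39 and 23 are the ones discussed in this thread; the variable names here are only illustrative):

```python
import numpy as np

# ONNX Slice expects `starts` and `ends` to be 1-D tensors, with one
# entry per axis being sliced. Only the first shape below qualifies.
starts_1d = np.asarray([39])    # shape (1,)   -> 1-D, accepted
starts_2d = np.asarray([[39]])  # shape (1, 1) -> 2-D, rejected by Slice
starts_0d = np.asarray(39)      # shape ()     -> 0-D scalar, also rejected

print(starts_1d.ndim, starts_2d.ndim, starts_0d.ndim)
```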

@SkylerZheng
Author

SkylerZheng commented Aug 13, 2021

Hi, I checked the model graph with Netron and found Slice_1214; it's the red box in the following picture. Here txt_lens = [39] and num_bbs = [23]. What would starts and ends look like in this case?

[screenshot: model graph with Slice_1214 highlighted]

@ytaous
Contributor

ytaous commented Aug 13, 2021

Your starts would be tensor 1677, and the output is 1681. If you click on the Concat below, it should show that its input is 1681. Now you can click the three nodes above the Slice to find out where 1677 comes from, and keep searching upwards.

@SkylerZheng
Author

Can you help check this graph picture? I still do not know how to solve the problem. If starts must be a 1-D array, what are the data type and shape of my starts here?

[image: graph around the Slice node]

@SkylerZheng
Author

SkylerZheng commented Aug 13, 2021

And here is the PyTorch code for the slice part.

############
def get_image_hidden(self, sequence_output, txt_lens, num_bbs):
    """
    Extract the img_hidden part from sequence_output.
    Inputs:
    - sequence_output: (n, txt_len+num_bb, hid_size)
    - txt_lens : [txt_len]
    - num_bbs : [num_bb]
    Output:
    - img_hidden : (n, max_num_bb, hid_size)
    """
    outputs = []
    max_bb = max(num_bbs)
    hid_size = sequence_output.size(-1)
    for seq_out, len_, nbb in zip(sequence_output.split(1, dim=0),
                                  txt_lens, num_bbs):
        img_hid = seq_out[:, len_:len_+nbb, :]
        if nbb < max_bb:
            img_hid = torch.cat(
                [img_hid, self._get_pad(
                    img_hid, max_bb-nbb, hid_size)],
                dim=1)
        outputs.append(img_hid)
    img_hidden = torch.cat(outputs, dim=0)
    return img_hidden
############

@ytaous
Contributor

ytaous commented Aug 13, 2021

Hi, the graph looks OK. It could be that at runtime the actual data fed into starts is not 1-D.

Can you please click on the 3 nodes (1 Split and 2 Unsqueeze) above the Slice and see where 1677 is coming from? Just to confirm.

I saw in your earlier post that the ort inputs have [[23]] and [[39]], which is not consistent with the torch inputs [23] and [39]. Why are the input shapes different? Have you tried [23] and [39] as the ort input, since it has to be 1-D? (I thought you might have done so.)

Again, can you please share your system info, including the torch/Python version?

Also, it would be great if you could reduce the model size and share a reproducible model.onnx with code and sample inputs; I could then debug it further and see where it's broken.

Another option is to debug the code yourself using VS Code. In that case, you need to build the onnxruntime source in debug mode.

@SkylerZheng
Author

Have you tried with [23] and [39] as ort input since it has to be 1-D? ----> I tried and ended up with the same error.

torch==1.9.0
python == 3.7
[image: environment details]

How should I share my onnx model with you?

@ytaous
Contributor

ytaous commented Aug 13, 2021

Yes, please, and if you can provide standalone code to repro the issue, that would be great.
Thanks.

@SkylerZheng
Author

I'm going to debug onnxruntime with the debug mode source code first. Thank you so much.

@ytaous
Contributor

ytaous commented Aug 14, 2021

Sounds good. If you have trouble setting it up, please feel free to ping back.

@SkylerZheng
Author

Absolutely, thank you very much!

@ytaous
Contributor

ytaous commented Aug 14, 2021

Contributions are more than welcome if you find an issue and have a fix for it.

@ytaous added the "contributions welcome" label Aug 14, 2021
@SkylerZheng
Author

Hi, how do I build onnxruntime from source for debugging? I cloned the repo and built with "./build.sh --cmake_extra_defines onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1". The build succeeded, but when I import onnxruntime it says there is no module named onnxruntime. Is it a path problem, or did I do the build the wrong way?

@ytaous
Contributor

ytaous commented Aug 14, 2021

Please see https://onnxruntime.ai/docs/how-to/build/inferencing.html
For debug mode, use "--config Debug" instead.

@tianleiwu
Contributor

tianleiwu commented Aug 14, 2021

@SkylerZheng
I suggest debugging as follows:

  • Disable optimizations one by one: (1) onnxoptimizer.optimize, (2) set the onnxruntime session option graph optimization level to disable all.
    If the problem is resolved, then it is caused by the optimizer or by graph optimization.

  • Try symbolic shape inference (python -m onnxruntime.tools.symbolic_shape_infer) on the ONNX model, then check the shape of the problem input to see whether it is 1-D. If not, it is likely caused by the ONNX export; you could try changing your modeling script and retrying this step until the input is 1-D.

@SkylerZheng
Author

I think it's an error that happened during the conversion from the PyTorch model to the ONNX model. Here is an exception that occurred during the conversion.

[W shape_type_inference.cpp:419] Warning: Constant folding in symbolic shape inference fails: index_select(): Index is supposed to be a vector
Exception raised from index_select_out_cpu_ at /pytorch/aten/src/ATen/native/TensorAdvancedIndexing.cpp:758 (most recent call first):

And in the modeling script, the slicing happened here:

    for seq_out, len_, nbb in zip(sequence_output.split(1, dim=0),
                                  txt_lens, num_bbs):
        img_hid = seq_out[:, len_:len_+nbb, :]
        if nbb < max_bb:
            img_hid = torch.cat(
                    [img_hid, self._get_pad(
                        img_hid, max_bb-nbb, hid_size)],
                    dim=1)
        outputs.append(img_hid)

Could you tell me how to debug with onnx and onnxruntime? I built both from source, but I do not know how to debug them under PyCharm (I use Python only).

@SkylerZheng
Author

SkylerZheng commented Aug 16, 2021

Hi, I solved the problem. I simply changed len_ and len_+nbb to 1-D arrays manually with np.asarray, and now it works. Thank you very much for your help during this process.
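The reported fix can be sketched as follows. This is only an illustration using the example values from the thread (39 and 23); the actual fix wraps the loop variables inside get_image_hidden:

```python
import numpy as np

# Wrap the slice bounds in 1-D arrays before indexing, so that the
# exported ONNX Slice node receives 1-D `starts`/`ends` tensors
# instead of 0-D scalars.
len_ = np.asarray([39])   # was a plain scalar before the fix
nbb = np.asarray([23])

starts = len_             # shape (1,)
ends = len_ + nbb         # shape (1,); elementwise add keeps it 1-D

print(starts.shape, ends.shape, ends[0])
```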
