How to convert DINOv2 to ONNX? #216

Open

PeterKim1 opened this issue Sep 15, 2023 · 13 comments
@PeterKim1

Hi. Thanks for your great work.

I want to convert DINOv2 to ONNX, but I have failed so far.

I tried to follow issue #19.

I applied the fix from #19 (comment), but after that the error from #19 (comment) occurred.

So I tried to apply #19 (comment) as well, but the error still occurs.

Are there any guidelines for converting to ONNX?

I need to get this model working quickly for semantic segmentation tasks.

Thanks.

@seddonm1

Changing this bit in vision_transformer.py on line 187 will allow export:

patch_pos_embed = nn.functional.interpolate(
    patch_pos_embed.reshape(1, int(math.sqrt(N)), int(math.sqrt(N)), dim).permute(0, 3, 1, 2),
    scale_factor=(float(w0 / math.sqrt(N)), float(h0 / math.sqrt(N))),
    mode="bicubic",
)
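For reference, here is a minimal export sketch assuming the change above has been applied; the hub entry point, the 224x224 input, the opset version and the file name are illustrative assumptions, not an official recipe.

import torch

# Minimal sketch (not an official recipe): export DINOv2 ViT-S/14 to ONNX
# after patching interpolate_pos_encoding as shown above.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Height and width must be multiples of the 14-pixel patch size.
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy,
    "dinov2_vits14.onnx",          # assumed output path
    input_names=["input"],
    output_names=["features"],
    opset_version=17,
)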

@dacquaviva

dacquaviva commented Sep 25, 2023

Thanks @seddonm1 for the workaround; it works for me with batch size 1. However, I am trying to get a dynamic batch size: I am able to convert the model to ONNX with a dynamic batch size, but when I load it I get an error. Has anyone managed this?
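For what it's worth, a dynamic batch dimension is usually declared via dynamic_axes at export time; a hedged sketch follows (tensor names, file name and opset are assumptions, and this does not by itself make the spatial dimensions dynamic).

import torch

# Sketch: export with a dynamic batch axis only (height/width stay fixed).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
dummy = torch.randn(2, 3, 224, 224)

torch.onnx.export(
    model,
    dummy,
    "dinov2_vits14_dyn_batch.onnx",  # assumed output path
    input_names=["input"],
    output_names=["features"],
    dynamic_axes={"input": {0: "batch"}, "features": {0: "batch"}},
    opset_version=17,
)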

@barbolo

barbolo commented Apr 1, 2024

I've exported class token + patch tokens: #167 (comment)
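For readers who don't want to follow the link, one way to get both outputs is to wrap the backbone before export. This is only a sketch of the idea, not the exact code from #167; the forward_features() dict keys ("x_norm_clstoken", "x_norm_patchtokens") are assumed from the DINOv2 repo, and everything else is illustrative.

import torch
import torch.nn as nn

class DinoV2Tokens(nn.Module):
    """Sketch: concatenate the class token with the patch tokens for export."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone.forward_features(x)
        cls_token = feats["x_norm_clstoken"].unsqueeze(1)   # [B, 1, C]
        patch_tokens = feats["x_norm_patchtokens"]          # [B, N, C]
        return torch.cat([cls_token, patch_tokens], dim=1)  # [B, 1 + N, C]

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()
torch.onnx.export(
    DinoV2Tokens(backbone),
    torch.randn(1, 3, 224, 224),
    "dinov2_cls_patch.onnx",  # assumed output path
    input_names=["input"],
    output_names=["tokens"],
    opset_version=17,
)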

@barbolo

barbolo commented Apr 6, 2024

I've tried to export to ONNX using dynamic input and output shapes. The model is exported and seems fine, however the ONNX model throws an exception during inference when the input is not the same as the input sample fed during the export.

For example, when I export a model with an input of shape [1, 3, 168, 168] (batch_size x C x H x W), the last hidden state (class token + patch tokens) has 145 features. When I try to use that model with an input of shape [1, 3, 112, 112] (which should output 65 features), the following exception is thrown:

[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Add node. Name:'/embeddings/Add' Status Message: /Users/runner/work/1/s/onnxruntime/core/providers/cpu/math/element_wise_ops.h:560 void onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 65 by 145

@WulongGuo

WulongGuo commented Apr 24, 2024

(quoting @barbolo's comment above about the dynamic-shape export failing at the '/embeddings/Add' node with "Attempting to broadcast an axis by a dimension other than 1. 65 by 145")

@barbolo hello, I got the same error. Have you figured out how to solve this problem?

@barbolo

barbolo commented Apr 24, 2024

@WulongGuo no, I haven't, and I'm not sure there is a solution. I've seen other ViT-like repositories with downloadable ONNX/OpenVINO models, and all of them have fixed input shapes.

For my use case I'm interested in reducing inference time, so I've exported one model per input shape I'm using and I keep them all loaded in memory. This approach uses more memory, but the inference time is optimized.
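A hedged sketch of that per-shape approach (the file names, the "input" tensor name and the two resolutions are assumptions):

import numpy as np
import onnxruntime as ort

# Sketch: one exported ONNX file per input resolution, one cached
# onnxruntime session per shape, picked at inference time.
SESSIONS = {
    (168, 168): ort.InferenceSession("dinov2_168x168.onnx"),  # assumed paths
    (112, 112): ort.InferenceSession("dinov2_112x112.onnx"),
}

def run(image: np.ndarray) -> np.ndarray:
    """image: float32 array of shape [1, 3, H, W], already normalized."""
    h, w = image.shape[2], image.shape[3]
    session = SESSIONS[(h, w)]
    return session.run(None, {"input": image})[0]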

@WulongGuo

@barbolo OK, thanks for your reply. I'll just use the fixed-input version.

@Zalways

Zalways commented Apr 29, 2024

(quoting @barbolo's comment above about the dynamic-shape export failing at the '/embeddings/Add' node)

I also ran into this problem. Before export the model can run inference on different input shapes in Python, but the exported ONNX model can only run inference on inputs with the same width and height as the export sample; it fails on other shapes.

@Zalways

Zalways commented Apr 29, 2024

I'd appreciate it if anyone could solve this problem.

@100rab-S

@Zalways @barbolo

(quoting @Zalways's comment above about the exported model only accepting the export-time input shape)

I'm facing the same issue. It is something with the nodes in the model: although we mark the image height and width as dynamic during export, they somehow remain static in the ONNX graph, and inference then throws the error mentioned above. The export warnings below reflect this as well. It seems we can only use static shapes, even though the model can be exported with dynamic axes.

The downside is that we now have to downscale or upscale the images to a static shape 😢 (a minimal resize sketch follows the warnings below).
@patricklabatut I also tried exporting with Torch's newer dynamo_export, but that failed too. Has anyone successfully exported a fully dynamic ONNX model?

/Users/sourabh/Library/Caches/pypoetry/virtualenvs/devit-KwOi2sgN-py3.8/lib/python3.8/site-packages/torch/onnx/utils.py:1548: OnnxExporterWarning: Exporting to ONNX opset version 19 is not supported. by 'torch.onnx.export()'. The highest opset version supported is 17. To use a newer opset version, consider 'torch.onnx.dynamo_export()'. Note that dynamo_export() is in preview. Please report errors with dynamo_export() as Github issues to https://github.com/pytorch/pytorch/issues.
warnings.warn(
/Users/sourabh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/patch_embed.py:72: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert H % patch_H == 0, f"Input image height {H} is not a multiple of patch height {patch_H}"
/Users/sourabh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/layers/patch_embed.py:73: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert W % patch_W == 0, f"Input image width {W} is not a multiple of patch width: {patch_W}"
/Users/sourabh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:183: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if npatch == N and w == h:
/Users/sourabh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:195: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
sqrt_N = math.sqrt(N)
/Users/sourabh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:196: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
sx, sy = float(w0) / sqrt_N, float(h0) / sqrt_N
/Users/sourabh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:204: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert int(w0) == patch_pos_embed.shape[-2]
/Users/sourabh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:204: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert int(w0) == patch_pos_embed.shape[-2]
/Users/sourabh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:205: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert int(h0) == patch_pos_embed.shape[-1]
/Users/sourabh/.cache/torch/hub/facebookresearch_dinov2_main/dinov2/models/vision_transformer.py:205: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert int(h0) == patch_pos_embed.shape[-1]
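A minimal sketch of the static-shape fallback mentioned above: resize every input to the fixed export resolution before running the ONNX model. The 224x224 target, the file name and the "input" tensor name are assumptions.

import numpy as np
import onnxruntime as ort
import torch
import torch.nn.functional as F

# Sketch: force every input to the fixed shape the ONNX model was exported with.
session = ort.InferenceSession("dinov2_vits14_224.onnx")  # assumed path

def infer(image: torch.Tensor) -> np.ndarray:
    """image: float32 tensor of shape [1, 3, H, W], already normalized."""
    resized = F.interpolate(image, size=(224, 224), mode="bilinear", align_corners=False)
    return session.run(None, {"input": resized.numpy()})[0]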

@Zalways

Zalways commented Jun 5, 2024

(quoting @100rab-S's comment and export warnings above in full)

Have you solved this problem?

@100rab-S

100rab-S commented Jun 9, 2024

@Zalways Nope.

@huangcj-code

@100rab-S Hello, I've encountered a similar issue myself. I'm curious to know: would there be any adverse effects if I resize the images to match the static input size?
