I want to convert the DINOv2 model to ONNX, but an error occurs. #26790

Closed
PeterKim1 opened this issue Oct 13, 2023 · 1 comment

PeterKim1 commented Oct 13, 2023

Hi.

Thanks for your great work.

I want to use DINOv2 for a segmentation task, so I am trying to use DINOv2 from Hugging Face transformers.

I am using this notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DINOv2/Train_a_linear_classifier_on_top_of_DINOv2_for_semantic_segmentation.ipynb

If you run that notebook, you can build the same model I have.

I need to convert this model to ONNX, so I use this code:

torch.onnx.export(model,
                  torch.randn(1, 3, 448, 448, device = 'cuda'),
                  './huggingface_DINOv2.onnx',
                  input_names = ['input_0'],
                  output_names = ['output_0'],
                  opset_version=11)

But this error occurs:

============= Diagnostic Run torch.onnx.export version 2.0.0+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================


TypeError Traceback (most recent call last)
Cell In[101], line 1
----> 1 torch.onnx.export(model,
2 torch.randn(1, 3, 448, 448, device = 'cuda'),
3 './huggingface_DINOv2.onnx',
4 input_names = ['input_0'],
5 output_names = ['output_0'],
6 opset_version=11)

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:506, in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)
188 @_beartype.beartype
189 def export(
190 model: Union[torch.nn.Module, torch.jit.ScriptModule, torch.jit.ScriptFunction],
(...)
206 export_modules_as_functions: Union[bool, Collection[Type[torch.nn.Module]]] = False,
207 ) -> None:
208 r"""Exports a model into ONNX format.
209
210 If model is not a :class:torch.jit.ScriptModule nor a
(...)
503 All errors are subclasses of :class:errors.OnnxExporterError.
504 """
--> 506 _export(
507 model,
508 args,
509 f,
510 export_params,
511 verbose,
512 training,
513 input_names,
514 output_names,
515 operator_export_type=operator_export_type,
516 opset_version=opset_version,
517 do_constant_folding=do_constant_folding,
518 dynamic_axes=dynamic_axes,
519 keep_initializers_as_inputs=keep_initializers_as_inputs,
520 custom_opsets=custom_opsets,
521 export_modules_as_functions=export_modules_as_functions,
522 )

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:1548, in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions)
1545 dynamic_axes = {}
1546 _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
-> 1548 graph, params_dict, torch_out = _model_to_graph(
1549 model,
1550 args,
1551 verbose,
1552 input_names,
1553 output_names,
1554 operator_export_type,
1555 val_do_constant_folding,
1556 fixed_batch_size=fixed_batch_size,
1557 training=training,
1558 dynamic_axes=dynamic_axes,
1559 )
1561 # TODO: Don't allocate a in-memory string for the protobuf
1562 defer_weight_export = (
1563 export_type is not _exporter_states.ExportTypes.PROTOBUF_FILE
1564 )

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:1113, in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, dynamic_axes)
1110 args = (args,)
1112 model = _pre_trace_quant_model(model, args)
-> 1113 graph, params, torch_out, module = _create_jit_graph(model, args)
1114 params_dict = _get_named_param_dict(graph, params)
1116 try:

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:989, in _create_jit_graph(model, args)
984 graph = _C._propagate_and_assign_input_shapes(
985 graph, flattened_args, param_count_list, False, False
986 )
987 return graph, params, torch_out, None
--> 989 graph, torch_out = _trace_and_get_graph_from_model(model, args)
990 _C._jit_pass_onnx_lint(graph)
991 state_dict = torch.jit._unique_state_dict(model)

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:893, in _trace_and_get_graph_from_model(model, args)
891 prev_autocast_cache_enabled = torch.is_autocast_cache_enabled()
892 torch.set_autocast_cache_enabled(False)
--> 893 trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
894 model,
895 args,
896 strict=False,
897 _force_outplace=False,
898 _return_inputs_states=True,
899 )
900 torch.set_autocast_cache_enabled(prev_autocast_cache_enabled)
902 warn_on_static_input_change(inputs_states)

File /usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py:1268, in _get_trace_graph(f, args, kwargs, strict, _force_outplace, return_inputs, _return_inputs_states)
1266 if not isinstance(args, tuple):
1267 args = (args,)
-> 1268 outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
1269 return outs

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py:127, in ONNXTracedModule.forward(self, *args)
124 else:
125 return tuple(out_vars)
--> 127 graph, out = torch._C._create_graph_by_tracing(
128 wrapper,
129 in_vars + module_state,
130 _create_interpreter_name_lookup_fn(),
131 self.strict,
132 self._force_outplace,
133 )
135 if self._return_inputs:
136 return graph, outs[0], ret_inputs[0]

File /usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py:118, in ONNXTracedModule.forward.<locals>.wrapper(*args)
116 if self._return_inputs_states:
117 inputs_states.append(_unflatten(in_args, in_desc))
--> 118 outs.append(self.inner(*trace_inputs))
119 if self._return_inputs_states:
120 inputs_states[0] = (inputs_states[0], trace_inputs)

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1488, in Module._slow_forward(self, *input, **kwargs)
1486 recording_scopes = False
1487 try:
-> 1488 result = self.forward(*input, **kwargs)
1489 finally:
1490 if recording_scopes:

Cell In[91], line 32, in Dinov2ForSemanticSegmentation.forward(self, pixel_values, output_hidden_states, output_attentions, labels)
28 def forward(self, pixel_values, output_hidden_states=False, output_attentions=False, labels=None):
29 #print(pixel_values.shape)
30 #print(labels.shape)
31 # use frozen features
---> 32 outputs = self.dinov2(pixel_values,
33 output_hidden_states=output_hidden_states,
34 output_attentions=output_attentions)
36 #print("outputs shape? :", outputs.shape)
37 #print("?? : ", type(outputs))
38 # get the patch embeddings - so we exclude the CLS token
39 patch_embeddings = outputs.last_hidden_state[:,1:,:]

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1488, in Module._slow_forward(self, *input, **kwargs)
1486 recording_scopes = False
1487 try:
-> 1488 result = self.forward(*input, **kwargs)
1489 finally:
1490 if recording_scopes:

File ~/.local/lib/python3.8/site-packages/transformers/models/dinov2/modeling_dinov2.py:645, in Dinov2Model.forward(self, pixel_values, bool_masked_pos, head_mask, output_attentions, output_hidden_states, return_dict)
638 # Prepare head mask if needed
639 # 1.0 in head_mask indicate we keep the head
640 # attention_probs has shape bsz x n_heads x N x N
641 # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
642 # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
643 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
--> 645 embedding_output = self.embeddings(pixel_values, bool_masked_pos=bool_masked_pos)
647 encoder_outputs = self.encoder(
648 embedding_output,
649 head_mask=head_mask,
(...)
652 return_dict=return_dict,
653 )
654 sequence_output = encoder_outputs[0]

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1488, in Module._slow_forward(self, *input, **kwargs)
1486 recording_scopes = False
1487 try:
-> 1488 result = self.forward(*input, **kwargs)
1489 finally:
1490 if recording_scopes:

File ~/.local/lib/python3.8/site-packages/transformers/models/dinov2/modeling_dinov2.py:131, in Dinov2Embeddings.forward(self, pixel_values, bool_masked_pos)
128 embeddings = torch.cat((cls_tokens, embeddings), dim=1)
130 # add positional encoding to each token
--> 131 embeddings = embeddings + self.interpolate_pos_encoding(embeddings, height, width)
133 embeddings = self.dropout(embeddings)
135 return embeddings

File ~/.local/lib/python3.8/site-packages/transformers/models/dinov2/modeling_dinov2.py:106, in Dinov2Embeddings.interpolate_pos_encoding(self, embeddings, height, width)
104 patch_pos_embed = patch_pos_embed.reshape(1, int(math.sqrt(num_positions)), int(math.sqrt(num_positions)), dim)
105 patch_pos_embed = patch_pos_embed.permute(0, 3, 1, 2)
--> 106 patch_pos_embed = nn.functional.interpolate(
107 patch_pos_embed,
108 scale_factor=(height / math.sqrt(num_positions), width / math.sqrt(num_positions)),
109 mode="bicubic",
110 align_corners=False,
111 )
112 if int(height) != patch_pos_embed.shape[-2] or int(width) != patch_pos_embed.shape[-1]:
113 raise ValueError("Width or height does not match with the interpolated position embeddings")

File /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:3967, in interpolate(input, size, scale_factor, mode, align_corners, recompute_scale_factor, antialias)
3965 if antialias:
3966 return torch._C._nn._upsample_bicubic2d_aa(input, output_size, align_corners, scale_factors)
-> 3967 return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors)
3969 if input.dim() == 3 and mode == "bilinear":
3970 raise NotImplementedError("Got 3D input, but bilinear mode needs 4D input")

TypeError: upsample_bicubic2d() received an invalid combination of arguments - got (Tensor, NoneType, bool, tuple), but expected one of:

  • (Tensor input, tuple of ints output_size, bool align_corners, tuple of floats scale_factors)
    didn't match because some of the arguments have invalid types: (Tensor, !NoneType!, bool, !tuple of (Tensor, Tensor)!)
  • (Tensor input, tuple of ints output_size, bool align_corners, float scales_h, float scales_w, *, Tensor out)
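If I read the traceback correctly, the scale factors reach upsample_bicubic2d as tensors rather than Python floats: while the model is traced for export, shape values are recorded as 0-dim tensors, so height / math.sqrt(num_positions) also becomes a tensor. Here is a minimal sketch of what seems to be the same failure mode (a toy module, not this exact model; 1024 stands in for num_positions):

import math
import torch
import torch.nn as nn

class Toy(nn.Module):
    def forward(self, x):
        # Under tracing these shape accesses are recorded as 0-dim tensors,
        # so the divisions below yield tensors, not Python floats.
        height, width = x.shape[-2], x.shape[-1]
        return nn.functional.interpolate(
            x,
            scale_factor=(height / math.sqrt(1024), width / math.sqrt(1024)),
            mode="bicubic",
            align_corners=False,
        )

try:
    torch.onnx.export(Toy(), torch.randn(1, 3, 448, 448), "toy.onnx", opset_version=11)
except TypeError as e:
    print(e)  # on torch 2.0.x this appears to match the error above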

I don't know DINOv2's code well, but as far as I understand this error message, the following code may need to be modified:

https://github.com/huggingface/transformers/blob/main/src/transformers/models/dinov2/modeling_dinov2.py#L106:L111

patch_pos_embed = nn.functional.interpolate(
    patch_pos_embed,
    scale_factor=(height / math.sqrt(num_positions), width / math.sqrt(num_positions)),
    mode="bicubic",
    align_corners=False,
)

I don't know if I'm right, but could you review the code snippet I pointed out?

Thanks.

PeterKim1 (Author) commented:

Check this: the same tracing failure is discussed (and worked around) there; a sketch follows the link.

facebookresearch/dinov2#19
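One workaround along the lines discussed there (a sketch under stated assumptions, not the official fix): give nn.functional.interpolate a concrete integer output size via size= instead of tensor-valued scale_factor, so tracing never sees tensor scale factors. Applied to the interpolate_pos_encoding code quoted above, and assuming height and width are already the image dimensions divided by the patch size at this point, as in the transformers source:

# Sketch, not the upstream fix: int() turns the traced shape values into
# plain Python ints, so F.interpolate receives a concrete output size.
target_height = int(height)
target_width = int(width)
patch_pos_embed = nn.functional.interpolate(
    patch_pos_embed,
    size=(target_height, target_width),
    mode="bicubic",
    align_corners=False,
)

The trade-off: int() bakes those values into the trace as constants, so the exported ONNX graph only supports the resolution used for the dummy input (448x448 here); truly dynamic input sizes need a different fix.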
