I want to convert the DINOv2 model to ONNX, but an error occurs. #26790

Closed
PeterKim1 opened this issue Oct 13, 2023 · 1 comment

PeterKim1 commented Oct 13, 2023

Hi.

Thanks for your great work.

I want to use DINOv2 for a segmentation task, so I am trying to use DINOv2 from Hugging Face transformers.

I am using this notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/DINOv2/Train_a_linear_classifier_on_top_of_DINOv2_for_semantic_segmentation.ipynb

If you run that notebook, you can build the same model I have.

I need to convert this model to ONNX, so I use this code:

torch.onnx.export(model,
                  torch.randn(1, 3, 448, 448, device = 'cuda'),
                  './huggingface_DINOv2.onnx',
                  input_names = ['input_0'],
                  output_names = ['output_0'],
                  opset_version=11)

But this error occurs:

============= Diagnostic Run torch.onnx.export version 2.0.0+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================


TypeError Traceback (most recent call last)
Cell In[101], line 1
----> 1 torch.onnx.export(model,
2 torch.randn(1, 3, 448, 448, device = 'cuda'),
3 './huggingface_DINOv2.onnx',
4 input_names = ['input_0'],
5 output_names = ['output_0'],
6 opset_version=11)

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:506, in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)
188 @_beartype.beartype
189 def export(
190 model: Union[torch.nn.Module, torch.jit.ScriptModule, torch.jit.ScriptFunction],
(...)
206 export_modules_as_functions: Union[bool, Collection[Type[torch.nn.Module]]] = False,
207 ) -> None:
208 r"""Exports a model into ONNX format.
209
210 If model is not a :class:torch.jit.ScriptModule nor a
(...)
503 All errors are subclasses of :class:errors.OnnxExporterError.
504 """
--> 506 _export(
507 model,
508 args,
509 f,
510 export_params,
511 verbose,
512 training,
513 input_names,
514 output_names,
515 operator_export_type=operator_export_type,
516 opset_version=opset_version,
517 do_constant_folding=do_constant_folding,
518 dynamic_axes=dynamic_axes,
519 keep_initializers_as_inputs=keep_initializers_as_inputs,
520 custom_opsets=custom_opsets,
521 export_modules_as_functions=export_modules_as_functions,
522 )

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:1548, in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, onnx_shape_inference, export_modules_as_functions)
1545 dynamic_axes = {}
1546 _validate_dynamic_axes(dynamic_axes, model, input_names, output_names)
-> 1548 graph, params_dict, torch_out = _model_to_graph(
1549 model,
1550 args,
1551 verbose,
1552 input_names,
1553 output_names,
1554 operator_export_type,
1555 val_do_constant_folding,
1556 fixed_batch_size=fixed_batch_size,
1557 training=training,
1558 dynamic_axes=dynamic_axes,
1559 )
1561 # TODO: Don't allocate a in-memory string for the protobuf
1562 defer_weight_export = (
1563 export_type is not _exporter_states.ExportTypes.PROTOBUF_FILE
1564 )

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:1113, in _model_to_graph(model, args, verbose, input_names, output_names, operator_export_type, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size, training, dynamic_axes)
1110 args = (args,)
1112 model = _pre_trace_quant_model(model, args)
-> 1113 graph, params, torch_out, module = _create_jit_graph(model, args)
1114 params_dict = _get_named_param_dict(graph, params)
1116 try:

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:989, in _create_jit_graph(model, args)
984 graph = _C._propagate_and_assign_input_shapes(
985 graph, flattened_args, param_count_list, False, False
986 )
987 return graph, params, torch_out, None
--> 989 graph, torch_out = _trace_and_get_graph_from_model(model, args)
990 _C._jit_pass_onnx_lint(graph)
991 state_dict = torch.jit._unique_state_dict(model)

File /usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py:893, in _trace_and_get_graph_from_model(model, args)
891 prev_autocast_cache_enabled = torch.is_autocast_cache_enabled()
892 torch.set_autocast_cache_enabled(False)
--> 893 trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
894 model,
895 args,
896 strict=False,
897 _force_outplace=False,
898 _return_inputs_states=True,
899 )
900 torch.set_autocast_cache_enabled(prev_autocast_cache_enabled)
902 warn_on_static_input_change(inputs_states)

File /usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py:1268, in _get_trace_graph(f, args, kwargs, strict, _force_outplace, return_inputs, _return_inputs_states)
1266 if not isinstance(args, tuple):
1267 args = (args,)
-> 1268 outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
1269 return outs

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py:127, in ONNXTracedModule.forward(self, *args)
124 else:
125 return tuple(out_vars)
--> 127 graph, out = torch._C._create_graph_by_tracing(
128 wrapper,
129 in_vars + module_state,
130 _create_interpreter_name_lookup_fn(),
131 self.strict,
132 self._force_outplace,
133 )
135 if self._return_inputs:
136 return graph, outs[0], ret_inputs[0]

File /usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py:118, in ONNXTracedModule.forward.<locals>.wrapper(*args)
116 if self._return_inputs_states:
117 inputs_states.append(_unflatten(in_args, in_desc))
--> 118 outs.append(self.inner(*trace_inputs))
119 if self._return_inputs_states:
120 inputs_states[0] = (inputs_states[0], trace_inputs)

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1488, in Module._slow_forward(self, *input, **kwargs)
1486 recording_scopes = False
1487 try:
-> 1488 result = self.forward(*input, **kwargs)
1489 finally:
1490 if recording_scopes:

Cell In[91], line 32, in Dinov2ForSemanticSegmentation.forward(self, pixel_values, output_hidden_states, output_attentions, labels)
28 def forward(self, pixel_values, output_hidden_states=False, output_attentions=False, labels=None):
29 #print(pixel_values.shape)
30 #print(labels.shape)
31 # use frozen features
---> 32 outputs = self.dinov2(pixel_values,
33 output_hidden_states=output_hidden_states,
34 output_attentions=output_attentions)
36 #print("outputs shape? :", outputs.shape)
37 #print("?? : ", type(outputs))
38 # get the patch embeddings - so we exclude the CLS token
39 patch_embeddings = outputs.last_hidden_state[:,1:,:]

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1488, in Module._slow_forward(self, *input, **kwargs)
1486 recording_scopes = False
1487 try:
-> 1488 result = self.forward(*input, **kwargs)
1489 finally:
1490 if recording_scopes:

File ~/.local/lib/python3.8/site-packages/transformers/models/dinov2/modeling_dinov2.py:645, in Dinov2Model.forward(self, pixel_values, bool_masked_pos, head_mask, output_attentions, output_hidden_states, return_dict)
638 # Prepare head mask if needed
639 # 1.0 in head_mask indicate we keep the head
640 # attention_probs has shape bsz x n_heads x N x N
641 # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads]
642 # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length]
643 head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers)
--> 645 embedding_output = self.embeddings(pixel_values, bool_masked_pos=bool_masked_pos)
647 encoder_outputs = self.encoder(
648 embedding_output,
649 head_mask=head_mask,
(...)
652 return_dict=return_dict,
653 )
654 sequence_output = encoder_outputs[0]

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py:1488, in Module._slow_forward(self, *input, **kwargs)
1486 recording_scopes = False
1487 try:
-> 1488 result = self.forward(*input, **kwargs)
1489 finally:
1490 if recording_scopes:

File ~/.local/lib/python3.8/site-packages/transformers/models/dinov2/modeling_dinov2.py:131, in Dinov2Embeddings.forward(self, pixel_values, bool_masked_pos)
128 embeddings = torch.cat((cls_tokens, embeddings), dim=1)
130 # add positional encoding to each token
--> 131 embeddings = embeddings + self.interpolate_pos_encoding(embeddings, height, width)
133 embeddings = self.dropout(embeddings)
135 return embeddings

File ~/.local/lib/python3.8/site-packages/transformers/models/dinov2/modeling_dinov2.py:106, in Dinov2Embeddings.interpolate_pos_encoding(self, embeddings, height, width)
104 patch_pos_embed = patch_pos_embed.reshape(1, int(math.sqrt(num_positions)), int(math.sqrt(num_positions)), dim)
105 patch_pos_embed = patch_pos_embed.permute(0, 3, 1, 2)
--> 106 patch_pos_embed = nn.functional.interpolate(
107 patch_pos_embed,
108 scale_factor=(height / math.sqrt(num_positions), width / math.sqrt(num_positions)),
109 mode="bicubic",
110 align_corners=False,
111 )
112 if int(height) != patch_pos_embed.shape[-2] or int(width) != patch_pos_embed.shape[-1]:
113 raise ValueError("Width or height does not match with the interpolated position embeddings")

File /usr/local/lib/python3.8/dist-packages/torch/nn/functional.py:3967, in interpolate(input, size, scale_factor, mode, align_corners, recompute_scale_factor, antialias)
3965 if antialias:
3966 return torch._C._nn._upsample_bicubic2d_aa(input, output_size, align_corners, scale_factors)
-> 3967 return torch._C._nn.upsample_bicubic2d(input, output_size, align_corners, scale_factors)
3969 if input.dim() == 3 and mode == "bilinear":
3970 raise NotImplementedError("Got 3D input, but bilinear mode needs 4D input")

TypeError: upsample_bicubic2d() received an invalid combination of arguments - got (Tensor, NoneType, bool, tuple), but expected one of:

  • (Tensor input, tuple of ints output_size, bool align_corners, tuple of floats scale_factors)
    didn't match because some of the arguments have invalid types: (Tensor, !NoneType!, bool, !tuple of (Tensor, Tensor)!)
  • (Tensor input, tuple of ints output_size, bool align_corners, float scales_h, float scales_w, *, Tensor out)
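If I read the traceback correctly, the scale factors reach upsample_bicubic2d as tensors rather than Python floats: while the model is traced for export, shape values are recorded as 0-dim tensors, so height / math.sqrt(num_positions) also becomes a tensor. Here is a minimal sketch of what seems to be the same failure mode (a toy module, not this exact model; 1024 stands in for num_positions):

import math
import torch
import torch.nn as nn

class Toy(nn.Module):
    def forward(self, x):
        # Under tracing these shape accesses are recorded as 0-dim tensors,
        # so the divisions below yield tensors, not Python floats.
        height, width = x.shape[-2], x.shape[-1]
        return nn.functional.interpolate(
            x,
            scale_factor=(height / math.sqrt(1024), width / math.sqrt(1024)),
            mode="bicubic",
            align_corners=False,
        )

try:
    torch.onnx.export(Toy(), torch.randn(1, 3, 448, 448), "toy.onnx", opset_version=11)
except TypeError as e:
    print(e)  # on torch 2.0.x this appears to match the error above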

I don't know DINOv2's code well, but as far as I understand this error message, the following code may need to be modified:

https://github.com/huggingface/transformers/blob/main/src/transformers/models/dinov2/modeling_dinov2.py#L106:L111

patch_pos_embed = nn.functional.interpolate(
    patch_pos_embed,
    scale_factor=(height / math.sqrt(num_positions), width / math.sqrt(num_positions)),
    mode="bicubic",
    align_corners=False,
)

I don't know if I'm right, but could you review the code snippet I pointed out?

Thanks.

PeterKim1 (Author) commented:

Check this: the same tracing failure is discussed (and worked around) there; a sketch follows the link.

facebookresearch/dinov2#19
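One workaround along the lines discussed there (a sketch under stated assumptions, not the official fix): give nn.functional.interpolate a concrete integer output size via size= instead of tensor-valued scale_factor, so tracing never sees tensor scale factors. Applied to the interpolate_pos_encoding code quoted above, and assuming height and width are already the image dimensions divided by the patch size at this point, as in the transformers source:

# Sketch, not the upstream fix: int() turns the traced shape values into
# plain Python ints, so F.interpolate receives a concrete output size.
target_height = int(height)
target_width = int(width)
patch_pos_embed = nn.functional.interpolate(
    patch_pos_embed,
    size=(target_height, target_width),
    mode="bicubic",
    align_corners=False,
)

The trade-off: int() bakes those values into the trace as constants, so the exported ONNX graph only supports the resolution used for the dummy input (448x448 here); truly dynamic input sizes need a different fix.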
