Failure when not using FSDP mixed precision #266

Open
schmidt-ai opened this issue Oct 13, 2023 · 1 comment

@schmidt-ai

When training without passing the mixed_precision argument to FSDP, I get a dtype mismatch error raised from dinov2/layers/block.py. Is this expected?

Full stacktrace:

  File "/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
[3]:     return forward_call(*args, **kwargs)
[3]:   File "/.venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 748, in forward
[3]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[3]:     return forward_call(*args, **kwargs)
[3]:     ret = self.forward_features(*args, **kwargs)
[3]:   File "/.venv/lib/python3.10/site-packages/dinov2/models/vision_transformer.py", line 207, in forward_features_list
[3]:     x = blk(x)
[3]:   File "/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
[3]:   File "/.venv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 748, in forward
[3]:     output = self._fsdp_wrapped_module(*args, **kwargs)
[3]:   File "/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
[3]:   File "/.venv/lib/python3.10/site-packages/dinov2/layers/block.py", line 258, in forward
[3]:     return self.forward_nested(x_or_x_list)
[3]:   File "/.venv/lib/python3.10/site-packages/dinov2/layers/block.py", line 226, in forward_nested
[3]:     x_list = drop_add_residual_stochastic_depth_list(
[3]:   File "/.venv/lib/python3.10/site-packages/dinov2/layers/block.py", line 200, in drop_add_residual_stochastic_depth_list
[3]:     attn_bias, x_cat = get_attn_bias_and_cat(x_list, branges)
[3]:   File "/.venv/lib/python3.10/site-packages/dinov2/layers/block.py", line 180, in get_attn_bias_and_cat
[3]:     cat_tensors = index_select_cat([x.flatten(1) for x in x_list], branges).view(1, -1, x_list[0].shape[-1])
[3]:     return _IndexSelectCat.apply(*sources, *indices)
[3]:     IndexSelect.OPERATOR(
[3]: RuntimeError: Expected output.scalar_type() == at::ScalarType::Half to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
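
The RuntimeError above suggests that the fused index-select operator called from get_attn_bias_and_cat only accepts float16 tensors, while FSDP without a mixed_precision policy leaves parameters and activations in float32. Below is a minimal sketch, assuming that reading is right. The helper non_half_indices and the dtype choices in fp16_policy are hypothetical illustrations, not the project's confirmed settings and not the contents of any patch. MixedPrecision is the policy class from torch.distributed.fsdp.

```python
import torch
from torch.distributed.fsdp import MixedPrecision

# Assumption: the fused kernel requires float16, as the
# "at::ScalarType::Half" check in the error message suggests.
EXPECTED_DTYPE = torch.float16


def non_half_indices(x_list):
    """Return positions of tensors whose dtype is not float16 (hypothetical helper)."""
    return [i for i, x in enumerate(x_list) if x.dtype != EXPECTED_DTYPE]


# Without a mixed precision policy, activations stay in the default float32.
x_list = [torch.zeros(2, 4, 8), torch.zeros(2, 3, 8)]
bad = non_half_indices(x_list)  # positions that would trip the dtype check

# One possible workaround: give FSDP a float16 policy.
# The dtypes below are illustrative assumptions.
fp16_policy = MixedPrecision(
    param_dtype=torch.float16,
    reduce_dtype=torch.float16,
    buffer_dtype=torch.float32,
)
# Wrapping requires an initialized process group, so it is only shown here:
# model = FSDP(model, mixed_precision=fp16_policy)

# Alternatively, casting the inputs by hand also satisfies the check.
x_half = [x.half() for x in x_list]
still_bad = non_half_indices(x_half)
```

If the helper reports any positions for the tensors reaching the block, passing a float16 policy to FSDP (or casting the inputs) would be one thing to try.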
@qasfb
Contributor

qasfb commented Oct 13, 2023

Can you try with this? qasfb-patch-1
