
[Fix] PT - convert BF16 tensor to float before calling .numpy() #1342

Merged (7 commits) on Oct 12, 2023

Conversation

chunyuan-w
Contributor

.numpy() in PyTorch supports only a limited set of scalar types (see aten_to_numpy_dtype).
When running BF16 with autocast, calling .numpy() here throws an error: TypeError: Got unsupported ScalarType BFloat16.
This PR converts the BF16 tensor to float before calling .numpy() to fix the error.
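A minimal standalone sketch of the failure and the workaround (illustrative only, not doctr's actual code):

```python
import torch

# Under autocast(bfloat16), intermediate tensors come out as bfloat16;
# simulate that here with an explicit cast.
prob_map = torch.sigmoid(torch.randn(1, 1, 4, 4)).to(torch.bfloat16)

# prob_map.numpy() would raise: TypeError: Got unsupported ScalarType BFloat16,
# because numpy has no bfloat16 dtype. Upcast to float32 first, then convert.
arr = prob_map.float().numpy()
print(arr.dtype)  # float32
```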

@chunyuan-w chunyuan-w changed the title convert BF16 tensor to float before calling .numpy() [Fix] convert BF16 tensor to float before calling .numpy() Oct 10, 2023
@chunyuan-w
Contributor Author

Comment on lines 211 to 215
def need_conversion_to_float(dtype):
    # pytorch: torch/csrc/utils/tensor_numpy.cpp:aten_to_numpy_dtype
    return dtype in [torch.bfloat16]

numpy_dtype_converter = lambda x: x.float() if need_conversion_to_float(x.dtype) else x

Wouldn't directly checking dtype in [torch.bfloat16] be simpler?

Contributor Author


Updated as suggested.

Contributor

@felixdittrich92 felixdittrich92 left a comment


Hi @chunyuan-w @jgong5 👋

Thanks for the fix 👍

Some points:

We should add a function for the conversion in:
https://github.com/mindee/doctr/blob/main/doctr/models/utils/pytorch.py
and for TF in:
https://github.com/mindee/doctr/blob/main/doctr/models/utils/tensorflow.py
because I expect we will need this fix in multiple places:

for preds in self.postprocessor(prob_map.detach().cpu().permute((0, 2, 3, 1)).numpy())

out["preds"] = [dict(zip(self.class_names, preds)) for preds in self.postprocessor(prob_map.numpy())]

for preds in self.postprocessor(prob_map.detach().cpu().permute((0, 2, 3, 1)).numpy())

out["preds"] = [dict(zip(self.class_names, preds)) for preds in self.postprocessor(prob_map.numpy())]

Then add a short test for the function in:
https://github.com/mindee/doctr/blob/main/tests/pytorch/test_models_utils_pt.py
and
https://github.com/mindee/doctr/blob/main/tests/tensorflow/test_models_utils_tf.py

Afterwards you can run
make style
make quality (sometimes it shows a typing issue in https://github.com/mindee/doctr/tree/main/doctr/models/artefacts which can be ignored)
make test-common
make test-torch
make test-tf

EDIT:

After double-checking, we also need the conversion for each recognition model (except CRNN),
e.g.:

out["preds"] = self.postprocessor(decoded_features)

And for the detection models I suggest converting the prob_map directly if needed,
e.g.:

@felixdittrich92 felixdittrich92 added this to the 0.7.1 milestone Oct 10, 2023
@felixdittrich92 felixdittrich92 added type: bug Something isn't working module: models Related to doctr.models ext: tests Related to tests folder framework: pytorch Related to PyTorch backend framework: tensorflow Related to TensorFlow backend topic: text detection Related to the task of text detection labels Oct 10, 2023
@felixdittrich92
Contributor

@chunyuan-w see: #1344

In your PR you can do the same for PyTorch and we are fine to merge 🤗

@felixdittrich92 felixdittrich92 added topic: text recognition Related to the task of text recognition and removed framework: tensorflow Related to TensorFlow backend labels Oct 11, 2023
@chunyuan-w
Contributor Author

@chunyuan-w see: #1344

In your PR you can do the same for PyTorch and we are fine to merge 🤗

Thanks for the reference. Let me further refine this PR following #1344.

@chunyuan-w chunyuan-w changed the title [Fix] convert BF16 tensor to float before calling .numpy() [Fix] PT - convert BF16 tensor to float before calling .numpy() Oct 12, 2023
Contributor

@felixdittrich92 felixdittrich92 left a comment


Thanks for the fix @chunyuan-w 👍

Could you please add a short comment that it fixes the issue in torchbench? :)
@odulcy-mindee mypy fix applied in #1344

@felixdittrich92 felixdittrich92 merged commit 56c8356 into mindee:main Oct 12, 2023
67 of 68 checks passed
@chunyuan-w
Contributor Author

chunyuan-w commented Oct 12, 2023

Thanks for the fix @chunyuan-w 👍

Could you please add a short comment that it fixes the issue in torchbench? :) @odulcy-mindee mypy fix applied in #1344

Thanks for merging it!
I just submitted a draft PR to torchbench to update the doctr version in torchbench to include this fix:
pytorch/benchmark#1979

@felixdittrich92
Contributor

Thanks for the update 👍

facebook-github-bot pushed a commit to pytorch/benchmark that referenced this pull request Oct 12, 2023
Summary:
Update the version of `doctr` to include the fix in mindee/doctr#1342 for BF16 mode.

Remove the change of `rapidfuzz==2.15.1` in `requirements.txt` (#1555) since the version has been set in the model repo in the updated version (mindee/doctr#1176).

Pull Request resolved: #1979

Reviewed By: aaronenyeshi

Differential Revision: D50242780

Pulled By: xuzhao9

fbshipit-source-id: d8ed9164d463a1217114408106b2c745431bd159