Bfloat16 tensor .numpy() support #90574

Open · JulesGM opened this issue Dec 9, 2022 · 11 comments
Labels: module: bfloat16 · module: numpy · onnx-needs-info · triaged

Comments

JulesGM commented Dec 9, 2022

🚀 The feature, motivation and pitch

NumPy doesn't support bfloat16 and doesn't plan to. As a result, any code that calls tensor.numpy() breaks as soon as you switch it to bfloat16. I was thinking that converting bfloat16 to np.float32 would make sense: it keeps the exponent as-is and only adds mantissa bits, so the conversion is lossless and should be very fast. This would make all code that works with float32 or float16 compatible with bfloat16 out of the box, which feels like reasonable behavior to me.
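
A minimal illustration of the current behavior and of the suggested conversion (the exact error message may differ between PyTorch versions):

import torch

t = torch.ones(3, dtype=torch.bfloat16)

# t.numpy() currently raises TypeError, because NumPy has no bfloat16 dtype
# (something along the lines of "Got unsupported ScalarType BFloat16").
arr = t.to(torch.float32).numpy()  # lossless: bfloat16 is a truncated float32
print(arr.dtype)  # float32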

Additional context

The to_numpy function seems to be here https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/tensor_numpy.cpp#L159

and the function that decides the output np.dtype seems to be here:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/utils/tensor_numpy.cpp#L267

cc @mruberry @rgommers

jingxu10 (Collaborator) commented:

Could you share more detailed information, such as what kind of use case motivated this feature?

samdow added the triaged and module: numpy labels on Dec 12, 2022
rgommers (Collaborator) commented:

I'm not sure that such an upcast is always the right thing to do. Arguably some users would want np.float32 for precision, and others would want np.float16 for memory use. So raising an exception with an informative error message for bfloat16 is probably the appropriate thing to do.

JulesGM (Author) commented Dec 12, 2022

I thought there could be a way to set a default conversion type. That could be the best of both worlds: raise an exception if no default type is set, and otherwise use it.

This would allow frameworks that were built without bfloat16 in mind, like stable_baselines3, to work at all.

JulesGM (Author) commented Dec 12, 2022

Again, the situation is that a number of frameworks are written assuming that you can call tensor.numpy() at any time, which is true for everything except bfloat16.

Implementing something like torch.default_bfloat16_numpy_type(torch.float32) would solve this problem in a reasonably clean way.
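
A sketch of how such a setting might be used. torch.default_bfloat16_numpy_type does not exist in PyTorch; the name is just the one proposed above:

import torch

# Hypothetical, proposed API (not implemented): set the NumPy dtype that
# bfloat16 tensors should be converted to, once at program start.
torch.default_bfloat16_numpy_type(torch.float32)

t = torch.ones(3, dtype=torch.bfloat16)
arr = t.numpy()  # would then transparently return a float32 ndarray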

vadimkantorov (Contributor) commented Dec 12, 2022

One way is to have a precision-preserving default (e.g. .numpy(upcast=True, downcast=False), or .numpy(dtype=None) / .numpy(dtype=torch.float32) / .numpy(dtype=torch.float16)) and then have the user maintain and pass any global state for numpy() themselves. This might work because numpy() isn't currently called recursively, so the burden of propagating that state can be left to the user. Enabling existing libraries to work with bfloat16 might be an argument for adding global state to torch, but in my experience it's better in the mid term to require some code changes than to add more global state.
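
For illustration, the keyword-based variant might look like this. None of these .numpy() arguments exist in current PyTorch; they are only the spellings suggested above:

import torch

t = torch.ones(3, dtype=torch.bfloat16)

# Hypothetical keywords (not implemented):
arr32 = t.numpy(dtype=torch.float32)  # explicit, precision-preserving upcast
arr16 = t.numpy(dtype=torch.float16)  # explicit, memory-saving downcast
arr = t.numpy(dtype=None)             # no dtype given: would raise for bfloat16, as today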

JulesGM (Author) commented Dec 12, 2022

That sounds good to me. Having a way to set a global default for the conversion (per dtype, or for bfloat16 specifically), combined with allowing an explicit output dtype in .numpy(), sounds reasonable. The global-state part is the more important one to me right now. @rgommers, I'd be curious to hear your thoughts on this.

rgommers (Collaborator) commented:

> Again, the situation is that a number of frameworks are written assuming that you can call tensor.numpy() at any time, which is true for everything except bfloat16.

For completeness: there are more important exceptions (non-CPU and requires-grad tensors), so such code typically does a_tensor.clone().cpu().numpy() (e.g. here in stable_baselines).

Related: gh-36560 wanted to make that easier by adding a force keyword, and .numpy() already has that: Tensor.numpy docs.

An alternative here is to make force=True cast bfloat16 to float32. That would answer the request here without more special-case keywords.

> This might work because numpy() isn't currently called recursively, so the burden of propagating that state can be left to the user.

Global state is pretty painful to manage, though; you start having to worry about things like multi-threading and multiprocessing.

If a user has to set anything at all, I don't think it matters whether that's global state, an explicit dtype conversion, or force=True. WDYT about using force=True?
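
For reference, a rough sketch of what the suggested force=True behavior would boil down to. Today the conversion has to be spelled out manually; the idea that force=True would also upcast bfloat16 is only the proposal in this thread, not current behavior:

import torch

t = torch.ones(3, dtype=torch.bfloat16, requires_grad=True)

# force=True already handles the device and autograd cases, but still errors
# on bfloat16. If it also upcast, t.numpy(force=True) would be roughly:
arr = t.detach().cpu().to(torch.float32).numpy()
print(arr.dtype)  # float32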

JulesGM (Author) commented Dec 13, 2022 via email

rgommers (Collaborator) commented:

> (Also, as of 1.13.0, .numpy(force=True) doesn't work with bfloat16, by the way)

That was my point - let's make it work by upcasting to float32.

> The idea with a global call is to not have to modify the code of the other frameworks

To me personally that's not enough of a reason to add global state (the code that doesn't work can be improved instead), so I'm -0.5 on this one. It's not my decision, though, so perhaps @mruberry or someone else can weigh in here.

vadimkantorov (Contributor) commented:

> That was my point - let's make it work by upcasting to float32.

(For extra control, given force=True, one might add another argument such as backup_dtype, but that should probably only be introduced if there are user requests for it.)

crypdick commented Mar 7, 2024

For any Googlers finding this discussion, the best workaround I've found so far is converting to fp32 manually:

# NumPy can't represent bfloat16, so upcast to float32 before calling .numpy():
if embeddings.dtype == torch.bfloat16:
    embeddings = embeddings.float()
embeddings = embeddings.numpy()
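
A slightly more general helper in the same spirit (a sketch combining the workarounds mentioned in this thread, also covering the non-CPU and requires-grad cases):

import numpy as np
import torch

def to_numpy(t: torch.Tensor) -> np.ndarray:
    # NumPy has no bfloat16 dtype, so upcast losslessly to float32 first.
    if t.dtype == torch.bfloat16:
        t = t.float()
    # detach() and cpu() cover the requires-grad and non-CPU cases.
    return t.detach().cpu().numpy()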

thiagocrepaldi added the onnx-needs-info label on Apr 25, 2024