
Fix bf16 Support #4193

Merged
merged 1 commit into from May 25, 2022

Conversation

manbearian
Contributor

@manbearian manbearian commented May 10, 2022

Fix bf16 Support
Fix bfloat16 support in helper.py and numpy_helper.py.

Motivation and Context
Without native bfloat16 support in numpy, implementing bfloat16 support for ONNX is a bit complicated. Support was recently added by encoding bfloat16 as float16. This is not correct, since the encoding is lossy; it's also a confusing design, and it indeed broke some bespoke tools on my end that didn't expect bfloat16 to be encoded as float16 in the ONNX file.
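The lossiness described above is easy to demonstrate with a small sketch (illustrative only, not the actual onnx code): bfloat16 keeps float32's 8 exponent bits and truncates the mantissa to 7 bits, while float16 has only 5 exponent bits, so reinterpreting a bfloat16 bit pattern as a float16 value changes the number.

```python
import struct

def f32_to_bf16_bits(x):
    """Bfloat16 bit pattern of a float: the upper 16 bits of the
    float32 encoding (plain truncation, for illustration only)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

bits = f32_to_bf16_bits(3.0)  # 0x4040, the bfloat16 pattern for 3.0
# Read the same 16 bits back as an IEEE float16 ("<e" is half precision):
as_fp16 = struct.unpack("<e", struct.pack("<H", bits))[0]
print(hex(bits), as_fp16)  # 0x4040 2.125 -- not 3.0, so the formats are not interchangeable
```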

Open Questions

I've implemented some changes here that give rudimentary support (not completely broken), but I have two questions that I believe need to be answered before committing these changes.

  1. What is the proper binary encoding of bfloat16 in the ONNX protobuf format? (Is this documented? Should it be?)
  2. It appears that "raw" encoding and normal encoding for 16-bit values are of different sizes. Is this intended?

Changes Not Implemented

  1. Support for big-endian encodings is not implemented in my code.
  2. Support for "correct" truncation of NaN payloads is not implemented when converting from f32->bf16 in my code.

Addresses #4189

@manbearian manbearian requested a review from a team as a code owner May 10, 2022 16:20
@jcwchen jcwchen linked an issue May 10, 2022 that may be closed by this pull request
onnx/helper.py (outdated review thread, resolved)
@manbearian
Contributor Author

How can we move forward on this? I have some time this week to continue hacking on this if there are more changes folks would like to see.

@gramalingam
Contributor

It would be great if we could make this work for both big-endian and little-endian. Is that hard? Is there some reference for the transformation you are using? I am trying to understand why truncation of 32-bit to 16-bit is incorrect.

onnx/helper.py (outdated review thread, resolved)
@manbearian
Contributor Author

manbearian commented May 20, 2022

It would be great if we could make this work for both big-endian and little-endian. Is that hard? Is there some reference for the transformation you are using? I am trying to understand why truncation of 32-bit to 16-bit is incorrect.

Endianness makes my brain hurt, but I'll see if I can work through the pain and implement it. :) (I was kind of hoping someone would say it isn't needed. ;)

See my other reply for why I chose rounding over truncation when converting from fp32 to bf16, and for a pointer to the PyTorch bfloat16 code, which is a correct implementation that I drew inspiration from.
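For reference, a round-to-nearest-even conversion in the style of the PyTorch code mentioned above might look like the following. This is a hypothetical sketch, not the code in this PR; the NaN branch is what prevents a NaN payload from truncating to infinity.

```python
import struct

def float32_to_bfloat16_bits(x):
    """Convert a Python float to a bfloat16 bit pattern using
    round-to-nearest-even (illustrative helper, not onnx.helper code)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: plain truncation could drop every set mantissa bit and
        # yield infinity, so force a quiet-NaN bit into the result.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even: the bias carries into bit 16
    # exactly when the discarded low half should round up.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

print(hex(float32_to_bfloat16_bits(1.0)))          # 0x3f80
print(hex(float32_to_bfloat16_bits(1.005859375)))  # 0x3f81: rounds up; truncation would give 0x3f80
```

The last line shows the one-bit difference between rounding and truncation that the thread discusses.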

@manbearian manbearian force-pushed the dev/ianb/bf16 branch 2 times, most recently from 0ea5952 to eb78e05 Compare May 20, 2022 21:19
@manbearian
Contributor Author

NaN is now implemented correctly (my previous implementation could potentially convert NaN to infinity).
Big-endian is now 'supported'... at least I believe I have the right implementation, but I have no way to test it.
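For what it's worth, the usual approach is to fix the byte order at serialization time: ONNX defines raw_data as little-endian, so a big-endian host byte-swaps before writing. A minimal sketch, assuming the tensor is already an array of uint16 bfloat16 bit patterns (not the actual onnx code):

```python
import sys
import numpy as np

def bf16_bits_to_raw_data(bf16_bits):
    """Serialize uint16 bfloat16 bit patterns as little-endian bytes,
    regardless of host byte order (illustrative sketch)."""
    arr = np.asarray(bf16_bits, dtype=np.uint16)
    if sys.byteorder == "big":
        arr = arr.byteswap()  # host is big-endian: swap to little-endian
    return arr.tobytes()

raw = bf16_bits_to_raw_data([0x3F80, 0x4040])  # 1.0 and 3.0 in bfloat16
print(raw)  # b'\x80?@@' on any host
```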

@manbearian
Contributor Author

@gramalingam and @souptc thoughts?

@souptc
Contributor

souptc commented May 23, 2022

I see. I think it is correct. I'm actually wondering how our CPU/GPU EPs handle it.

@manbearian

This comment was marked as resolved.

onnx/numpy_helper.py (outdated review thread, resolved)
onnx/numpy_helper.py (outdated review thread, resolved)
onnx/numpy_helper.py (outdated review thread, resolved)
onnx/test/helper_test.py (outdated review thread, resolved)
@lgtm-com

lgtm-com bot commented May 24, 2022

This pull request introduces 2 alerts when merging 9e0d884 into 28ed7f5 - view on LGTM.com

new alerts:

  • 1 for Use of the return value of a procedure
  • 1 for Unused local variable

@jcwchen jcwchen added this to the 1.12 milestone May 24, 2022
@gramalingam
Contributor

Nit: if we decide that rounding is the right behavior instead of truncating, it would be good to fix the "Cast" implementation in the test-case generator also.

@manbearian
Contributor Author

manbearian commented May 24, 2022

Nit: if we decide that rounding is the right behavior instead of truncating, it would be good to fix the "Cast" implementation in the test-case generator also.

@gramalingam
I'm looking at this code now and suspect I see how it can be changed, but I'm not sure how it gets used. Are there any pointers to how it works?

@manbearian
Contributor Author

I see. I think it is correct. I'm actually wondering how our CPU/GPU EPs handle it.

Do you have a pointer to where the code might be? I'm happy to take a look.

@jcwchen
Member

jcwchen commented May 24, 2022

Nit: if we decide that rounding is the right behavior instead of truncating, it would be good to fix the "Cast" implementation in the test-case generator also.

@gramalingam I'm looking at this code now and suspect I see how it can be changed, but I'm not sure how it gets used. Are there any pointers to how it works?

https://github.com/onnx/onnx/blob/main/onnx/backend/test/case/node/cast.py This file is used to create the node test models (for instance, https://github.com/onnx/onnx/tree/main/onnx/backend/test/data/node/test_cast_FLOAT_to_BFLOAT16). You can update cast.py with the correct behavior first and then use tools/update_doc.sh to regenerate the related node test models.

@jcwchen
Member

jcwchen commented May 24, 2022

I see. I think it is correct. I'm actually wondering how our CPU/GPU EPs handle it.

Do you have a pointer to where the code might be? I'm happy to take a look.

Probably here. However, different EPs seem to have different behaviors. We are trying to reach out to the code owner to understand the inconsistency and which behavior is correct.

@manbearian manbearian requested a review from a team as a code owner May 24, 2022 23:36
@manbearian
Contributor Author

@jcwchen and @gramalingam I updated cast.py/castlike.py to use rounding for f32->bf16. Let me know if this is what you were looking for.

I was expecting to see some tests fail, but I believe the tests aren't checking for exact values, so the single-bit difference isn't detectable.

@gramalingam
Contributor

I suspect that the check-ins of test data for ops other than Cast/CastLike are spurious; I don't think they should change?

@jcwchen
Member

jcwchen commented May 25, 2022

Thank you @manbearian for the quick update! As @gramalingam mentioned, please remove the other irrelevant operators' updates (like updated output.pb files caused by different numpy.random behaviors on different machines). tools/update_doc.sh updates every node's test data, and we should only take the Cast/CastLike-related tests.

I was expecting to see some tests fail, but I believe the tests aren't checking for exact values, so the single-bit difference isn't detectable.

You are right -- these models/input.pb/output.pb files exactly follow what onnx/backend/test/case/node/[operator_name].py defines. The CI only checks whether the uploaded model can be reproduced in the CI environment.

Signed-off-by: Ian Bearman <ianb@microsoft.com>
@manbearian manbearian force-pushed the dev/ianb/bf16 branch 2 times, most recently from eac0d62 to ee27388 Compare May 25, 2022 16:42
@manbearian
Contributor Author

After talking with @gramalingam offline, we decided to drop the Cast/CastLike changes for now. I have them stored in a branch if we want to bring them back. There was also a suggestion that the f32->bf16 helper support both rounding and truncation modes. I can add support for that in a future PR if there is interest (please open an issue and assign it to me).

@gramalingam
Contributor

Thanks Ian! The rationale for our decision was that the question of what Cast should do (rounding or truncation) is a separate one -- it involves a tradeoff between efficiency and precision -- and will require more time to reach consensus. The helper function make_tensor is not as critical (users can round or truncate as they wish), but it is important to fix the error in make_tensor's current handling for this release.

@gramalingam gramalingam merged commit fdd8902 into onnx:main May 25, 2022
broune pushed a commit to broune/onnx that referenced this pull request May 6, 2023
Signed-off-by: Ian Bearman <ianb@microsoft.com>
Successfully merging this pull request may close these issues.

Encoding BFLOAT16 Constant to ONNX Fails
4 participants