[ONNX] Add initial support for FP8 ONNX export #107962
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/107962

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV — there is 1 currently active SEV. If your PR is affected, please view it below.

❌ 1 New Failure, 2 Unrelated Failures — as of commit 3f8fb7a with merge base c68d0a7:
- NEW FAILURE: the following job has failed:
- FLAKY: the following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@justinchuby please review, as the conflicts were with #107829. Supporting FP8 here means opset 19 in TS.
test/onnx/test_op_consistency.py
Should we test opset18? We don’t support it in onnx.export
We don't, until we port the helper functions' input/attribute changes into symbolic_opset*.py. I think CI would break if we bumped it in this PR.
What do you mean opset 18 is not supported by the onnx exporter? The [col2im op](https://github.com/pytorch/pytorch/blob/main/torch/onnx/symbolic_opset18.py) was implemented in symbolic_opset18.py, for example, and onnxruntime has a unit test for it.
Maybe some tests need to be skipped for opset 18, but col2im is an op needed by several models out there, including internal customers.
(#107829) So opset18 support is dependent on the various Reduce* ops being updated, because their `axes` attribute was promoted to an input. A good number of ops use reduce directly or implicitly, and fixing all of them would not be realistic for 2.1, if at all. To avoid confusion for users we say opset18 is not supported, so they don't export a model to opset18 only to find more errors down the road. However, users can still choose to ignore the warning if they know what they are doing (e.g. when they need to use col2im).
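The attribute-to-input promotion can be sketched in plain Python. This is a hypothetical illustration, not the real `torch.onnx` graph API: plain dicts stand in for emitted ONNX nodes, and the two function names are made up for the example.

```python
# Hypothetical sketch of the opset 18 breakage discussed above: the Reduce*
# `axes` attribute became an input, so symbolic functions written for
# opset <= 17 would emit an invalid node at opset 18.

def reduce_sum_opset13(data, axes):
    # Through opset 17, `axes` is carried as a node attribute.
    return {"op": "ReduceSum", "inputs": [data], "attrs": {"axes": axes}}

def reduce_sum_opset18(data, axes_input):
    # From opset 18, `axes` is an optional second input tensor, so every
    # symbolic function that emits a Reduce* op, directly or via a helper,
    # has to be rewritten.
    return {"op": "ReduceSum", "inputs": [data, axes_input], "attrs": {}}

print(reduce_sum_opset13("x", [0]))
print(reduce_sum_opset18("x", "axes_const"))
```

This is why the breakage fans out: any helper that lowers through a Reduce* op inherits the signature change.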
Add support for ONNX_NAMESPACE::TensorProto_DataType_FLOAT8E5M2 and ONNX_NAMESPACE::TensorProto_DataType_FLOAT8E4M3FN to enable export of torch models that use FP8 (E4M3 and E5M2) to ONNX (opset 19)
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
Define constants for the FP8 ONNX types to avoid breaking Meta's internal usage of ONNX, which pre-dates 1.14 and thus does not support FLOAT8 types.
- `TensorProto_DataType_FLOAT8E4M3FN=17`
- `TensorProto_DataType_FLOAT8E5M2=19`

cf. #106379 (comment)
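The same workaround can be sketched in Python (the PR itself does this in C++; the names below are illustrative, not the PR's actual identifiers): hard-code the two enum values instead of reading them from the ONNX protobuf, so nothing requires ONNX >= 1.14 at build or import time.

```python
# Locally defined FP8 enum values, matching what ONNX 1.14 added to
# TensorProto.DataType, so this code does not depend on a new-enough ONNX.
TensorProto_DataType_FLOAT8E4M3FN = 17
TensorProto_DataType_FLOAT8E5M2 = 19

# Illustrative mapping from torch FP8 dtype names to the ONNX enum values.
FP8_TO_ONNX = {
    "float8_e4m3fn": TensorProto_DataType_FLOAT8E4M3FN,
    "float8_e5m2": TensorProto_DataType_FLOAT8E5M2,
}
```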
@kit1980 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

I've imported this to verify the Meta-internal builds.

@kit1980 do you need to import again? Thanks!

@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 3 checks: pull / linux-focal-py3-clang9-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test (default, 1, 1, linux.2xlarge), pull / linux-focal-py3-clang9-android-ndk-r19c-gradle-custom-build-single / build-and-test (default, 1, 1, linux.2xlarge), Meta Internal-Only Changes Check. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed; the first few are: trunk / linux-focal-rocm5.6-py3.8 / test (default, 3, 3, linux.rocm.gpu). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 2 checks: trunk / linux-focal-rocm5.6-py3.8 / test (default, 3, 3, linux.rocm.gpu), Meta Internal-Only Changes Check. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 2 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 4 checks: pull / linux-focal-py3-clang9-android-ndk-r19c-gradle-custom-build-single / build-and-test (default, 1, 1, linux.2xlarge), pull / linux-focal-py3-clang9-android-ndk-r19c-gradle-custom-build-single-full-jit / build-and-test (default, 1, 1, linux.2xlarge), trunk / linux-focal-rocm5.6-py3.8 / test (default, 3, 3, linux.rocm.gpu), Meta Internal-Only Changes Check. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
```
{c10::kDouble, 11},
{c10::kQInt8, 12},
{c10::kQUInt8, 13},
{c10::kQInt32, 14},
```
Is this right?
@BowenBao looks like the enums are wrong for qint8/bfloat16 etc. But that's ok for the release because we don't need this pass for the dynamo exporter.
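For reference, the standard ONNX `TensorProto.DataType` enum (per the ONNX protobuf spec) assigns these values; a small sketch showing where the c10 table above diverges for the quantized torch types:

```python
# Standard ONNX TensorProto.DataType values from the ONNX protobuf spec.
# Note the mismatch with the c10 table above: ONNX assigns 12/13/14 to
# UINT32/UINT64/COMPLEX64, not to torch's qint8/quint8/qint32.
ONNX_DATA_TYPE = {
    "FLOAT": 1, "UINT8": 2, "INT8": 3, "UINT16": 4, "INT16": 5,
    "INT32": 6, "INT64": 7, "STRING": 8, "BOOL": 9, "FLOAT16": 10,
    "DOUBLE": 11, "UINT32": 12, "UINT64": 13, "COMPLEX64": 14,
    "COMPLEX128": 15, "BFLOAT16": 16, "FLOAT8E4M3FN": 17, "FLOAT8E5M2": 19,
}
```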
This PR resurrects @tcherckez-nvidia's #106379 with changes to resolve conflicts against newer `main` and defines our own constants for the new ONNX types to [avoid breaking Meta's internal usage of an old ONNX](#106379 (comment)).
- `::torch::onnx::TensorProto_DataType_FLOAT8E4M3FN=17`
- `::torch::onnx::TensorProto_DataType_FLOAT8E5M2=19`

Pull Request resolved: #107962
Approved by: https://github.com/justinchuby, https://github.com/titaiwangms
Co-authored-by: Aaron Bockover <abock@microsoft.com>