
[ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. #62257

Merged

Conversation

@hwangdeyu (Collaborator) commented Jul 27, 2021

  • The use_external_data_format parameter exists for large models that cannot otherwise be exported because of the 2GB protobuf limit.

  • When use_external_data_format is set to True, the model is exported in the ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.

  • This PR marks the parameter as DEPRECATED and checks the model proto size in code instead of relying on the user: if the size is larger than 2GB, then use_external_data_format = True is applied automatically (see the sketch below).
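
A minimal before/after sketch of the change (the file names and toy model are hypothetical stand-ins for a multi-GB model, and the deprecated keyword assumes a PyTorch version where it still exists):

```python
import torch

# Toy stand-in; a multi-GB model behaves the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2))
dummy_input = torch.randn(1, 4)

# Before this PR: the caller had to know in advance that the serialized
# proto would exceed the 2GB protobuf limit and opt in explicitly.
torch.onnx.export(model, dummy_input, "model_old.onnx",
                  use_external_data_format=True)

# After this PR: the exporter measures the model proto size itself and
# switches to the external data format automatically above 2GB, so the
# argument can simply be dropped (passing it emits a deprecation warning).
torch.onnx.export(model, dummy_input, "model_new.onnx")
```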

@facebook-github-bot added the cla signed and oncall: jit labels Jul 27, 2021
@facebook-github-bot (Contributor) commented Jul 27, 2021


💊 CI failures summary and remediations

As of commit 1d12c01 (more details on the Dr. CI page):


  • 6/6 failures possibly* introduced in this PR
    • 1/6 non-scanned failure(s)

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge) (1/3)

Step: "Unknown" (full log | diagnosis details | 🔁 rerun)

2021-09-01T10:43:07.1074873Z RuntimeError: test_cpp_api_parity failed!
2021-09-01T10:43:06.6087491Z     baton.wait()
2021-09-01T10:43:06.6088150Z   File "C:\actions-runner\_work\pytorch\pytorch\pytorch-1188677400\build\win_tmp\build\torch\utils\file_baton.py", line 42, in wait
2021-09-01T10:43:06.6089555Z     time.sleep(self.wait_seconds)
2021-09-01T10:43:06.6089976Z KeyboardInterrupt
2021-09-01T10:43:06.6090556Z No CUDA runtime is found, using CUDA_HOME='C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA\v10.1'
2021-09-01T10:43:07.1071487Z Traceback (most recent call last):
2021-09-01T10:43:07.1072329Z   File "run_test.py", line 1092, in <module>
2021-09-01T10:43:07.1072690Z     main()
2021-09-01T10:43:07.1073674Z   File "run_test.py", line 1071, in main
2021-09-01T10:43:07.1074416Z     raise RuntimeError(err_message)
2021-09-01T10:43:07.1074873Z RuntimeError: test_cpp_api_parity failed!
2021-09-01T10:43:07.3104269Z Terminate batch job (Y/N)? 
2021-09-01T10:43:07.3105536Z 
2021-09-01T10:43:07.3106122Z (base) C:\actions-runner\_work\pytorch\pytorch\pytorch-1188677400\test>if ERRORLEVEL 1 exit /b 1 
2021-09-01T10:43:07.3130827Z + cleanup
2021-09-01T10:43:07.3131145Z + retcode=1
2021-09-01T10:43:07.3131417Z + set +x
2021-09-01T10:43:07.3293122Z ##[error]The operation was canceled.
2021-09-01T10:43:07.3753681Z ##[group]Run # -ir => recursive include all files in pattern
2021-09-01T10:43:07.3754870Z 7z a "test-reports-$Env:COMMIT_SHA1-$Env:WORKFLOW_ID.zip" -ir'!test\*.xml'

See CircleCI build pytorch_linux_backward_compatibility_check_test (2/3)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 01 04:11:12 The PR is introducing backward ...m to confirm whether this change is wanted or not.
Sep 01 04:11:12 processing existing schema:  alltoall_base(__torch__.torch.classes.dist_c10d.ProcessGroup _0, Tensor _1, Tensor _2, int[] _3, int[] _4) -> (__torch__.torch.classes.dist_c10d.Work _0)
Sep 01 04:11:12 processing existing schema:  alltoall(__torch__.torch.classes.dist_c10d.ProcessGroup _0, Tensor[] _1, Tensor[] _2) -> (__torch__.torch.classes.dist_c10d.Work _0)
Sep 01 04:11:12 processing existing schema:  send(__torch__.torch.classes.dist_c10d.ProcessGroup _0, Tensor[] _1, int _2, int _3) -> (__torch__.torch.classes.dist_c10d.Work _0)
Sep 01 04:11:12 processing existing schema:  recv(__torch__.torch.classes.dist_c10d.ProcessGroup _0, Tensor[] _1, int _2, int _3) -> (__torch__.torch.classes.dist_c10d.Work _0)
Sep 01 04:11:12 processing existing schema:  recv_anysource(__torch__.torch.classes.dist_c10d.ProcessGroup _0, Tensor[] _1, int _2) -> (__torch__.torch.classes.dist_c10d.Work _0)
Sep 01 04:11:12 processing existing schema:  barrier(__torch__.torch.classes.dist_c10d.ProcessGroup _0) -> (__torch__.torch.classes.dist_c10d.Work _0)
Sep 01 04:11:12 processing existing schema:  __init__(__torch__.torch.classes.dist_c10d.frontend _0) -> (NoneType _0)
Sep 01 04:11:12 processing existing schema:  new_process_group_helper(__torch__.torch.classes.dist_c10d.frontend _0, int _1, int _2, int[] _3, str _4, __torch__.torch.classes.dist_c10d.Store _5, str? _6, int _7) -> (__torch__.torch.classes.dist_c10d.ProcessGroup _0)
Sep 01 04:11:12 processing existing schema:  get_process_group_by_name(__torch__.torch.classes.dist_c10d.frontend _0, str _1) -> (__torch__.torch.classes.dist_c10d.ProcessGroup _0)
Sep 01 04:11:12 processing existing schema:  get_name_of_process_group(__torch__.torch.classes.dist_c10d.frontend _0, __torch__.torch.classes.dist_c10d.ProcessGroup _1) -> (str _0)
Sep 01 04:11:12 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 
Sep 01 04:11:12 
Sep 01 04:11:12 Broken ops: [
Sep 01 04:11:12 	quantized::embedding_bag_2bit_rowwise_offsets(Tensor weight, Tensor indices, Tensor? offsets=None, bool scale_grad_by_freq=False, int mode=0, bool pruned_weights=False, Tensor? per_sample_weights=None, Tensor? compressed_indices_mapping=None, bool include_last_offset=False) -> (Tensor)
Sep 01 04:11:12 	prim::VarStack(...) -> (Tensor)
Sep 01 04:11:12 	quantized::linear_relu_dynamic_fp16(Tensor X, __torch__.torch.classes.quantized.LinearPackedParamsBase W_prepack) -> (Tensor Y)
Sep 01 04:11:12 	q::_Bfloat16QuantizedToFloat(Tensor input) -> (Tensor)
Sep 01 04:11:12 	q::_FloatToBfloat16Quantized(Tensor input) -> (Tensor)
Sep 01 04:11:12 ]
Sep 01 04:11:12 =================== sccache compilation log ===================
Sep 01 04:11:12 =========== If your build fails, please take a look at the log above for possible reasons ===========

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_test (3/3)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Sep 01 05:42:57 [ FAILED ] TensorTest.TestBatchNorm1D
Sep 01 05:42:57 [ RUN      ] XlaUtilCacheTest.BasicTest
Sep 01 05:42:57 [       OK ] XlaUtilCacheTest.BasicTest (0 ms)
Sep 01 05:42:57 [----------] 1 test from XlaUtilCacheTest (0 ms total)
Sep 01 05:42:57 
Sep 01 05:42:57 [----------] Global test environment tear-down
Sep 01 05:42:57 [==========] 613 tests from 8 test suites ran. (439277 ms total)
Sep 01 05:42:57 [  PASSED  ] 611 tests.
Sep 01 05:42:57 [  SKIPPED ] 1 test, listed below:
Sep 01 05:42:57 [  SKIPPED ] AtenXlaTensorTest.TestGroupNormBackward
Sep 01 05:42:57 [  FAILED  ] 1 test, listed below:
Sep 01 05:42:57 [  FAILED  ] TensorTest.TestBatchNorm1D
Sep 01 05:42:57 
Sep 01 05:42:57  1 FAILED TEST
Sep 01 05:42:57 + cleanup
Sep 01 05:42:57 + retcode=1
Sep 01 05:42:57 + set +x
Sep 01 05:42:57 =================== sccache compilation log ===================
Sep 01 05:42:57 =========== If your build fails, please take a look at the log above for possible reasons ===========
Sep 01 05:42:57 Compile requests                      0
Sep 01 05:42:57 Compile requests executed             0
Sep 01 05:42:57 Cache hits                            0

2 failures not recognized by patterns:

Job | Step | Action
CircleCI pytorch_linux_xenial_py3_6_gcc5_4_jit_legacy_test | Report results | 🔁 rerun
CircleCI pytorch_linux_xenial_py3_6_gcc5_4_test | Report results | 🔁 rerun

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@hwangdeyu hwangdeyu force-pushed the deyu/remove_use_external_data_format branch 2 times, most recently from 8301178 to 43bc6de on July 27, 2021 14:34
torch/onnx/__init__.py Outdated
torch/onnx/utils.py Outdated
@hwangdeyu hwangdeyu force-pushed the deyu/remove_use_external_data_format branch from 4053816 to 228b093 on August 9, 2021 09:48
@jiafatom jiafatom self-assigned this Aug 12, 2021
@hwangdeyu hwangdeyu force-pushed the deyu/remove_use_external_data_format branch 2 times, most recently from 17df005 to 38f3756 on August 17, 2021 16:02
@BowenBao BowenBao force-pushed the onnx_ms_1 branch 2 times, most recently from 7c7074e to c500fb6 on August 20, 2021 20:46
@hwangdeyu hwangdeyu force-pushed the deyu/remove_use_external_data_format branch 2 times, most recently from 4d1593a to 0c1aec8 on August 24, 2021 07:31
BowenBao added a commit that referenced this pull request Sep 1, 2021
….onnx.export() function. (#62257)"

Co-authored-by: hwangdeyu <dejack953@outlook.com>

[ghstack-poisoned]
BowenBao added a commit that referenced this pull request Sep 1, 2021
….onnx.export() function. (#62257)"

Co-authored-by: hwangdeyu <dejack953@outlook.com>

[ghstack-poisoned]
BowenBao added a commit that referenced this pull request Sep 7, 2021
…param from torch.onnx.export() function. (#62257)"

Co-authored-by: hwangdeyu <dejack953@outlook.com>

[ghstack-poisoned]
BowenBao added a commit that referenced this pull request Sep 7, 2021
…t() function. (#62257)

Co-authored-by: hwangdeyu <dejack953@outlook.com>

ghstack-source-id: 6bb15060d028c0b1c28ecd4aa2360e36e81db4e5
Pull Request resolved: #64382

fix use external data format pr rebase error (#64357)

* Fix use_external_data_format PR rebase error.

Co-authored-by: hwangdeyu <dejack953@outlook.com>

ghstack-source-id: 6bb15060d028c0b1c28ecd4aa2360e36e81db4e5
Pull Request resolved: #64383
BowenBao added a commit that referenced this pull request Sep 7, 2021
….onnx.export() function. (#62257)"

Co-authored-by: hwangdeyu <dejack953@outlook.com>

[ghstack-poisoned]
garymm added a commit that referenced this pull request Sep 18, 2021
…t() function. (#62257)

Co-authored-by: hwangdeyu <dejack953@outlook.com>

[ghstack-poisoned]
BowenBao added a commit that referenced this pull request Sep 20, 2021
…t() function. (#62257)

Co-authored-by: hwangdeyu <dejack953@outlook.com>

ghstack-source-id: e9ca7d7ce3bbfb211b1a43147e3ced3dbd690c86
Pull Request resolved: #64382

fix use external data format pr rebase error (#64357)

* Fix use_external_data_format PR rebase error.

Co-authored-by: hwangdeyu <dejack953@outlook.com>

ghstack-source-id: e9ca7d7ce3bbfb211b1a43147e3ced3dbd690c86
Pull Request resolved: #64383
BowenBao added a commit that referenced this pull request Sep 20, 2021
…param from torch.onnx.export() function. (#62257)"

Co-authored-by: hwangdeyu <dejack953@outlook.com>

Differential Revision: [D30905265](https://our.internmc.facebook.com/intern/diff/D30905265)

[ghstack-poisoned]
BowenBao added a commit that referenced this pull request Sep 20, 2021
….onnx.export() function. (#62257)"

Co-authored-by: hwangdeyu <dejack953@outlook.com>

Differential Revision: [D30905265](https://our.internmc.facebook.com/intern/diff/D30905265)

[ghstack-poisoned]
BowenBao added a commit that referenced this pull request Sep 22, 2021
…t() function. (#62257)

Co-authored-by: hwangdeyu <dejack953@outlook.com>

ghstack-source-id: 1b5a6bd45e9c83580ab6b7af45b7c715dd74f4c7
Pull Request resolved: #64382

fix use external data format pr rebase error (#64357)

* Fix use_external_data_format PR rebase error.

Co-authored-by: hwangdeyu <dejack953@outlook.com>

ghstack-source-id: 1b5a6bd45e9c83580ab6b7af45b7c715dd74f4c7
Pull Request resolved: #64383
BowenBao added a commit that referenced this pull request Sep 22, 2021
…param from torch.onnx.export() function. (#62257)"

Co-authored-by: hwangdeyu <dejack953@outlook.com>

Differential Revision: [D30905265](https://our.internmc.facebook.com/intern/diff/D30905265)

[ghstack-poisoned]
BowenBao added a commit that referenced this pull request Sep 22, 2021
….onnx.export() function. (#62257)"

Co-authored-by: hwangdeyu <dejack953@outlook.com>

Differential Revision: [D30905265](https://our.internmc.facebook.com/intern/diff/D30905265)

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request Sep 24, 2021
…t() function. (#62257) (#64382)

Summary:
Pull Request resolved: #64382

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>
garymm pushed a commit to garymm/pytorch that referenced this pull request Oct 1, 2021
…t() function. (pytorch#62257) (pytorch#64382)

Summary:
Pull Request resolved: pytorch#64382

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>
malfet pushed a commit that referenced this pull request Oct 8, 2021
* [ONNX] Remove argument _retain_param_name from torch.onnx.export() function. (#61702) (#64370)

Summary:
Pull Request resolved: #64370

As of now, the "_retain_param_name" parameter has no description in PyTorch docs website. According to code, this argument determines if we keep the original parameter names of PyTorch model in the final ONNX graph. If this is False, those original parameter names will be replaced with a series of integers starting from 1.

Since setting numbers as parameter names make no sense to users, we remove this argument from the torch.onnx.export() function to increase user experience of calling this function.

This PR will still keep it in torch.onnx.export() function for backward support while all backend logic has been changed to work as _retain_param_name is set to True.
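
A small illustrative sketch of the effect (TinyNet is a hypothetical toy module):

```python
import io

import onnx
import torch

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

buf = io.BytesIO()
torch.onnx.export(TinyNet(), torch.randn(1, 4), buf)
graph = onnx.load_model_from_string(buf.getvalue()).graph

# With the backend behaving as if _retain_param_name were True, this
# prints the original PyTorch names, e.g. ['fc.weight', 'fc.bias'],
# rather than integer names like ['1', '2'].
print([init.name for init in graph.initializer])
```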

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905270

Pulled By: malfet

fbshipit-source-id: ca60757ca17daaff937e9f08da42596086795f4a

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] Remove strip_doc_string param from torch.onnx.export() function. (#61712) (#64371)

Summary:
Pull Request resolved: #64371

As of now, the "strip_doc_string" parameter was described as below:

strip_doc_string (bool, default True): do not include the field
doc_string``` from the exported model. Otherwise the field will mention the source code locations for model``.

This is usually useless to users who want to transform a PyTorch model to ONNX one. Only when the user wants to debug the export process, these source code locations could provide benefits.

To make the export() function more friendly by providing less parameters, we combined "strip_doc_string" into "verbose" parameter. If a user set verbose to True, it means the users need some log information for debugging the export process and this is similar with the purpose of strip_doc_string parameter.

But the usage of these 2 arguments are opposite: setting verbose to True means we want to print log information to help debug, which means strip_doc_string should be False. And this is how we replace strip_doc_string with verbose argument in this PR.

This PR will still keep it in torch.onnx.export() function for backward support while the usage of it has been combined with verbose argument.
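
A hedged sketch of the combined behavior (the module and file names are placeholders):

```python
import torch

model = torch.nn.Linear(4, 2)   # placeholder module
dummy_input = torch.randn(1, 4)

# verbose=False (the default) now implies the old strip_doc_string=True:
# source-location doc strings are dropped from the exported model.
torch.onnx.export(model, dummy_input, "model.onnx")

# verbose=True now implies strip_doc_string=False: the doc_string fields
# carrying source code locations are kept, which helps debug the export.
torch.onnx.export(model, dummy_input, "model_debug.onnx", verbose=True)
```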

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905268

Pulled By: malfet

fbshipit-source-id: 2f06eb805c01fe15ff7a1b4f6595c937ba716d60

Co-authored-by: fatcat-z <zhang-ji@outlook.com>

* [ONNX] minor doc improvements and cleanup (#62514) (#64373)

Summary:
Pull Request resolved: #64373

* Fix some bad formatting and clarify things in onnx.rst.
* In `export_to_pretty_string`:
    * Add documentation for previously undocumented args.
    * Document that `f` arg is ignored and mark it deprecated.
    * Update tests to stop setting `f`.
    * Warn if `_retain_param_name` is set.
* Use double quotes for string literals in test_operators.py.
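
A hedged sketch of the `export_to_pretty_string` change described above (assuming the post-change signature, where the deprecated `f` argument can simply be omitted):

```python
import torch

model = torch.nn.Linear(4, 2)   # placeholder module
dummy_input = torch.randn(1, 4)

# The `f` argument is ignored and deprecated, so it is no longer passed;
# the function returns the pretty-printed ONNX graph as a string.
print(torch.onnx.export_to_pretty_string(model, dummy_input))
```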

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905271

Pulled By: malfet

fbshipit-source-id: 3627eeabf40b9516c4a83cfab424ce537b36e4b3

* [ONNX] Deprecated the example_outputs param from torch.onnx.export() function. (#62815) (#64380)

Summary:
Pull Request resolved: #64380

* `example_outputs` is used to determine the type and shape of the outputs without tracing the execution of the model. It had to be provided when exporting a ScriptModule or ScriptFunction via the export() function.

* Since we can work out `example_outputs` internally instead of having it provided by the user, we deprecated this argument in the export() function to improve the user experience of calling it (a sketch of the new call follows).
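
A minimal sketch of the new call (add_one is a hypothetical ScriptFunction):

```python
import torch

@torch.jit.script
def add_one(x: torch.Tensor) -> torch.Tensor:
    return x + 1

# Previously this required example_outputs=add_one(torch.randn(2, 3)) so
# the exporter knew the output type and shape; it now infers this
# internally during export.
torch.onnx.export(add_one, (torch.randn(2, 3),), "add_one.onnx")
```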

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905266

Pulled By: malfet

fbshipit-source-id: d00b00d7d02b365d165028288ad915678caa51f2

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* [ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. (#62257) (#64382)

Summary:
Pull Request resolved: #64382

* This `use_external_data_format` parameter is used for large models that cannot otherwise be exported because of the 2GB protobuf limit.

* When `use_external_data_format` is set to True, the model is exported in the ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.

* This PR marks the parameter as DEPRECATED and checks the model proto size in code instead of relying on the user: if the size is larger than 2GB, then `use_external_data_format = True` is applied automatically.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D30905265

Pulled By: malfet

fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb

Co-authored-by: hwangdeyu <dejack953@outlook.com>

* fix clang-tidy error introduced by #64382 (#65977)

Summary: Pull Request resolved: #65977

Reviewed By: ngimel

Differential Revision: D31423174

Pulled By: malfet

fbshipit-source-id: 0ea560b9a6ddd6431f70bd3ac10ace68e26ab352

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Co-authored-by: hwangdeyu <dejack953@outlook.com>
pytorchmergebot pushed a commit that referenced this pull request Oct 14, 2023
Fixes #110982

#62257 deprecated the `torch.onnx.export(use_external_data_format: bool=...)` argument, but the `EncoderBase::GetGraphProtoSize` function it introduced has a bug and doesn't detect models > 2GB when ONNX Constant nodes are large (and responsible for the size overflow).

This PR adds the constant node to the total size of the model, along with initializers.

In Python, the equivalent computation is:

```python
import onnx

def compute_tensor_size(tensor):
    # Compute the size of the tensor based on its shape and data type
    size = tensor.size * tensor.itemsize
    return size

def sum_constant_and_initializer_sizes(model_path):
    # Load the ONNX model
    model = onnx.load(model_path)

    total_size = 0
    initializer_size = 0
    constant_size = 0

    # Compute the size of constant nodes
    for node in model.graph.node:
        if node.op_type == 'Constant':
            constant_value = node.attribute[0].t
            # Convert constant value to numpy array
            constant_array = onnx.numpy_helper.to_array(constant_value)
            # Compute the size of the constant tensor
            tensor_size = compute_tensor_size(constant_array)
            total_size += tensor_size
            constant_size += tensor_size

    # Compute the size of initializer nodes that are not graph inputs
    for initializer in model.graph.initializer:
        if initializer.name not in [input.name for input in model.graph.input]:
            # Convert the shape and data type information to calculate size
            # tensor = onnx.helper.tensor_value_info_to_tensor(input)
            tensor = onnx.numpy_helper.to_array(initializer)
            tensor_size = compute_tensor_size(tensor)
            total_size += tensor_size
            initializer_size += tensor_size

    return total_size, constant_size, initializer_size

model_path = '/path/to/model.onnx'
total_size, constant_size, initializer_size = sum_constant_and_initializer_sizes(model_path)

print("Total size of constant nodes in bytes:", constant_size)
print("Total size of initializer nodes (excluding graph inputs) in bytes:", initializer_size)
print("Total size of constant and initializer nodes (excluding graph inputs) in bytes:", total_size)
```

Pull Request resolved: #111097
Approved by: https://github.com/justinchuby, https://github.com/zhipenghan
yeounoh pushed a commit to yeounoh/pytorch that referenced this pull request Oct 16, 2023

Labels
cla signed, oncall: jit, open source

5 participants