Share more constant initializers by pengwa · Pull Request #15461 · microsoft/onnxruntime

pengwa · 2023-04-11T12:42:45Z

Share more constant initializers.

The ConstantSharing transformer originally handled only single-value initializers (scalar or 1-D).

This PR shares more cases so that the common subexpression elimination (CSE) transformer can remove more duplicated nodes.

Originally, we used a single vector<std::variant<float,half,int32,int64>> to store the different scalar values. In this PR, we create an unordered map whose key is data_type + rank + element count and whose value is a vector of InitializerValue.

For a given initializer that fulfils the conditions, we first find the corresponding vector of InitializerValue by its <data_type + rank + element count>, then search that vector to see whether the constant tensor already exists. The search returns a value id, which is combined with <data_type + rank + element count> to form the pattern key that decides which tensor to reuse (legacy code).
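The lookup described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Element`, `ConstantStore`, `MakeBucketKey` and `MakePatternKey` are hypothetical names, `InitializerValue` here is a minimal stand-in for the real type, and the string key format is an assumption made for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical stand-in for the PR's InitializerValue: the elements of one small constant tensor.
using Element = std::variant<float, double, int32_t, int64_t>;

struct InitializerValue {
  std::vector<Element> data;
  bool operator==(const InitializerValue& other) const { return data == other.data; }
};

// Bucket key built from data_type + rank + element count (string format is an assumption).
std::string MakeBucketKey(int data_type, int rank, size_t element_count) {
  return std::to_string(data_type) + "_" + std::to_string(rank) + "_" + std::to_string(element_count);
}

class ConstantStore {
 public:
  // Returns the index of `value` inside its bucket, appending it if it was not seen before.
  size_t GetOrAddValueId(const std::string& bucket_key, const InitializerValue& value) {
    std::vector<InitializerValue>& bucket = buckets_[bucket_key];
    for (size_t i = 0; i < bucket.size(); ++i) {
      if (bucket[i] == value) return i;
    }
    bucket.push_back(value);
    return bucket.size() - 1;
  }

 private:
  std::unordered_map<std::string, std::vector<InitializerValue>> buckets_;
};

// Pattern key = bucket key + value id; equal pattern keys mean the tensors can share one initializer.
std::string MakePatternKey(ConstantStore& store, int data_type, int rank, const InitializerValue& value) {
  const std::string bucket_key = MakeBucketKey(data_type, rank, value.data.size());
  return bucket_key + "_" + std::to_string(store.GetOrAddValueId(bucket_key, value));
}
```

Two initializers with the same data type, rank, element count and contents end up with the same pattern key, while a tensor that differs in any element gets a new value id in the same bucket.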

Motivation and Context

One example we see here is:

stateDiagram
    [*] --> LayerNorm(b,s,64)
    LayerNorm(b,s,64) --> Reshape1
    Shape1_Const[b*s,64] --> Reshape1

    LayerNorm(b,s,64) --> Reshape2
    Shape2_Const[b*s,64] --> Reshape2

    Reshape1 --> AttentionSubGraph
    Reshape2 --> Add
    AttentionSubGraph --> Add
    Add --> [*]

Ideally, CommonSubexpressionElimination would remove one of Reshape1 and Reshape2. However, because Shape1_Const and Shape2_Const are different NodeArg*, it does not remove the duplication.

This is just one example; removing such duplication creates more opportunities to apply graph transformations.
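To illustrate why the two Reshape nodes survive, here is a minimal sketch of a CSE-style equivalence check that compares inputs by pointer identity. `Arg`, `Node`, `SameComputation` and `CountDistinctNodes` are hypothetical names for this illustration only; the real transformer works on onnxruntime graph types and checks more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins: an Arg plays the role of a NodeArg, optionally backed by a constant.
struct Arg {
  std::vector<int64_t> value;
};

struct Node {
  std::string op_type;
  std::vector<const Arg*> inputs;
};

// Two nodes count as the same computation only if they read the very same input pointers.
bool SameComputation(const Node& a, const Node& b) {
  return a.op_type == b.op_type && a.inputs == b.inputs;
}

// Number of nodes left after merging nodes that are the same computation.
size_t CountDistinctNodes(const std::vector<Node>& nodes) {
  std::vector<const Node*> representatives;
  for (const Node& node : nodes) {
    bool duplicate = false;
    for (const Node* rep : representatives) {
      if (SameComputation(*rep, node)) {
        duplicate = true;
        break;
      }
    }
    if (!duplicate) representatives.push_back(&node);
  }
  return representatives.size();
}
```

With two separate shape constants holding identical values, both Reshape nodes are kept; once both nodes point at one shared constant, they compare equal and one can be dropped.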

(cherry picked from commit a4b0ea8)

askhade · 2023-04-11T22:16:00Z

 using SupportedTypeList = boost::mp11::mp_list<MLFloat16, float, double, int32_t, int64_t>;

-bool IsValidSingleValueShape(const ONNX_NAMESPACE::TensorShapeProto* input_shape) {
+static constexpr int64_t MAX_SIZE_PER_VALUE = 8;


Why do we have this restriction on the number of elements?

A bigger element-count threshold here means more overhead when running the ConstantSharing graph transformation. Originally the limit was 1; I have raised it to 8 for now. We can raise it further once we find that it helps in specific scenarios.

Can you add some comments on the reasoning behind choosing 8, and on what one should consider when changing this limit or removing it altogether in the future? Thanks!

Sure, I added a comment for it. :)
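For readers following the thread, a check against this threshold could look roughly like the sketch below. Only `MAX_SIZE_PER_VALUE = 8` comes from the diff; `IsWithinSharingThreshold` is a hypothetical helper, and rejecting empty or symbolic dimensions is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

static constexpr int64_t MAX_SIZE_PER_VALUE = 8;

// Hypothetical helper: true if a tensor with these dims has between 1 and 8 elements.
bool IsWithinSharingThreshold(const std::vector<int64_t>& dims) {
  int64_t element_count = 1;  // a scalar (no dims) holds exactly one element
  for (int64_t dim : dims) {
    // Assumption: empty (0) or symbolic (negative) dimensions are not shared.
    if (dim <= 0) return false;
    // A single oversized dimension already exceeds the limit; this also avoids overflow below.
    if (dim > MAX_SIZE_PER_VALUE) return false;
    element_count *= dim;
    if (element_count > MAX_SIZE_PER_VALUE) return false;
  }
  return true;
}
```

Keeping the limit small bounds the cost of comparing candidate tensors element by element, which is the overhead mentioned in the reply above.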

### Minor fix for differently scoped cpu_ep

cpu_ep is declared under `#ifndef DISABLE_CONTRIB_OPS`, but one of its usages is not under the same condition.

```
#ifndef DISABLE_CONTRIB_OPS
const InlinedHashSet<std::string_view> cpu_ep = {onnxruntime::kCpuExecutionProvider};
#endif
```

### Motivation and Context

Postmortem: #15461 passed all CIs except the Linux/Windows TVM CIs. I did not check the detailed error messages at the time because those CIs had already been failing for at least a few days. On closer inspection, after PR 15461 the error message changed from

Before constant sharing change, TVM CI error message:

```
https://github.com/microsoft/onnxruntime/actions/runs/4700368634/jobs/8334955814
ERROR: testBooleanInputs (__main__.TestInferenceSession)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "onnxruntime_test_python.py", line 617, in testBooleanInputs
    sess = onnxrt.InferenceSession(get_name("logicaland.onnx"), providers=available_providers)
  File "D:\a\onnxruntime\onnxruntime\build\Release\Release\onnxruntime\capi\onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "D:\a\onnxruntime\onnxruntime\build\Release\Release\onnxruntime\capi\onnxruntime_inference_collection.py", line 435, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: D:\a\onnxruntime\onnxruntime\onnxruntime\core\providers\tvm\tvm_api.cc:49 onnxruntime::tvm::TVMCompile compile != nullptr was false. Unable to retrieve 'tvm_onnx_import_and_compile'.
```

to

```
D:\a\onnxruntime\onnxruntime\onnxruntime\core\optimizer\graph_transformer_utils.cc(213,67): error C2065: 'cpu_ep': undeclared identifier [D:\a\onnxruntime\onnxruntime\build\Release\onnxruntime_optimizer.vcxproj]
D:\a\onnxruntime\onnxruntime\onnxruntime\core\optimizer\graph_transformer_utils.cc(213,19): error C2672:
```

This PR fixes the build issue. The error messages of the Windows/Linux TVM CIs are back to the original ones.
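The scoping mismatch can be illustrated with a small standalone sketch in which the declaration and every use sit under the same guard. This is not the code from graph_transformer_utils.cc: `CountGuardedProviders` is a hypothetical function, `std::unordered_set` stands in for `InlinedHashSet`, and the string literal stands in for `onnxruntime::kCpuExecutionProvider`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <unordered_set>

#ifndef DISABLE_CONTRIB_OPS
// Declared only when contrib ops are enabled (stand-in for the real cpu_ep).
const std::unordered_set<std::string_view> cpu_ep = {"CPUExecutionProvider"};
#endif

// Hypothetical function: every use of cpu_ep is wrapped in the same guard as its declaration,
// so the file still compiles when DISABLE_CONTRIB_OPS is defined.
size_t CountGuardedProviders() {
#ifndef DISABLE_CONTRIB_OPS
  return cpu_ep.size();
#else
  return 0;
#endif
}
```

If the use inside the function were left outside the guard, a build with `DISABLE_CONTRIB_OPS` defined would fail with an undeclared identifier error of the kind shown in the log above.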

pengwa added 2 commits April 11, 2023 11:41

share 1d array to allow more cse opt applied

940130d

(cherry picked from commit a4b0ea8)

add ut

72293f7

pengwa added the training label Apr 11, 2023

pengwa requested review from askhade, baijumeswani and guyang3532 April 11, 2023 12:42

pengwa marked this pull request as ready for review April 11, 2023 12:43

pengwa added 3 commits April 11, 2023 15:07

add ut cases

d86a4d5

fix ci

1a6b040

Merge branch 'main' of https://github.com/microsoft/onnxruntime into …

f5a0fbc

…pengwa/const_share

askhade reviewed Apr 11, 2023

pengwa added 7 commits April 12, 2023 02:16

fix ut

60c95a6

Merge branch 'main' of https://github.com/microsoft/onnxruntime into …

0267517

…pengwa/const_share

fix windows warning as error

b5f0796

fix quant test by disable ConstantSharing for it

1f0df7d

add more context for the threshold

684c445

Merge branch 'main' of https://github.com/microsoft/onnxruntime into …

bcce9c8

…pengwa/const_share

typo

497ac77

askhade approved these changes Apr 14, 2023

askhade merged commit bf32dbb into main Apr 14, 2023

askhade deleted the pengwa/const_share branch April 14, 2023 14:41

snnn mentioned this pull request Apr 14, 2023

[QNN EP] Fix pool and conv op tests #15504

Merged

pengwa mentioned this pull request Apr 18, 2023

Minor fix for differently scoped cpu_ep usage #15550

Merged

edgchen1 mentioned this pull request Aug 23, 2023

Support QDQ transformations with com.microsoft.Quantize/Dequantize ops #17127

Merged

12 tasks

askhade merged 12 commits into main from pengwa/const_share
