Share more constant initializers #15461

Merged
askhade merged 12 commits into main from pengwa/const_share
Apr 14, 2023

@pengwa pengwa commented Apr 11, 2023

Share more constant initializers.

The ConstantSharing transformer originally handled only single-value initializers (scalar or 1D).

This PR tries to share more cases so that the common subexpression elimination transformer can remove more duplicated nodes.

Originally, we used a single vector<std::variant<float,half,int32,int64>> to store different scalar values. In this PR, we create an unordered map whose key is data_type + rank + element count and whose value is a vector of InitializerValue.

For a given initializer that fulfils the conditions, we look up the corresponding vector of InitializerValue by its <data_type + rank + element count> key, then search that vector to check whether the constant tensor already exists. A value id is then returned, which is combined with <data_type + rank + element count> to form the pattern key that decides which tensor to reuse (legacy code).
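The lookup described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `ToyInitializerValue`, `BucketKey`, `BucketKeyHash`, `ToyConstantPool`, and the string form of the pattern key are all hypothetical names made up for the example, and the values are restricted to `int64_t` for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the PR's InitializerValue: just the raw elements.
struct ToyInitializerValue {
  std::vector<int64_t> data;
  bool operator==(const ToyInitializerValue& other) const { return data == other.data; }
};

// Hypothetical bucket key: data type + rank + element count.
struct BucketKey {
  int32_t data_type;
  int32_t rank;
  int64_t element_count;
  bool operator==(const BucketKey& other) const {
    return data_type == other.data_type && rank == other.rank &&
           element_count == other.element_count;
  }
};

struct BucketKeyHash {
  std::size_t operator()(const BucketKey& key) const {
    std::size_t h = std::hash<int32_t>{}(key.data_type);
    h = h * 31 + std::hash<int32_t>{}(key.rank);
    h = h * 31 + std::hash<int64_t>{}(key.element_count);
    return h;
  }
};

class ToyConstantPool {
 public:
  // Returns the index of `value` inside its bucket, appending it if it is new.
  std::size_t GetOrAddValueId(const BucketKey& key, const ToyInitializerValue& value) {
    std::vector<ToyInitializerValue>& bucket = buckets_[key];
    for (std::size_t i = 0; i < bucket.size(); ++i) {
      if (bucket[i] == value) return i;
    }
    bucket.push_back(value);
    return bucket.size() - 1;
  }

  // Combines the bucket key and the value id into one pattern key (assumed format).
  static std::string PatternKey(const BucketKey& key, std::size_t value_id) {
    return std::to_string(key.data_type) + "_" + std::to_string(key.rank) + "_" +
           std::to_string(key.element_count) + "_" + std::to_string(value_id);
  }

 private:
  std::unordered_map<BucketKey, std::vector<ToyInitializerValue>, BucketKeyHash> buckets_;
};

// Usage: two equal shape constants get the same value id, a different one does not.
inline void DemoSharing() {
  ToyConstantPool pool;
  const BucketKey key{7, 1, 2};  // arbitrary type tag, rank 1, two elements
  const std::size_t first = pool.GetOrAddValueId(key, ToyInitializerValue{{-1, 64}});
  const std::size_t second = pool.GetOrAddValueId(key, ToyInitializerValue{{-1, 64}});
  const std::size_t other = pool.GetOrAddValueId(key, ToyInitializerValue{{-1, 32}});
  assert(first == second);
  assert(first != other);
}
```

Initializers that end up with the same pattern key are candidates to be replaced by one shared tensor. The linear search within a bucket is acceptable here only because the buckets hold small tensors.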

Motivation and Context

One example we see here is:

stateDiagram
    [*] --> LayerNorm(b,s,64)
    LayerNorm(b,s,64) --> Reshape1
    Shape1_Const[b*s,64] --> Reshape1

    LayerNorm(b,s,64) --> Reshape2
    Shape2_Const[b*s,64] --> Reshape2


    Reshape1 --> AttentionSubGraph
    Reshape2 --> Add
    AttentionSubGraph --> Add
    Add --> [*]

Ideally, CommonSubexpressionElimination could remove one of Reshape1 and Reshape2, but because Shape1_Const and Shape2_Const are different NodeArg*, it does not remove the duplication.

This is just one example: removing the duplication opens up more opportunities to apply graph transformations.
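To make the reasoning concrete, here is a toy model of the situation. It is only a sketch under the assumption that two nodes count as the same expression when their op type matches and their inputs are the same argument objects; `ToyArg`, `ToyNode`, and `SameExpression` are hypothetical and do not reflect the real CommonSubexpressionElimination implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for NodeArg and Node.
struct ToyArg {
  std::string name;
};

struct ToyNode {
  std::string op_type;
  std::vector<const ToyArg*> inputs;
};

// Two nodes are treated as duplicates only if they use the same op and
// exactly the same input objects (pointer identity, not value equality).
inline bool SameExpression(const ToyNode& a, const ToyNode& b) {
  return a.op_type == b.op_type && a.inputs == b.inputs;
}

inline void DemoDuplicateDetection() {
  const ToyArg layer_norm_out{"LayerNorm_out"};
  const ToyArg shape1_const{"Shape1_Const"};  // holds [b*s, 64]
  const ToyArg shape2_const{"Shape2_Const"};  // same contents, different object

  // Before constant sharing: the shape inputs are different objects.
  const ToyNode reshape1{"Reshape", {&layer_norm_out, &shape1_const}};
  ToyNode reshape2{"Reshape", {&layer_norm_out, &shape2_const}};
  assert(!SameExpression(reshape1, reshape2));

  // After constant sharing: Reshape2 is rewired to the shared constant.
  reshape2.inputs[1] = &shape1_const;
  assert(SameExpression(reshape1, reshape2));
}
```

Once both Reshape nodes read the same shared constant, a pass that compares inputs by identity can recognize them as duplicates.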

@pengwa pengwa added the "training" label (issues related to ONNX Runtime training; typically submitted using template) Apr 11, 2023
@pengwa pengwa marked this pull request as ready for review April 11, 2023 12:43
using SupportedTypeList = boost::mp11::mp_list<MLFloat16, float, double, int32_t, int64_t>;

bool IsValidSingleValueShape(const ONNX_NAMESPACE::TensorShapeProto* input_shape) {
static constexpr int64_t MAX_SIZE_PER_VALUE = 8;
Contributor

why do we have this restriction on num of elements?

Contributor Author

A bigger element-count threshold here means more overhead when running the ConstantSharing graph transformation. Originally the number was 1; I have now raised it to 8 as a gradual step. Maybe we can make it bigger once we find that it helps in some specific scenarios.

Contributor

Can you add some comments on the reasoning behind choosing 8, and on what one should consider if they want to change this or remove the limitation altogether in the future? Thanks!

Contributor Author

Sure, I added a comment for it. :)

@askhade askhade merged commit bf32dbb into main Apr 14, 2023
@askhade askhade deleted the pengwa/const_share branch April 14, 2023 14:41
wejoncy pushed a commit that referenced this pull request Apr 18, 2023
### Minor fix for differently scoped cpu_ep

cpu_ep is declared under `#ifndef DISABLE_CONTRIB_OPS`, but one of its
usages is not under the same condition.

```
#ifndef DISABLE_CONTRIB_OPS
  const InlinedHashSet<std::string_view> cpu_ep = {onnxruntime::kCpuExecutionProvider};
#endif
```
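As a rough illustration of the kind of fix involved (not the actual change to `graph_transformer_utils.cc`), the sketch below keeps the declaration and every use of a conditionally declared variable under the same guard. `CollectTransformerNames`, the transformer names, and the string value assigned to `cpu_ep` are placeholders invented for this example.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: builds a list of transformer names.
inline std::vector<std::string> CollectTransformerNames() {
  std::vector<std::string> names;

#ifndef DISABLE_CONTRIB_OPS
  // Declared only when contrib ops are enabled.
  const std::string_view cpu_ep = "CPUExecutionProvider";  // placeholder value
#endif

  // Does not depend on cpu_ep, so it needs no guard.
  names.push_back("ConstantSharing");

#ifndef DISABLE_CONTRIB_OPS
  // Every use of cpu_ep sits under the same condition as its declaration;
  // otherwise builds that define DISABLE_CONTRIB_OPS fail with an
  // "undeclared identifier" error.
  names.push_back("ContribFusion(" + std::string(cpu_ep) + ")");
#endif

  return names;
}
```

With this layout the function compiles whether or not `DISABLE_CONTRIB_OPS` is defined; only the contents of the returned list differ.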

### Motivation and Context

Postmortem: #15461 passed
all CIs except the Linux/Windows TVM CIs. I did not check the detailed error
messages at the time because those CIs had already been failing for some
reason for at least a few days. On closer inspection, after PR 15461 the
error message changed from

Before constant sharing change: TVM CI error message:

```
https://github.com/microsoft/onnxruntime/actions/runs/4700368634/jobs/8334955814

ERROR: testBooleanInputs (__main__.TestInferenceSession)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "onnxruntime_test_python.py", line 617, in testBooleanInputs
    sess = onnxrt.InferenceSession(get_name("logicaland.onnx"), providers=available_providers)
  File "D:\a\onnxruntime\onnxruntime\build\Release\Release\onnxruntime\capi\onnxruntime_inference_collection.py", line 383, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "D:\a\onnxruntime\onnxruntime\build\Release\Release\onnxruntime\capi\onnxruntime_inference_collection.py", line 435, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: D:\a\onnxruntime\onnxruntime\onnxruntime\core\providers\tvm\tvm_api.cc:49 onnxruntime::tvm::TVMCompile compile != nullptr was false. Unable to retrieve 'tvm_onnx_import_and_compile'.
```

to 

```
D:\a\onnxruntime\onnxruntime\onnxruntime\core\optimizer\graph_transformer_utils.cc(213,67): error C2065: 'cpu_ep': undeclared identifier [D:\a\onnxruntime\onnxruntime\build\Release\onnxruntime_optimizer.vcxproj]
D:\a\onnxruntime\onnxruntime\onnxruntime\core\optimizer\graph_transformer_utils.cc(213,19): error C2672: 
```

This PR fixes the build issue. The error messages of the Windows/Linux
TVM CIs are back to the original ones.