Support direct usage of ORT format model flatbuffer for initializers #12465
Conversation
…ers by leveraging the TensorProto external data infrastructure.
```cpp
Tensor& tensor, OrtCallback& ext_data_deleter) {
  ORT_ENFORCE(utils::HasExternalData(tensor_proto));
  ORT_ENFORCE(!proto_path.empty());
  // TODO: Do we need a check here? It's internal only so I don't think having an empty proto_path when loading
```
On Windows we report just the filename. On Linux the failure would occur at a point where there's an invalid file descriptor, so no filename or path is available.
Added something to GetExtDataFromTensorProto to ensure the full path is reported if there's an error.
LGTM
The std::vector is a member of InferenceSession and is providing the storage. Refers to: onnxruntime/core/session/inference_session.cc:993 in e30ebad.
edgchen1
left a comment
looks good to me overall
Could we ask customers to use the AddExternalInitializers API instead?
Not sure that helps. Looks like the API takes an OrtValue that we convert to a TensorProto (and I assume back to an OrtValue during session state finalization). Due to that you would still have 2x the memory usage for each initializer. Refers to: onnxruntime/core/graph/graph.cc, lines 2866 to 2873 in 8a86b34.
Yeah, the approach in the PR looks fine to me. Just some minor comments.
Description:
An ORT format model contains initializer data that we currently copy into a `TensorProto` during model load, and copy again into an `OrtValue<Tensor>` during session state finalization. We can do some optimizations to try and keep peak memory usage from these steps to roughly 2x the original size of the initializers, but that is still inefficient in a mobile scenario.

There is no way to populate the raw_data field of a `TensorProto` using an existing buffer. The `OrtValue<Tensor>`, however, does support the `Tensor` being constructed from an existing buffer with optional ownership transfer.

There is the capability for a `TensorProto` to point to external data. Typically the external data is stored in a separate file to the model, and the `TensorProto` contains the filename, offset and size of the data. We can leverage this mechanism to point to external data that is already resident in memory (from the ORT format model flatbuffer) by using a special tag for the filename and storing the memory address in the 'offset' field.

The existing code to create an `OrtValue<Tensor>` from a `TensorProto` containing external data supports the copy-free approach of passing along a pointer with optional transfer of ownership to the OrtValue, as we normally mmap the file containing the external data and use the address of that buffer.

This PR contains the small set of changes necessary to implement this approach to gather feedback. The usage is limited to an ORT format model where the caller provides a buffer containing the pre-loaded bytes for the model and sets a flag specifying not to copy the bytes (signifying that memory usage is important to them). An additional flag allows specifying that we may also use the buffer directly for initializers, as that creates a new requirement that the buffer remain valid for the entire duration of the InferenceSession (vs. currently, where it is only required to be valid until InferenceSession initialization completes).
Motivation and Context
We have production mobile scenarios that require a reduction in peak memory usage.
Test output from a potential production model is below. The ORT format model being tested is 13.6 MB.

Peak Working Set Size in bytes:
- Pre InferenceSession::Load
- Pre InferenceSession::Initialize
NOTE: Pre-packing was disabled for this testing. If we are using the user-provided buffer directly for the initializers, pre-packing causes an additional copy of the initializer data when creating the pre-packed `OrtValue<Tensor>`, and we can't free the original initializer data as that is within the user-provided buffer. If that buffer were mutable we could potentially do in-place pre-packing (pre-pack to a temporary buffer, then replace the original data) to avoid that copy. This is a separate problem to solve if pre-packing is also required in the production scenario.