
VitisAI EP Context Model #20926

Merged (145 commits, Jul 13, 2024)

Conversation

glen-amd (Contributor) commented Jun 4, 2024

Why so many commits

  • Runtime debugging, which was necessary
  • Three different approaches to the EP context model, resulting in testing back and forth
  • Windows compatibility issues, since this development was done on Linux for convenience

"Open" (?) questions

The VitisAI EP has three variant implementations, each with its own pros and cons (and they can of course be combined):

  1. Serialize and cache the list of compute capabilities and the original ONNX model itself.

  2. In ComputeCapability(), serialize and cache the backend compilation cache and the related necessary cache info, such as the cache dir and cache key.

  3. In Compile(), serialize and cache the backend compilation cache and the related necessary cache info, such as the cache dir and cache key.

EP context model creation

  • Precondition
    Session option configuration kOrtSessionOptionEpContextEnable (aka "ep.context_enable") is enabled.
  • Approach 1
    • Steps
      1. EP creates an ONNX model whose main graph has EP context nodes (i.e., node type is "EPContext"); a minimal sketch of such a model follows this list.
      2. EP implements/overrides IExecutionProvider::GetEpContextNodes() method.
      3. ONNXRT core creates an EP context model and saves/dumps it.
        • CreateEpContextModel() in the file "graph_partitioner.cc"
          • In get_ep_context_node(), Node::Name() is used to check whether a node is an EP context node, which effectively limits EP context model creation to IExecutionProvider::Compile().
          • The workaround is (1) not implementing IExecutionProvider::GetEpContextNodes() and (2) having the EP dump the EP context model itself.
      4. Optionally, the EP can also dump the EP context model it created by itself.
    • Examples
      • QNNExecutionProvider
      • VitisAIExecutionProvider
  • Approach 2
    • Steps
      1. EP creates an ONNX model whose main graph has EP context nodes (i.e., node type is "EPContext").
      2. EP does NOT implement IExecutionProvider::GetEpContextNodes() at all.
      3. EP dumps the EP context model it created.
    • Examples
      • TensorrtExecutionProvider
        • UPDATES
          • TRT EP is switching to leveraging IExecutionProvider::GetEpContextNodes()
      • OpenVINOExecutionProvider (?)
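
To make step 1 of the approaches above concrete, here is a minimal sketch of an ONNX model whose main graph contains a single "EPContext" node, built with the onnx Python helpers. The attribute names (embed_mode, ep_cache_context, source) follow the EPContext contrib-op convention in the com.microsoft domain; the tensor shapes and cache bytes are placeholders, not actual VitisAI output.

  import onnx
  from onnx import TensorProto, helper

  # A single "EPContext" node (com.microsoft domain) wrapping a compiled blob.
  ep_ctx_node = helper.make_node(
      "EPContext",
      inputs=["input"],
      outputs=["output"],
      name="EPContext_0",
      domain="com.microsoft",
      embed_mode=1,                                      # 1: cache embedded below; 0: ep_cache_context holds a file path
      ep_cache_context=b"<EP-specific compiled cache>",  # placeholder bytes
      source="VitisAIExecutionProvider",
  )

  graph = helper.make_graph(
      [ep_ctx_node],
      "ep_context_graph",
      [helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 3, 224, 224])],
      [helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1000])],
  )
  model = helper.make_model(
      graph,
      opset_imports=[helper.make_opsetid("", 17), helper.make_opsetid("com.microsoft", 1)],
  )
  onnx.save(model, "model_ctx.onnx")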

What to cache in EP context nodes

  • Non-compilation-based EPs
    • Examples
      • VitisAIExecutionProvider
    • Characteristics
      • Heavy lifting work happens in IExecutionProvider::GetCapability().
    • Preconditions
      • IExecutionProvider::GetCapability() is only called once by ONNXRT.
    • Cache content
      • Serialization of a list of ComputeCapability
        • Not EP-specific
        • Serialized using onnx::FunctionProto
      • EP-specific cache
  • Compilation-based EPs
    • Examples
      • QNNExecutionProvider
      • TensorrtExecutionProvider
      • MIGraphXExecutionProvider
      • OpenVINOExecutionProvider
    • Cache content
      • EP-specific cache
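
As a rough illustration of the cache content listed above, the sketch below walks a dumped EP context model and reports, for each EPContext node, whether the EP-specific cache is embedded in the node or stored in an external file. It assumes the embed_mode / ep_cache_context attribute convention of the EPContext contrib op; the model path is a placeholder.

  import onnx

  model = onnx.load("/some/path/to/original_model_ctx.onnx")
  for node in model.graph.node:
      if node.op_type != "EPContext":
          continue
      attrs = {a.name: a for a in node.attribute}
      embed_mode = attrs["embed_mode"].i if "embed_mode" in attrs else 0
      cache = attrs["ep_cache_context"].s  # bytes when embedded, a file path otherwise
      if embed_mode == 1:
          print(f"{node.name}: {len(cache)} bytes of EP-specific cache embedded in the node")
      else:
          print(f"{node.name}: cache stored externally at {cache.decode()}")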

Requirements

  • Offline / AOT compilation of ONNX models with EP context cache
  • Compile somewhere, run everywhere
  • Pseudo code with brief explanation (a Python sketch of this flow follows the Python examples below)
    GenerateCache(original_onnx_file, cache_onnx_file)
      model_buffer = load(original_onnx_file) --> Load the original ONNX model file
      model_buffer = decrypt(model_buffer)
      session_options = { kOrtSessionOptionEpContextEnable: true, kOrtSessionOptionEpContextFilePath: temp_file } --> Set necessary configs
      Ort::CreateSessionFromArray(model_buffer, session_options) --> The new ONNX model with EP context is created and dumped into the user-specified file "temp_file"
      temp_buffer = encrypt(temp_file)
      write(temp_buffer, cache_onnx_file) --> Write the encrypted contents of "temp_file" into the "cache_onnx_file" file
    
    
    InitializeInferenceSession(cache_onnx_file)
      model_buffer = load(cache_onnx_file) --> Load the ONNX model with EP context from the file generated in the previous step
      model_buffer = decrypt(model_buffer)
      session_options = { }
      Ort::CreateSessionFromArray(model_buffer, session_options) --> Create and initialize a session with the EP context model
    
  • Python code with comments
    • EP context model creation
      import onnxruntime as onnxrt
      
      
      # Session options for creating an ONNX model with EP context cache.
      sess_opts = onnxrt.SessionOptions()
      
      # Verbose.
      sess_opts.log_severity_level = 0
      
      # This is REQUIRED.
      sess_opts.add_session_config_entry("ep.context_enable", "1")
      # This is OPTIONAL.
      # Either an absolute path (preferred for now) or a relative path (WIP) is okay.
      # sess_opts.add_session_config_entry("ep.context_file_path", "/some/path/to/original_model_ctx.onnx")
      # This is OPTIONAL.
      sess_opts.add_session_config_entry("ep.context_embed_mode", "1")
      
      orig_model_location = "/some/path/to/original_model.onnx"
      sess = onnxrt.InferenceSession(orig_model_location, sess_opts, providers=["VitisAIExecutionProvider"], provider_options=[])
    • Inference run with an EP context model
      import onnxruntime as onnxrt
      
      
      # Session options for running inference with an EP context model.
      sess_opts = onnxrt.SessionOptions()
      
      # Default EP context model path.
      # ep_ctx_model_location = "/some/path/to/original_model.onnx_ctx.onnx"
      # User-configured EP context model path.
      ep_ctx_model_location = "/some/path/to/original_model_ctx.onnx"
      sess = onnxrt.InferenceSession(ep_ctx_model_location, sess_opts, providers=["VitisAIExecutionProvider"], provider_options=[])
      
      model_inputs = {}
      run_opts = onnxrt.RunOptions()
      # Verbose.
      run_opts.log_severity_level = 1
      sess.run(None, model_inputs, run_opts)
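
A Python sketch of the buffer-based "compile somewhere, run everywhere" flow from the pseudo code above. onnxruntime.InferenceSession accepts raw model bytes in place of a file path, playing the role of Ort::CreateSessionFromArray; encrypt/decrypt are placeholders for whatever scheme the application uses and are not part of ONNX Runtime.

  import onnxruntime as onnxrt

  def decrypt(buf: bytes) -> bytes:  # placeholder for the application's scheme
      return buf

  def encrypt(buf: bytes) -> bytes:  # placeholder for the application's scheme
      return buf

  def generate_cache(original_onnx_file, cache_onnx_file, temp_file="model_ctx.onnx"):
      with open(original_onnx_file, "rb") as f:
          model_buffer = decrypt(f.read())
      sess_opts = onnxrt.SessionOptions()
      sess_opts.add_session_config_entry("ep.context_enable", "1")
      sess_opts.add_session_config_entry("ep.context_file_path", temp_file)
      # Creating the session compiles the model and dumps the EP context model to temp_file.
      onnxrt.InferenceSession(model_buffer, sess_opts, providers=["VitisAIExecutionProvider"])
      with open(temp_file, "rb") as f, open(cache_onnx_file, "wb") as out:
          out.write(encrypt(f.read()))

  def initialize_inference_session(cache_onnx_file):
      with open(cache_onnx_file, "rb") as f:
          model_buffer = decrypt(f.read())
      # No special session configs are needed; the EP context travels inside the model.
      return onnxrt.InferenceSession(model_buffer, onnxrt.SessionOptions(),
                                     providers=["VitisAIExecutionProvider"])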

glen-amd (Contributor, Author) commented Jun 4, 2024

@microsoft-github-policy-service agree

glen-amd marked this pull request as ready for review June 4, 2024 23:46
jywu-msft requested a review from HectorSVC June 4, 2024 23:49
glen-amd changed the title from "Vitisai ep context model 2" to "VitisAI EP Context Model" Jun 5, 2024
@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed


Azure Pipelines successfully started running 9 pipeline(s).

HectorSVC self-assigned this Jun 6, 2024
…. This one is tightly coupled with the closed-source backend of VitisAI EP. 2) Interim commit. Algorithm implementation selection for MD5 is TODO.
@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed


Pull request contains merge conflicts.

@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Pull request contains merge conflicts.

@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

1 similar comment

Azure Pipelines successfully started running 9 pipeline(s).

@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

1 similar comment

Azure Pipelines successfully started running 9 pipeline(s).

@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

1 similar comment

Azure Pipelines successfully started running 9 pipeline(s).

@HectorSVC (Contributor)

/azp run Linux OpenVINO CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline


Azure Pipelines successfully started running 3 pipeline(s).

HectorSVC merged commit 281ed8c into microsoft:main Jul 13, 2024
64 of 66 checks passed