
VitisAI EP Context Model #20926

Merged (145 commits, Jul 13, 2024)

Conversation

glen-amd (Contributor) commented Jun 4, 2024

Why so many commits

  • Runtime debugging, which was necessary
  • Three different approaches to the EP context model, resulting in testing back and forth
  • Windows compatibility issues, since this development was done on Linux for convenience

"Open" (?) questions

The VitisAI EP has three variant implementations, each with its own pros and cons (and they can of course be combined):

  1. Serialize and cache the list of compute capabilities and the original ONNX model itself.

  2. In ComputeCapability(), serialize and cache the backend compilation cache and the related necessary cache info, such as the cache dir and cache key.

  3. In Compile(), serialize and cache the backend compilation cache and the related necessary cache info, such as the cache dir and cache key.

EP context model creation

  • Precondition
    Session option configuration kOrtSessionOptionEpContextEnable (aka "ep.context_enable") is enabled.
  • Approach 1
    • Steps
      1. EP creates an ONNX model whose main graph has EP context nodes (i.e., node type is "EPContext"); a minimal sketch of such a model follows this list.
      2. EP implements/overrides IExecutionProvider::GetEpContextNodes() method.
      3. ONNXRT core creates an EP context model and saves/dumps it.
        • CreateEpContextModel() in the file "graph_partitioner.cc"
          • In get_ep_context_node(), Node::Name() is used to check whether a node is an EP context node, which effectively limits EP context model creation to IExecutionProvider::Compile().
          • The workaround is (1) not implementing IExecutionProvider::GetEpContextNodes() and (2) having the EP dump the EP context model itself.
      4. Optionally, the EP can also dump the EP context model it created by itself.
    • Examples
      • QNNExecutionProvider
      • VitisAIExecutionProvider
  • Approach 2
    • Steps
      1. EP creates an ONNX model whose main graph has EP context nodes (i.e., node type is "EPContext").
      2. EP does NOT implement IExecutionProvider::GetEpContextNodes() at all.
      3. EP dumps the EP context model it created.
    • Examples
      • TensorrtExecutionProvider
        • UPDATES
          • TRT EP is switching to leveraging IExecutionProvider::GetEpContextNodes()
      • OpenVINOExecutionProvider (?)
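
To make step 1 of the approaches above concrete, here is a minimal sketch of an ONNX model whose main graph contains a single "EPContext" node, built with the onnx Python helpers. The attribute names (embed_mode, ep_cache_context, source) follow the EPContext contrib-op convention in the com.microsoft domain; the tensor shapes and cache bytes are placeholders, not actual VitisAI output.

  import onnx
  from onnx import TensorProto, helper

  # A single "EPContext" node (com.microsoft domain) wrapping a compiled blob.
  ep_ctx_node = helper.make_node(
      "EPContext",
      inputs=["input"],
      outputs=["output"],
      name="EPContext_0",
      domain="com.microsoft",
      embed_mode=1,                                      # 1: cache embedded below; 0: ep_cache_context holds a file path
      ep_cache_context=b"<EP-specific compiled cache>",  # placeholder bytes
      source="VitisAIExecutionProvider",
  )

  graph = helper.make_graph(
      [ep_ctx_node],
      "ep_context_graph",
      [helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 3, 224, 224])],
      [helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1000])],
  )
  model = helper.make_model(
      graph,
      opset_imports=[helper.make_opsetid("", 17), helper.make_opsetid("com.microsoft", 1)],
  )
  onnx.save(model, "model_ctx.onnx")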

What to cache in EP context nodes

  • Non-compilation-based EPs
    • Examples
      • VitisAIExecutionProvider
    • Characteristics
      • Heavy lifting work happens in IExecutionProvider::GetCapability().
    • Preconditions
      • IExecutionProvider::GetCapability() is only called once by ONNXRT.
    • Cache content
      • Serialization of a list of ComputeCapability
        • Not EP-specific
        • Serialized using onnx::FunctionProto
      • EP-specific cache
  • Compilation-based EPs
    • Examples
      • QNNExecutionProvider
      • TensorrtExecutionProvider
      • MIGraphXExecutionProvider
      • OpenVINOExecutionProvider
    • Cache content
      • EP-specific cache
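
As a rough illustration of the cache content listed above, the sketch below walks a dumped EP context model and reports, for each EPContext node, whether the EP-specific cache is embedded in the node or stored in an external file. It assumes the embed_mode / ep_cache_context attribute convention of the EPContext contrib op; the model path is a placeholder.

  import onnx

  model = onnx.load("/some/path/to/original_model_ctx.onnx")
  for node in model.graph.node:
      if node.op_type != "EPContext":
          continue
      attrs = {a.name: a for a in node.attribute}
      embed_mode = attrs["embed_mode"].i if "embed_mode" in attrs else 0
      cache = attrs["ep_cache_context"].s  # bytes when embedded, a file path otherwise
      if embed_mode == 1:
          print(f"{node.name}: {len(cache)} bytes of EP-specific cache embedded in the node")
      else:
          print(f"{node.name}: cache stored externally at {cache.decode()}")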

Requirements

  • Offline / AOT compilation of ONNX models with EP context cache
  • Compile somewhere, run everywhere
  • Pseudo code with brief explanation (a Python sketch of this flow follows the Python examples below)
    GenerateCache(original_onnx_file, cache_onnx_file)
      model_buffer = load(original_onnx_file) --> Load the original ONNX model file
      model_buffer = decrypt(model_buffer)
      session_options = { kOrtSessionOptionEpContextEnable: true, kOrtSessionOptionEpContextFilePath: temp_file } --> Set necessary configs
      Ort::CreateSessionFromArray(model_buffer, session_options) --> The new ONNX model with EP context is created and dumped into the user-specified file "temp_file"
      temp_buffer = encrypt(temp_file)
      write(temp_buffer, cache_onnx_file) --> Write the encrypted contents of "temp_file" into the "cache_onnx_file" file
    
    
    InitializeInferenceSession(cache_onnx_file)
      model_buffer = load(cache_onnx_file) --> Load the ONNX model with EP context from the file generated in the previous step
      model_buffer = decrypt(model_buffer)
      session_options = { }
      Ort::CreateSessionFromArray(model_buffer, session_options) --> Create and initialize a session with the EP context model
    
  • Python code with comments
    • EP context model creation
      import onnxruntime as onnxrt
      
      
      # Session options for creating an ONNX model with EP context cache.
      sess_opts = onnxrt.SessionOptions()
      
      # Verbose.
      sess_opts.log_severity_level = 0
      
      # This is REQUIRED.
      sess_opts.add_session_config_entry("ep.context_enable", "1")
      # This is OPTIONAL.
      # Either an absolute path (preferred for now) or a relative path (WIP) is okay.
      # sess_opts.add_session_config_entry("ep.context_file_path", "/some/path/to/original_model_ctx.onnx")
      # This is OPTIONAL.
      sess_opts.add_session_config_entry("ep.context_embed_mode", "1")
      
      orig_model_location = "/some/path/to/original_model.onnx"
      sess = onnxrt.InferenceSession(orig_model_location, sess_opts, providers=["VitisAIExecutionProvider"], provider_options=[])
    • Inference run with an EP context model
      import onnxruntime as onnxrt
      
      
      # Session options for running inference with an EP context model.
      sess_opts = onnxrt.SessionOptions()
      
      # Default EP context model path.
      # ep_ctx_model_location = "/some/path/to/original_model.onnx_ctx.onnx"
      # User-configured EP context model path.
      ep_ctx_model_location = "/some/path/to/original_model_ctx.onnx"
      sess = onnxrt.InferenceSession(ep_ctx_model_location, sess_opts, providers=["VitisAIExecutionProvider"], provider_options=[])
      
      model_inputs = {}
      run_opts = onnxrt.RunOptions()
      # Verbose.
      run_opts.log_severity_level = 1
      sess.run(None, model_inputs, run_opts)
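
A Python sketch of the buffer-based "compile somewhere, run everywhere" flow from the pseudo code above. onnxruntime.InferenceSession accepts raw model bytes in place of a file path, playing the role of Ort::CreateSessionFromArray; encrypt/decrypt are placeholders for whatever scheme the application uses and are not part of ONNX Runtime.

  import onnxruntime as onnxrt

  def decrypt(buf: bytes) -> bytes:  # placeholder for the application's scheme
      return buf

  def encrypt(buf: bytes) -> bytes:  # placeholder for the application's scheme
      return buf

  def generate_cache(original_onnx_file, cache_onnx_file, temp_file="model_ctx.onnx"):
      with open(original_onnx_file, "rb") as f:
          model_buffer = decrypt(f.read())
      sess_opts = onnxrt.SessionOptions()
      sess_opts.add_session_config_entry("ep.context_enable", "1")
      sess_opts.add_session_config_entry("ep.context_file_path", temp_file)
      # Creating the session compiles the model and dumps the EP context model to temp_file.
      onnxrt.InferenceSession(model_buffer, sess_opts, providers=["VitisAIExecutionProvider"])
      with open(temp_file, "rb") as f, open(cache_onnx_file, "wb") as out:
          out.write(encrypt(f.read()))

  def initialize_inference_session(cache_onnx_file):
      with open(cache_onnx_file, "rb") as f:
          model_buffer = decrypt(f.read())
      # No special session configs are needed; the EP context travels inside the model.
      return onnxrt.InferenceSession(model_buffer, onnxrt.SessionOptions(),
                                     providers=["VitisAIExecutionProvider"])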

glen-amd (Contributor, Author) commented Jun 4, 2024

@microsoft-github-policy-service agree

glen-amd marked this pull request as ready for review June 4, 2024 23:46
jywu-msft requested a review from HectorSVC June 4, 2024 23:49
glen-amd changed the title from "Vitisai ep context model 2" to "VitisAI EP Context Model" Jun 5, 2024
@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed


Azure Pipelines successfully started running 9 pipeline(s).

HectorSVC self-assigned this Jun 6, 2024
…. This one is tightly coupled with the closed-source backend of VitisAI EP. 2) Interim commit. Algorithm implementation selection for MD5 is TODO.
@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed


Pull request contains merge conflicts.

@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Pull request contains merge conflicts.

@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

1 similar comment

Azure Pipelines successfully started running 9 pipeline(s).

@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

1 similar comment

Azure Pipelines successfully started running 9 pipeline(s).

@HectorSVC (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@HectorSVC (Contributor)

/azp run Big Models,Linux Android Emulator QNN CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline


Azure Pipelines successfully started running 9 pipeline(s).

1 similar comment

Azure Pipelines successfully started running 9 pipeline(s).

@HectorSVC (Contributor)

/azp run Linux OpenVINO CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline


Azure Pipelines successfully started running 3 pipeline(s).

HectorSVC merged commit 281ed8c into microsoft:main Jul 13, 2024
64 of 66 checks passed