Align generate-model-package CLI with onnxruntime-genai by xiaoyu-work · Pull Request #2495 · microsoft/Olive

xiaoyu-work · 2026-06-02T23:36:00Z

This pull request updates the test suite for the model packaging CLI to reflect a new package layout and metadata schema. The changes remove the generation and validation of variant.json files, update how execution provider (EP) compatibility and inference settings are represented, and ensure external data blobs are handled per-variant rather than being deduplicated. The tests now check for the presence of new fields and files, and for the absence of deprecated ones.

Test updates for new package schema and layout:

- Removed all references to variant.json; tests now assert that this file is not emitted, and instead check for the presence of model.onnx and, where applicable, genai_config_overlay.json in variant directories.
- Updated assertions for metadata structure:
  - EP compatibility is now represented inline within the variants mapping, not as an ep_compatibility array.
  - Added checks for new manifest and metadata fields, such as schema_version, component_name, package_name, package_version, and configs_dir.

Inference settings and overlay configuration:

- Runtime and provider options are now placed in genai_config_overlay.json instead of variant.json, and tests verify the correct structure and location of these overlays.
- Added and updated tests to ensure overlays are omitted for variants without options, and that provider options are matched to the correct EP by name.

External data handling:

- Shared weights deduplication has been removed; tests confirm that each variant keeps its own external data blob inline and that no shared_weights directory or variant.json is created.

Naming and documentation improvements:

- Test and comment names have been updated for clarity, reflecting the new schema and file layout.

These changes ensure the test suite accurately validates the new packaging format and metadata conventions.

Update metadata.json to inline EP info (single EP per variant) with schema_version and component_name; rename compatibility list to a single compatibility_string passthrough. Emit genai_config_overlay.json carrying per-variant session_options/provider_options as an RFC-7386 merge patch keyed by the genai role resolved from the base genai_config. Add package_name, package_version and configs_dir to manifest.json. The v4 format removes variant.json and has no cross-variant weight-sharing mechanism, so drop variant.json emission and shared_weights deduplication: each variant directory now keeps its ONNX file and external-data blobs inline so stock ORT can load it directly.
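The commit message above describes the overlay as an RFC 7386 merge patch. As a rough illustration of that merge rule only, here is a minimal Python sketch. The function name `json_merge_patch` and the sample `base` and `overlay` dictionaries are assumptions made for this example; they are not taken from the packager or from ORT-GenAI.

```python
import copy


def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON Merge Patch to `target` and return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch (scalar or array) replaces the target wholesale.
        return copy.deepcopy(patch)
    # A non-object target is discarded and rebuilt from the patch.
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # A null member in the patch deletes that key from the target.
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Hypothetical base config and per-variant overlay, for illustration only.
base = {"model": {"decoder": {"filename": "model.onnx",
                              "session_options": {"log_id": "base"}}}}
overlay = {"model": {"decoder": {"session_options": {
    "provider_options": [{"cuda": {}}]}}}}

merged = json_merge_patch(base, overlay)
```

Under strict RFC 7386 semantics an array in the patch replaces the array in the target. A later commit message in this PR notes that GenAI's overlay merge appends arrays instead, so this sketch should not be read as a description of the loader's actual behavior.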

Describe the model-package writer as a single current behavior: remove references to a specific ORT schema version (v4) and to fields or files that were removed/changed elsewhere (e.g. variant.json), so the docstrings, comments, and test comments read as one self-contained feature.

Copilot

Pull request overview

This PR aims to align Olive’s generate-model-package packaging behavior and its tests with the ONNX Runtime / ORT-GenAI model package schema by updating the emitted package layout and JSON schemas (manifest/metadata), removing variant.json, and switching per-variant runtime settings to genai_config_overlay.json.

Changes:

- Update package structure and schema: introduce models/ component root, add manifest fields (package_name, package_version, configs_dir) and metadata fields (schema_version, component_name), and represent EP compatibility inline per variant.
- Remove external-data dedup (shared_weights/) and ensure external data blobs are kept inline per-variant.
- Replace variant.json with optional per-variant genai_config_overlay.json and update tests accordingly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| `test/cli/test_model_package.py` | Updates test expectations for the new package schema/layout and overlay behavior (currently needs additional path corrections for `models/` + `.ortpackage`). |
| `olive/cli/model_package.py` | Implements new `.ortpackage` output naming, `models/` layout, updated manifest/metadata schemas, per-variant overlays, and inline external-data copying. |

ORT-GenAI's SetProviderSessionOptions dispatch table has no CPU handler (src/models/session_options.cpp:150-159); the prior sentinel entry [{"CPU": {}}] only triggered a V1 no-op registration. ORT InferenceSession implicitly registers the CPU EP when no other provider is selected (onnxruntime/core/session/inference_session.cc), so emitting an empty provider_options list for CPU is sufficient and matches the convention used by reference ORT model packages.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
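A small sketch of the convention described above: CPU variants get an empty provider_options list, while other EPs get a single entry keyed by their alias. The helper name `build_provider_options` and its signature are hypothetical and are not taken from `olive/cli/model_package.py`.

```python
def build_provider_options(ep_alias, options=None):
    """Return the provider_options list for one variant (hypothetical helper)."""
    if ep_alias.lower() == "cpu":
        # No sentinel entry for CPU: per the commit message, ORT registers
        # the CPU EP implicitly when no other provider is selected.
        return []
    # Any other EP gets one entry keyed by its alias, with a copy of its options.
    return [{ep_alias: dict(options or {})}]
```

For example, `build_provider_options("CPU")` returns an empty list, and a non-CPU alias with no options returns a one-element list holding an empty options object.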

OpenVINO/QNN variants ship a tiny EPContext stub .onnx plus same-stem .xml/.bin sidecars that the loader expects to find next to it. These sidecars are not referenced through ONNX initializer external_data, so the previous copy path missed them and the produced variants were unloadable. After copying each .onnx and its external-data blobs, walk the source directory once more and copy any remaining files whose suffix is one of the known model suffixes (.onnx/.bin/.xml/.data). Each Olive source directory holds the artifacts for a single variant, so any model-suffix file there belongs next to the ONNX. The sweep skips files already copied via external_data.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
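The sidecar sweep described above could look roughly like the following. This is a sketch under stated assumptions: the helper name `copy_sidecars`, its `already_copied` parameter, and the choice to scan only the top level of the source directory are all illustrative, not the packager's actual code. Only the suffix list comes from the commit message.

```python
import shutil
from pathlib import Path

# Suffixes named in the commit message above.
MODEL_SUFFIXES = {".onnx", ".bin", ".xml", ".data"}


def copy_sidecars(source_dir, variant_dir, already_copied=()):
    """Copy remaining model-suffix files from source_dir into variant_dir.

    Hypothetical helper. `already_copied` holds the names of files handled
    earlier (the .onnx files and their external-data blobs). Returns the
    names of the files copied by this sweep, in sorted order.
    """
    source_dir, variant_dir = Path(source_dir), Path(variant_dir)
    variant_dir.mkdir(parents=True, exist_ok=True)
    skip = set(already_copied)
    copied = []
    # Assumption: a flat source directory, so only the top level is scanned.
    for path in sorted(source_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in MODEL_SUFFIXES:
            continue
        # Skip anything handled earlier or already present in the variant dir.
        if path.name in skip or (variant_dir / path.name).exists():
            continue
        shutil.copy2(path, variant_dir / path.name)
        copied.append(path.name)
    return copied
```

Relying on the suffix alone works here because, as the commit message says, each source directory holds the artifacts of a single variant.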

Variants of the same model can legitimately differ on a small set of model-level scalars, most importantly `context_length`, which on OpenVINO NPU is capped at the prompt+response budget (e.g. 4224) but on GPU/CPU runs at the full pretrained limit (e.g. 131072). The same applies to `pad_token_id`, `bos_token_id`, `eos_token_id`, and `type`. The base genai_config can only hold one value for each of these, so without a per-variant overlay the merged config would silently use whichever source happened to supply the base. This change lifts those fields verbatim from each variant's source `genai_config.json` into its overlay, and strips them from the base. The strip is required for the array field `eos_token_id`: GenAI's overlay merge appends arrays rather than replacing them, so a base entry would duplicate (not override) the variant entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
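The lift-and-strip step above can be sketched as follows. The helper name `split_variant_fields` is hypothetical, and the sketch assumes the listed fields sit directly under the top-level `model` object of `genai_config.json`. Only the field names are taken from the commit message.

```python
import copy

# Fields named in the commit message above.
VARIANT_LEVEL_FIELDS = (
    "context_length",
    "pad_token_id",
    "bos_token_id",
    "eos_token_id",
    "type",
)


def split_variant_fields(source_genai_config):
    """Return (base, overlay) for one variant's source genai_config.

    `base` is a copy with the variant-level fields removed; `overlay` holds
    only those fields, verbatim. The input dictionary is not modified.
    """
    base = copy.deepcopy(source_genai_config)
    model = base.get("model", {})
    # Pop each field from the base copy so it cannot also appear in the base.
    lifted = {key: model.pop(key) for key in VARIANT_LEVEL_FIELDS if key in model}
    overlay = {"model": lifted} if lifted else {}
    return base, overlay
```

Removing the fields from the base, and not just copying them into the overlay, matters for `eos_token_id`: with a merge that appends arrays, a value left in the base would be concatenated with the variant's value.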

…del-package

Adds support for two source shapes that previously couldn't be packaged:

1. **Pipeline (multi-stage) sources**, e.g. QNN exports that ship a single source dir containing 4 ONNX stages (embedding, prompt-processor, token-generator, transformer-head) plus QNN context binaries. The pipeline structure lives in the source's `genai_config.json` at `model.<role>.pipeline`. The packager:
   - lifts every stage's ONNX file from the source into the variant dir (the existing sidecar sweep already takes care of the QNN .bin context binaries that sit next to the ONNX files);
   - writes the pipeline array verbatim into the variant overlay so each stage keeps its own filename and EP-specific `session_options.provider_options` (htp_performance_mode, soc_model, etc.);
   - strips `pipeline` from the base genai_config because GenAI's overlay parser appends arrays rather than replacing them: a pipeline in both base and overlay would double every stage.
2. **GenAI-shaped sources without `model_config.json`**: sources downloaded directly from a model hub instead of produced by an Olive workflow. `_read_model_config` now synthesises a minimal config from `genai_config.json` + a directory scan so the rest of the packager stays single-codepath. As a bonus, an existing `model_config.json` whose `model_path` is unreachable (a common state when artifacts are copied between hosts) is repaired in-memory by repointing it at the source directory.

Supporting changes:

- `_extract_task` now honours `model_attributes.task` and falls back to inspecting the source genai_config for a `decoder` role, so the component directory ends up as `models/decoder/...` (not the generic `models/model/...`).
- EP derivation prefers the source genai's `session_options.provider_options` alias over the variant-name heuristic, e.g. a directory named `vitia_npu` correctly resolves to VitisAI rather than QNN (the `npu` substring would otherwise win by accident).
- `_VARIANT_NAME_EP_HINTS` gains `vitisai`/`vitia` entries ahead of `npu` so the heuristic itself is also unambiguous.

Validated end-to-end on Phi-4-mini-reasoning by packaging:

- qnn_npu → 2.8 GB pipeline package (4 ONNX + 4 .bin + per-stage overlay options preserved)
- vitia_npu → 3.2 GB flat package (single ONNX + VitisAI overlay)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
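The EP derivation order described above (source alias first, then an ordered name heuristic) can be illustrated with a short sketch. The table contents and the `resolve_ep` function are assumptions for this example; the real `_VARIANT_NAME_EP_HINTS` may hold different entries and a different shape. The commit message supplies only the ordering idea and the `npu` to QNN mapping.

```python
# Ordered hints: the more specific substrings come before the generic "npu".
# Illustrative contents only; not the packager's actual table.
VARIANT_NAME_EP_HINTS = (
    ("vitisai", "VitisAIExecutionProvider"),
    ("vitia", "VitisAIExecutionProvider"),
    ("npu", "QNNExecutionProvider"),
)


def resolve_ep(variant_name, source_alias_ep=None):
    """Pick an EP for a variant (hypothetical helper).

    An EP derived from the source genai_config's provider_options wins;
    otherwise the first matching substring hint in the variant name is used.
    Returns None when nothing matches.
    """
    if source_alias_ep:
        return source_alias_ep
    lowered = variant_name.lower()
    for hint, ep in VARIANT_NAME_EP_HINTS:
        if hint in lowered:
            return ep
    return None
```

With this ordering a name such as `vitia_npu` matches the `vitia` hint before the loop reaches `npu`, while `qnn_npu` still falls through to the `npu` entry.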

Make `olive generate-model-package` purely genai_config-driven. The source's `genai_config.json` is now the only declarative input the packager reads: it names the role(s), their ONNX filename(s) (flat) or pipeline stages (multi-stage), provider_options (and thus the ORT EP), and the model-level scalars (context_length etc.) that legitimately diverge per variant.

Behavior changes:

- `--source` now requires `genai_config.json`. `model_config.json` and the older ONNXModel/CompositeModel synthesis paths are removed; HF Hub task lookup is dropped too.
- Multi-role flat sources (e.g. VLMs with vision + embedding + decoder ONNXs in one dir) now have EVERY role's `filename` and `session_options` lifted into the per-variant `genai_config_overlay.json`, not just the primary role. Previously the loader would lose vision/embedding filenames at load time.
- Base `configs/genai_config.json` injects `component=<comp>` markers for every role found in any variant's source genai_config.
- `device` is no longer emitted in variant metadata (it was sourced from `model_attributes` which is no longer read).

Test changes:

- `_create_source_dir` now writes a minimal genai_config.json (no model_config.json) parameterised by EP.
- Removed TestMixedSourceTypes / TestCompositeBuild / TestUnsupportedModelType (composite + model_config-only behaviors).
- Added TestVLMMultiRoleOverlay covering the multi-role overlay restoration + per-role component-marker injection.
- TestPipelineSources tightened: error message updated, the stale model_config-repointing test dropped.

Validated end-to-end by repacking 7 Phi-4-mini-reasoning variants (CPU, CUDA, OpenVINO GPU/NPU, QNN NPU, VitisAI NPU, WebGPU) and 2 qwen3-vl-2b-instruct variants (CPU, CUDA) from genai_config-only sources. QNN pipeline retains all 4 stages and .bin context binaries; each EP overlay carries its correct provider_options; qwen3-vl overlays carry vision + embedding + decoder filenames.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
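For the multi-role case described in the commit message above, lifting every role's `filename` and `session_options` into the overlay might look like this sketch. The helper name `build_role_overlay` is hypothetical, and the rule used to recognise a role (an object under `model` that carries a `filename`) is an assumption for flat sources only; pipeline sources are not handled here.

```python
def build_role_overlay(source_genai_config):
    """Lift each role's filename and session_options into an overlay dict.

    Hypothetical helper for flat (non-pipeline) sources. Returns an empty
    dict when the source names no roles.
    """
    overlay_model = {}
    for role, section in source_genai_config.get("model", {}).items():
        # Assumption: a role is an object-valued entry that names an ONNX file.
        # Scalars such as context_length are skipped by the isinstance check.
        if not isinstance(section, dict) or "filename" not in section:
            continue
        entry = {"filename": section["filename"]}
        if "session_options" in section:
            entry["session_options"] = section["session_options"]
        overlay_model[role] = entry
    return {"model": overlay_model} if overlay_model else {}
```

Iterating over every role, not just the primary one, is what keeps the vision and embedding filenames available to the loader in a VLM package.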

xiaoyu-work force-pushed the modelpkg branch from 94d6441 to d8622c5 Compare June 3, 2026 00:06

xiaoyu-work added 3 commits June 2, 2026 18:55

Fix bugs

d511c1a

add suffix

4318b8a

xiaoyu-work marked this pull request as ready for review June 3, 2026 23:12

Copilot AI review requested due to automatic review settings June 3, 2026 23:12

Copilot started reviewing on behalf of xiaoyu-work June 3, 2026 23:13

Copilot AI reviewed Jun 3, 2026

xiaoyu-work added 2 commits June 4, 2026 00:25

Fix nit

5c2bd16

Fix comments

a000797

github-advanced-security AI found potential problems Jun 4, 2026

Comment thread test/cli/test_model_package.py Fixed

xiaoyu-work and others added 5 commits June 4, 2026 20:05

jambayk approved these changes Jun 5, 2026

xiaoyu-work merged commit 120972c into main Jun 5, 2026
13 checks passed

xiaoyu-work deleted the modelpkg branch June 5, 2026 22:13

xiaoyu-work merged 11 commits into main from modelpkg

4 participants
