Align generate-model-package CLI with onnxruntime-genai#2495
Merged
Update metadata.json to inline EP info (single EP per variant) with schema_version and component_name; rename compatibility list to a single compatibility_string passthrough. Emit genai_config_overlay.json carrying per-variant session_options/provider_options as an RFC-7386 merge patch keyed by the genai role resolved from the base genai_config. Add package_name, package_version and configs_dir to manifest.json. The v4 format removes variant.json and has no cross-variant weight-sharing mechanism, so drop variant.json emission and shared_weights deduplication: each variant directory now keeps its ONNX file and external-data blobs inline so stock ORT can load it directly.
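To make the merge-patch wording concrete, here is a minimal sketch of strict RFC 7386 semantics in Python. The function name `json_merge_patch` and the sample configs are illustrative assumptions, not code from this PR. Strict RFC 7386 replaces arrays wholesale; a later commit message in this PR reports that GenAI's overlay merge appends arrays instead, so the real loader may behave differently for list-valued fields.

```python
from typing import Any


def json_merge_patch(target: Any, patch: Any) -> Any:
    """Return the result of applying an RFC 7386 merge patch to target."""
    if not isinstance(patch, dict):
        # A non-object patch (scalar or array) replaces the target wholesale.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "delete this member".
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Hypothetical base config and per-variant overlay (values are illustrative).
base = {
    "model": {
        "decoder": {
            "filename": "model.onnx",
            "session_options": {"log_id": "onnxruntime-genai"},
        }
    }
}
overlay = {
    "model": {
        "decoder": {
            "session_options": {
                "provider_options": [{"cuda": {"enable_cuda_graph": "0"}}]
            }
        }
    }
}
merged = json_merge_patch(base, overlay)
```

Untouched members such as `filename` survive the merge, while the overlay contributes only the per-variant `provider_options`.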
Describe the model-package writer as a single current behavior: remove references to a specific ORT schema version (v4) and to fields or files that were removed/changed elsewhere (e.g. variant.json), so the docstrings, comments, and test comments read as one self-contained feature.
Pull request overview
This PR aims to align Olive’s generate-model-package packaging behavior and its tests with the ONNX Runtime / ORT-GenAI model package schema by updating the emitted package layout and JSON schemas (manifest/metadata), removing variant.json, and switching per-variant runtime settings to genai_config_overlay.json.
Changes:
- Update package structure and schema: introduce `models/` component root, add manifest fields (`package_name`, `package_version`, `configs_dir`) and metadata fields (`schema_version`, `component_name`), and represent EP compatibility inline per variant.
- Remove external-data dedup (`shared_weights/`) and ensure external data blobs are kept inline per-variant.
- Replace `variant.json` with optional per-variant `genai_config_overlay.json` and update tests accordingly.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| test/cli/test_model_package.py | Updates test expectations for the new package schema/layout and overlay behavior (currently needs additional path corrections for models/ + .ortpackage). |
| olive/cli/model_package.py | Implements new .ortpackage output naming, models/ layout, updated manifest/metadata schemas, per-variant overlays, and inline external-data copying. |
ORT-GenAI's SetProviderSessionOptions dispatch table has no CPU handler
(src/models/session_options.cpp:150-159); the prior sentinel entry
[{"CPU": {}}] only triggered a V1 no-op registration. ORT
InferenceSession implicitly registers the CPU EP when no other provider
is selected (onnxruntime/core/session/inference_session.cc), so emitting
an empty provider_options list for CPU is sufficient and matches the
convention used by reference ORT model packages.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
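A minimal sketch of the convention this commit describes: an empty `provider_options` list for CPU, and a single-entry list for any other EP. The helper name `build_provider_options` and the sample option are hypothetical and are not taken from `olive/cli/model_package.py`.

```python
from typing import Dict, List, Optional


def build_provider_options(
    ep_alias: str, options: Optional[Dict[str, str]] = None
) -> List[Dict[str, Dict[str, str]]]:
    """Return the provider_options list to emit for one variant."""
    if ep_alias.lower() == "cpu":
        # No CPU entry: per the commit message, ORT registers the CPU EP
        # implicitly when no other provider is selected.
        return []
    return [{ep_alias: dict(options or {})}]


cpu_options = build_provider_options("CPU")
cuda_options = build_provider_options("cuda", {"enable_cuda_graph": "0"})
```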
OpenVINO/QNN variants ship a tiny EPContext stub .onnx plus same-stem .xml/.bin sidecars that the loader expects to find next to it. These sidecars are not referenced through ONNX initializer external_data, so the previous copy path missed them and the produced variants were unloadable.
After copying each .onnx and its external-data blobs, walk the source directory once more and copy any remaining files whose suffix is one of the known model suffixes (.onnx/.bin/.xml/.data). Each Olive source directory holds the artifacts for a single variant, so any model-suffix file there belongs next to the ONNX. Skips duplicates already copied via external_data.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
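The sidecar sweep can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `copy_sidecars` and `MODEL_SUFFIXES` are hypothetical names, and the sketch scans only the top level of the source directory, which may differ from how the packager walks it.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path

# Suffixes named in the commit message above.
MODEL_SUFFIXES = {".onnx", ".bin", ".xml", ".data"}


def copy_sidecars(source_dir: Path, variant_dir: Path, already_copied: set[str]) -> list[str]:
    """Copy remaining model-suffix files next to the ONNX; return their names."""
    variant_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in MODEL_SUFFIXES:
            continue
        if path.name in already_copied:
            # Already handled by the ONNX / external_data copy step.
            continue
        shutil.copy2(path, variant_dir / path.name)
        copied.append(path.name)
    return copied


# Small demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    src.mkdir()
    for name in ("model.onnx", "model.xml", "model.bin", "notes.txt"):
        (src / name).write_bytes(b"")
    swept = copy_sidecars(src, dst, already_copied={"model.onnx"})
    dst_names = sorted(p.name for p in dst.iterdir())
```

In the demonstration, `model.onnx` is skipped because it is listed as already copied, and `notes.txt` is skipped because its suffix is not a model suffix.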
Variants of the same model can legitimately differ on a small set of model-level scalars, most importantly `context_length`, which on OpenVINO NPU is capped at the prompt+response budget (e.g. 4224) but on GPU/CPU runs at the full pretrained limit (e.g. 131072). The same applies to `pad_token_id`, `bos_token_id`, `eos_token_id`, and `type`. The base genai_config can hold only one value for each of these, so without a per-variant overlay the merged config would silently use whichever source happened to be chosen as the base.
This change lifts those fields verbatim from each variant's source `genai_config.json` into its overlay, and strips them from the base. The strip is required for the array field `eos_token_id`: GenAI's overlay merge appends arrays rather than replacing them, so a base entry would duplicate (not override) the variant entry.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
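The lift-and-strip step can be illustrated with a short sketch. It assumes the listed fields sit directly under the `model` object of `genai_config.json`; the helper names and the sample token ids are hypothetical.

```python
import copy

# Fields named in the commit message as legitimately varying per variant.
VARIANT_SCALARS = ("context_length", "pad_token_id", "bos_token_id", "eos_token_id", "type")


def lift_model_scalars(source_cfg: dict) -> dict:
    """Build the overlay fragment carrying this variant's model-level fields."""
    model = source_cfg.get("model", {})
    lifted = {k: copy.deepcopy(model[k]) for k in VARIANT_SCALARS if k in model}
    return {"model": lifted} if lifted else {}


def strip_model_scalars(base_cfg: dict) -> dict:
    """Return a copy of the base config without the per-variant fields."""
    stripped = copy.deepcopy(base_cfg)
    for key in VARIANT_SCALARS:
        stripped.get("model", {}).pop(key, None)
    return stripped


# Hypothetical NPU source config; only context_length comes from the text above.
npu_cfg = {
    "model": {
        "type": "phi3",
        "context_length": 4224,
        "eos_token_id": [1, 2],
        "vocab_size": 1000,
    }
}
npu_overlay = lift_model_scalars(npu_cfg)
base_cfg = strip_model_scalars(npu_cfg)
```

Because `eos_token_id` ends up only in the overlay, an append-style array merge cannot duplicate it.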
…del-package
Adds support for two source shapes that previously couldn't be packaged:
1. **Pipeline (multi-stage) sources** — e.g. QNN exports that ship a
single source dir containing 4 ONNX stages (embedding,
prompt-processor, token-generator, transformer-head) plus QNN context
binaries. The pipeline structure lives in the source's
`genai_config.json` at `model.<role>.pipeline`. The packager:
- lifts every stage's ONNX file from the source into the variant dir
(the existing sidecar sweep already takes care of the QNN .bin
context binaries that sit next to the ONNX files);
- writes the pipeline array verbatim into the variant overlay so
each stage keeps its own filename and EP-specific
`session_options.provider_options` (htp_performance_mode,
soc_model, etc.);
- strips `pipeline` from the base genai_config because GenAI's
overlay parser appends arrays rather than replacing them — a
pipeline in both base and overlay would double every stage.
2. **GenAI-shaped sources without `model_config.json`** — sources
downloaded directly from a model hub instead of produced by an Olive
workflow. `_read_model_config` now synthesises a minimal config
from `genai_config.json` + a directory scan so the rest of the
packager stays single-codepath. As a bonus, an existing
`model_config.json` whose `model_path` is unreachable (a common
state when artifacts are copied between hosts) is repaired in-memory
by repointing it at the source directory.
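The pipeline handling in item 1 might look roughly like the sketch below. It assumes `model.<role>.pipeline` is a list of single-key objects, each mapping a stage name to a body with a `filename`; that shape, the helper names, and the file names are assumptions made for illustration.

```python
import copy


def pipeline_stage_files(source_cfg: dict, role: str = "decoder") -> list:
    """List the ONNX filenames referenced by each pipeline stage, in order."""
    pipeline = source_cfg.get("model", {}).get(role, {}).get("pipeline", [])
    files = []
    for entry in pipeline:
        for stage in entry.values():
            if stage.get("filename"):
                files.append(stage["filename"])
    return files


def split_pipeline(source_cfg: dict, role: str = "decoder"):
    """Return (base without pipeline, overlay carrying the pipeline verbatim)."""
    base = copy.deepcopy(source_cfg)
    pipeline = base.get("model", {}).get(role, {}).pop("pipeline", None)
    overlay = {"model": {role: {"pipeline": pipeline}}} if pipeline is not None else {}
    return base, overlay


# Hypothetical two-stage source config (the real QNN export has four stages).
qnn_cfg = {
    "model": {
        "decoder": {
            "pipeline": [
                {"embedding": {"filename": "embedding.onnx"}},
                {
                    "token-generator": {
                        "filename": "token_generator.onnx",
                        "session_options": {
                            "provider_options": [{"qnn": {"htp_performance_mode": "burst"}}]
                        },
                    }
                },
            ]
        }
    }
}
stage_files = pipeline_stage_files(qnn_cfg)
base_part, overlay_part = split_pipeline(qnn_cfg)
```

Keeping the pipeline only in the overlay avoids the doubled stages that an append-style array merge would otherwise produce.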
Supporting changes:
- `_extract_task` now honours `model_attributes.task` and falls
back to inspecting the source genai_config for a `decoder` role, so
the component directory ends up as `models/decoder/...` (not the
generic `models/model/...`).
- EP derivation prefers the source genai's
`session_options.provider_options` alias over the variant-name
heuristic — e.g. a directory named `vitia_npu` correctly resolves
to VitisAI rather than QNN (the `npu` substring would otherwise win
by accident).
- `_VARIANT_NAME_EP_HINTS` gains `vitisai`/`vitia` entries ahead
of `npu` so the heuristic itself is also unambiguous.
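The ordering argument can be shown with a small sketch. The table below is illustrative only: `EP_HINTS` and `guess_ep` are hypothetical names, and the entries are not the actual contents of `_VARIANT_NAME_EP_HINTS`.

```python
from typing import Optional

# Ordered (substring, EP) pairs: more specific hints precede the generic "npu".
EP_HINTS = (
    ("vitisai", "VitisAI"),
    ("vitia", "VitisAI"),
    ("openvino", "OpenVINO"),
    ("qnn", "QNN"),
    ("npu", "QNN"),
    ("cuda", "CUDA"),
    ("cpu", "CPU"),
)


def guess_ep(variant_name: str, declared_alias: Optional[str] = None) -> Optional[str]:
    """Prefer the alias declared in the source genai_config; else use name hints."""
    if declared_alias:
        return declared_alias
    lowered = variant_name.lower()
    for hint, ep in EP_HINTS:
        if hint in lowered:
            return ep
    return None


from_name = guess_ep("vitia_npu")
from_alias = guess_ep("my_npu_build", declared_alias="VitisAI")
```

Because the first matching hint wins, `vitia_npu` resolves through the `vitia` entry before the `npu` entry is ever consulted.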
Validated end-to-end on Phi-4-mini-reasoning by packaging:
- qnn_npu → 2.8 GB pipeline package (4 ONNX + 4 .bin + per-stage
overlay options preserved)
- vitia_npu → 3.2 GB flat package (single ONNX + VitisAI overlay)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make `olive generate-model-package` purely genai_config-driven. The source's `genai_config.json` is now the only declarative input the packager reads: it names the role(s), their ONNX filename(s) (flat) or pipeline stages (multi-stage), provider_options (and thus the ORT EP), and the model-level scalars (context_length etc.) that legitimately diverge per variant.
Behavior changes:
- `--source` now requires `genai_config.json`. `model_config.json` and the older ONNXModel/CompositeModel synthesis paths are removed; HF Hub task lookup is dropped too.
- Multi-role flat sources (e.g. VLMs with vision + embedding + decoder ONNXs in one dir) now have EVERY role's `filename` and `session_options` lifted into the per-variant `genai_config_overlay.json`, not just the primary role. Previously the loader would lose vision/embedding filenames at load time.
- Base `configs/genai_config.json` injects `component=<comp>` markers for every role found in any variant's source genai_config.
- `device` is no longer emitted in variant metadata (it was sourced from `model_attributes` which is no longer read).
Test changes:
- `_create_source_dir` now writes a minimal genai_config.json (no model_config.json) parameterised by EP.
- Removed TestMixedSourceTypes / TestCompositeBuild / TestUnsupportedModelType (composite + model_config-only behaviors).
- Added TestVLMMultiRoleOverlay covering the multi-role overlay restoration + per-role component-marker injection.
- TestPipelineSources tightened: error message updated, the stale model_config-repointing test dropped.
Validated end-to-end by repacking 7 Phi-4-mini-reasoning variants (CPU, CUDA, OpenVINO GPU/NPU, QNN NPU, VitisAI NPU, WebGPU) and 2 qwen3-vl-2b-instruct variants (CPU, CUDA) from genai_config-only sources. QNN pipeline retains all 4 stages and .bin context binaries; each EP overlay carries its correct provider_options; qwen3-vl overlays carry vision + embedding + decoder filenames.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
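A rough sketch of the multi-role overlay and marker injection described in this commit. It assumes every object-valued member of `model` is a role and that the marker is a `component` key on each role object in the base config; both assumptions, along with the helper names and file names, are hypothetical.

```python
import copy


def build_variant_overlay(source_cfg: dict) -> dict:
    """Lift filename and session_options for every role, not just the primary."""
    overlay_model = {}
    for role, body in source_cfg.get("model", {}).items():
        if not isinstance(body, dict):
            continue  # skip model-level scalars such as context_length
        lifted = {k: copy.deepcopy(body[k]) for k in ("filename", "session_options") if k in body}
        if lifted:
            overlay_model[role] = lifted
    return {"model": overlay_model} if overlay_model else {}


def inject_component_markers(base_cfg: dict, variant_cfgs: list, component: str) -> dict:
    """Mark every role seen in any variant's source config with its component."""
    base = copy.deepcopy(base_cfg)
    model = base.setdefault("model", {})
    for cfg in variant_cfgs:
        for role, body in cfg.get("model", {}).items():
            if isinstance(body, dict):
                model.setdefault(role, {})["component"] = component
    return base


# Hypothetical VLM source config with three roles in one directory.
vlm_cfg = {
    "model": {
        "context_length": 4096,
        "vision": {"filename": "vision.onnx"},
        "embedding": {"filename": "embedding.onnx"},
        "decoder": {
            "filename": "decoder.onnx",
            "session_options": {"provider_options": [{"cuda": {}}]},
        },
    }
}
vlm_overlay = build_variant_overlay(vlm_cfg)
vlm_base = inject_component_markers({"model": {}}, [vlm_cfg], "decoder")
```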
jambayk
approved these changes
Jun 5, 2026
This pull request updates the test suite for the model packaging CLI to reflect a new package layout and metadata schema. The changes remove the generation and validation of `variant.json` files, update how execution provider (EP) compatibility and inference settings are represented, and ensure external data blobs are handled per-variant rather than being deduplicated. The tests now check for the presence of new fields and files, and for the absence of deprecated ones.
Test updates for new package schema and layout:
- Removed all references to `variant.json`; tests now assert that this file is not emitted, and instead check for the presence of `model.onnx` and, where applicable, `genai_config_overlay.json` in variant directories.
Updated assertions for metadata structure:
- EP compatibility is asserted inline in the `variants` mapping, not as an `ep_compatibility` array.
- Tests check for the new fields `schema_version`, `component_name`, `package_name`, `package_version`, and `configs_dir`.
Inference settings and overlay configuration:
- Runtime and provider options are now placed in `genai_config_overlay.json` instead of `variant.json`, and tests verify the correct structure and location of these overlays.
- Added and updated tests to ensure overlays are omitted for variants without options, and that provider options are matched to the correct EP by name.
External data handling:
- Tests verify that external data is kept per variant and that no `shared_weights` directory or `variant.json` is created.
Naming and documentation improvements:
These changes ensure the test suite accurately validates the new packaging format and metadata conventions.