Align generate-model-package CLI with onnxruntime-genai#2495

Merged
xiaoyu-work merged 11 commits into main from modelpkg on Jun 5, 2026
Collaborator

@xiaoyu-work xiaoyu-work commented Jun 2, 2026

This pull request updates the test suite for the model packaging CLI to reflect a new package layout and metadata schema. The changes remove the generation and validation of variant.json files, update how execution provider (EP) compatibility and inference settings are represented, and ensure external data blobs are handled per-variant rather than being deduplicated. The tests now check for the presence of new fields and files, and for the absence of deprecated ones.

Test updates for new package schema and layout:

  • Removed all references to variant.json; tests now assert that this file is not emitted, and instead check for the presence of model.onnx and, where applicable, genai_config_overlay.json in variant directories.

  • Updated assertions for metadata structure:

    • EP compatibility is now represented inline within the variants mapping, not as an ep_compatibility array.
    • Added checks for new manifest and metadata fields, such as schema_version, component_name, package_name, package_version, and configs_dir.

Inference settings and overlay configuration:

  • Runtime and provider options are now placed in genai_config_overlay.json instead of variant.json, and tests verify the correct structure and location of these overlays.

  • Added and updated tests to ensure overlays are omitted for variants without options, and that provider options are matched to the correct EP by name.

External data handling:

  • Shared weights deduplication has been removed; tests confirm that each variant keeps its own external data blob inline and that no shared_weights directory or variant.json is created.

Naming and documentation improvements:

  • Test and comment names have been updated for clarity, reflecting the new schema and file layout.

These changes ensure the test suite accurately validates the new packaging format and metadata conventions.
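As a rough illustration of the layout checks described above, here is a minimal sketch of a helper that inspects one variant directory. The helper name `variant_layout_problems` and the directory names `cpu` and `cuda` are hypothetical and are not taken from test/cli/test_model_package.py; only the file names variant.json, model.onnx, and genai_config_overlay.json come from the description.

```python
import tempfile
from pathlib import Path


def variant_layout_problems(variant_dir, expect_overlay):
    """Return a list of layout problems for one variant directory (hypothetical helper)."""
    variant_dir = Path(variant_dir)
    problems = []
    # The deprecated per-variant file must not be emitted any more.
    if (variant_dir / "variant.json").exists():
        problems.append("variant.json should not be emitted")
    # Every variant directory is expected to carry its own ONNX file.
    if not (variant_dir / "model.onnx").is_file():
        problems.append("model.onnx is missing")
    # The overlay is optional: present only for variants that have options.
    has_overlay = (variant_dir / "genai_config_overlay.json").is_file()
    if has_overlay != expect_overlay:
        problems.append("genai_config_overlay.json presence does not match expectation")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    # A variant without options: only model.onnx, no overlay expected.
    good = Path(tmp) / "cpu"
    good.mkdir()
    (good / "model.onnx").write_bytes(b"")
    good_problems = variant_layout_problems(good, expect_overlay=False)

    # A stale variant written in the old layout.
    stale = Path(tmp) / "cuda"
    stale.mkdir()
    (stale / "variant.json").write_text("{}")
    stale_problems = variant_layout_problems(stale, expect_overlay=True)
```

Returning a list of problems rather than asserting inside the helper keeps it usable both from pytest assertions and from ad hoc inspection of a generated package.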

Update metadata.json to inline EP info (single EP per variant) with
schema_version and component_name; rename compatibility list to a single
compatibility_string passthrough. Emit genai_config_overlay.json carrying
per-variant session_options/provider_options as an RFC-7386 merge patch keyed
by the genai role resolved from the base genai_config. Add package_name,
package_version and configs_dir to manifest.json.
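
For readers unfamiliar with RFC 7386, the following is a minimal sketch of merge-patch semantics applied to a base config and a per-variant overlay. The function name `json_merge_patch` and the sample keys and values (`log_id`, the `cuda` provider entry) are illustrative assumptions, not the packager's actual code or output. Note that strict RFC 7386 replaces arrays wholesale, whereas a later commit message in this thread reports that GenAI's overlay merge appends arrays, so this shows the RFC behavior only.

```python
import copy


def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch to target and return a new value."""
    # A non-object patch (scalar, list, None) replaces the target entirely.
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    # A non-object target is discarded and treated as an empty object.
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in the patch removes the member from the target.
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Illustrative base config and overlay keyed by the "decoder" role.
base = {
    "model": {
        "decoder": {
            "filename": "model.onnx",
            "session_options": {"log_id": "onnxruntime-genai"},
        }
    }
}
overlay = {
    "model": {
        "decoder": {
            "session_options": {"provider_options": [{"cuda": {}}]},
        }
    }
}
merged = json_merge_patch(base, overlay)
```

Members the overlay does not mention (here `filename` and `log_id`) survive the merge, which is why an overlay only needs to carry what differs per variant.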

The v4 format removes variant.json and has no cross-variant weight-sharing
mechanism, so drop variant.json emission and shared_weights deduplication:
each variant directory now keeps its ONNX file and external-data blobs inline
so stock ORT can load it directly.

Describe the model-package writer as a single current behavior: remove
references to a specific ORT schema version (v4) and to fields or files
that were removed/changed elsewhere (e.g. variant.json), so the docstrings,
comments, and test comments read as one self-contained feature.
@xiaoyu-work xiaoyu-work marked this pull request as ready for review June 3, 2026 23:12
Copilot AI review requested due to automatic review settings June 3, 2026 23:12
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to align Olive’s generate-model-package packaging behavior and its tests with the ONNX Runtime / ORT-GenAI model package schema by updating the emitted package layout and JSON schemas (manifest/metadata), removing variant.json, and switching per-variant runtime settings to genai_config_overlay.json.

Changes:

  • Update package structure and schema: introduce models/ component root, add manifest fields (package_name, package_version, configs_dir) and metadata fields (schema_version, component_name), and represent EP compatibility inline per variant.
  • Remove external-data dedup (shared_weights/) and ensure external data blobs are kept inline per-variant.
  • Replace variant.json with optional per-variant genai_config_overlay.json and update tests accordingly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| test/cli/test_model_package.py | Updates test expectations for the new package schema/layout and overlay behavior (currently needs additional path corrections for models/ + .ortpackage). |
| olive/cli/model_package.py | Implements new .ortpackage output naming, models/ layout, updated manifest/metadata schemas, per-variant overlays, and inline external-data copying. |

xiaoyu-work and others added 5 commits June 4, 2026 20:05
ORT-GenAI's SetProviderSessionOptions dispatch table has no CPU handler
(src/models/session_options.cpp:150-159); the prior sentinel entry
[{"CPU": {}}] only triggered a V1 no-op registration. ORT
InferenceSession implicitly registers the CPU EP when no other provider
is selected (onnxruntime/core/session/inference_session.cc), so emitting
an empty provider_options list for CPU is sufficient and matches the
convention used by reference ORT model packages.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
OpenVINO/QNN variants ship a tiny EPContext stub .onnx plus same-stem .xml/.bin
sidecars that the loader expects to find next to it. These sidecars are not
referenced through ONNX initializer external_data, so the previous copy path
missed them and the produced variants were unloadable.

After copying each .onnx and its external-data blobs, walk the source
directory once more and copy any remaining files whose suffix is one of the
known model suffixes (.onnx/.bin/.xml/.data). Each Olive source directory
holds the artifacts for a single variant, so any model-suffix file there
belongs next to the ONNX. Skips duplicates already copied via external_data.
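
The sweep described above might look roughly like the sketch below. The function name `copy_sidecars`, its `already_copied` parameter, and the choice to scan only the top level of the source directory are assumptions made for illustration; only the suffix list comes from the commit message.

```python
import shutil
import tempfile
from pathlib import Path

# Suffixes named in the commit message.
MODEL_SUFFIXES = {".onnx", ".bin", ".xml", ".data"}


def copy_sidecars(source_dir, variant_dir, already_copied=()):
    """Copy remaining model-suffix files next to the ONNX (hypothetical helper)."""
    source_dir, variant_dir = Path(source_dir), Path(variant_dir)
    variant_dir.mkdir(parents=True, exist_ok=True)
    skip = set(already_copied)
    copied = []
    # Assumption: only the top level of the source directory is scanned.
    for path in sorted(source_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in MODEL_SUFFIXES:
            continue
        # Skip files already handled via external_data or present at the target.
        if path.name in skip or (variant_dir / path.name).exists():
            continue
        shutil.copy2(path, variant_dir / path.name)
        copied.append(path.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "src", Path(tmp) / "dst"
    src.mkdir()
    dst.mkdir()
    for name in ("model.onnx", "model.xml", "model.bin", "genai_config.json"):
        (src / name).write_bytes(b"")
    # Pretend the ONNX stub was already copied by the external_data path.
    shutil.copy2(src / "model.onnx", dst / "model.onnx")
    copied = copy_sidecars(src, dst, already_copied={"model.onnx"})
    present = sorted(p.name for p in dst.iterdir())
```

In this run the .xml and .bin sidecars are picked up, while genai_config.json is left alone because its suffix is not in the model-suffix set.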

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Variants of the same model can legitimately differ on a small set of
model-level scalars — most importantly `context_length`, which on
OpenVINO NPU is capped at the prompt+response budget (e.g. 4224) but
on GPU/CPU runs at the full pretrained limit (e.g. 131072). The same
applies to `pad_token_id`, `bos_token_id`, `eos_token_id`, and
`type`. The base genai_config can only hold one value for each of
these, so without a per-variant overlay the merged config would silently
use whichever source happened to supply the base.

This change lifts those fields verbatim from each variant's source
`genai_config.json` into its overlay, and strips them from the
base. The strip is required for the array field `eos_token_id`:
GenAI's overlay merge appends arrays rather than replacing them, so a
base entry would duplicate (not override) the variant entry.
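
A minimal sketch of the lift-and-strip step follows. It assumes the listed fields sit directly under the `model` key of genai_config.json; the helper names `lift_variant_fields` and `strip_variant_fields` and all sample values other than the 4224 context length are hypothetical.

```python
import copy

# Fields named in the commit message as legitimately differing per variant.
VARIANT_FIELDS = ("context_length", "pad_token_id", "bos_token_id", "eos_token_id", "type")


def lift_variant_fields(source_config):
    """Build an overlay fragment holding the variant's own model-level fields."""
    model = source_config.get("model", {})
    lifted = {key: copy.deepcopy(model[key]) for key in VARIANT_FIELDS if key in model}
    return {"model": lifted}


def strip_variant_fields(base_config):
    """Return a copy of the base config without the per-variant fields."""
    stripped = copy.deepcopy(base_config)
    for key in VARIANT_FIELDS:
        stripped.get("model", {}).pop(key, None)
    return stripped


# Illustrative source config for an NPU variant.
npu_source = {
    "model": {
        "type": "example-type",
        "context_length": 4224,
        "eos_token_id": [1, 2],
        "decoder": {"filename": "model.onnx"},
    }
}
npu_overlay = lift_variant_fields(npu_source)
base = strip_variant_fields(npu_source)
```

After the strip, the base keeps only role sections such as `decoder`, so each variant's overlay is the single source for these fields.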

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…del-package

Adds support for two source shapes that previously couldn't be packaged:

1. **Pipeline (multi-stage) sources** — e.g. QNN exports that ship a
   single source dir containing 4 ONNX stages (embedding,
   prompt-processor, token-generator, transformer-head) plus QNN context
   binaries. The pipeline structure lives in the source's
   `genai_config.json` at `model.<role>.pipeline`. The packager:
   - lifts every stage's ONNX file from the source into the variant dir
     (the existing sidecar sweep already takes care of the QNN .bin
     context binaries that sit next to the ONNX files);
   - writes the pipeline array verbatim into the variant overlay so
     each stage keeps its own filename and EP-specific
     `session_options.provider_options` (htp_performance_mode,
     soc_model, etc.);
   - strips `pipeline` from the base genai_config because GenAI's
     overlay parser appends arrays rather than replacing them — a
     pipeline in both base and overlay would double every stage.

2. **GenAI-shaped sources without `model_config.json`** — sources
   downloaded directly from a model hub instead of produced by an Olive
   workflow. `_read_model_config` now synthesises a minimal config
   from `genai_config.json` + a directory scan so the rest of the
   packager stays single-codepath. As a bonus, an existing
   `model_config.json` whose `model_path` is unreachable (a common
   state when artifacts are copied between hosts) is repaired in-memory
   by repointing it at the source directory.
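
To see why `pipeline` has to be stripped from the base, here is a toy model of an overlay merge that concatenates lists. This is not GenAI's parser; `append_merge` and the stage entries are hypothetical and only mimic the array-append behavior the commit message describes.

```python
import copy


def append_merge(base, overlay):
    """Toy merge: dicts recurse, lists are concatenated, anything else is replaced."""
    if isinstance(base, dict) and isinstance(overlay, dict):
        merged = copy.deepcopy(base)
        for key, value in overlay.items():
            if key in merged:
                merged[key] = append_merge(merged[key], value)
            else:
                merged[key] = copy.deepcopy(value)
        return merged
    if isinstance(base, list) and isinstance(overlay, list):
        return copy.deepcopy(base) + copy.deepcopy(overlay)
    return copy.deepcopy(overlay)


# Two illustrative pipeline stages (a real QNN export in this PR has four).
stages = [
    {"embedding": {"filename": "embedding.onnx"}},
    {"token-generator": {"filename": "token_generator.onnx"}},
]
source = {"model": {"decoder": {"pipeline": stages}}}
overlay = {"model": {"decoder": {"pipeline": stages}}}

# Pipeline left in the base: every stage shows up twice after the merge.
doubled = append_merge(source, overlay)

# Pipeline stripped from the base: the overlay is the only contributor.
base = copy.deepcopy(source)
del base["model"]["decoder"]["pipeline"]
clean = append_merge(base, overlay)
```

With the pipeline present in both inputs the merged list has four entries instead of two, which is the doubling the packager avoids by stripping the base.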

Supporting changes:
- `_extract_task` now honours `model_attributes.task` and falls
  back to inspecting the source genai_config for a `decoder` role, so
  the component directory ends up as `models/decoder/...` (not the
  generic `models/model/...`).
- EP derivation prefers the source genai's
  `session_options.provider_options` alias over the variant-name
  heuristic — e.g. a directory named `vitia_npu` correctly resolves
  to VitisAI rather than QNN (the `npu` substring would otherwise win
  by accident).
- `_VARIANT_NAME_EP_HINTS` gains `vitisai`/`vitia` entries ahead
  of `npu` so the heuristic itself is also unambiguous.
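
The precedence described in the last two bullets can be sketched as follows. The tables `EP_HINTS` and `ALIAS_TO_EP`, the function names, and the CPU fallback are assumptions for illustration and do not reproduce `_VARIANT_NAME_EP_HINTS` or the packager's real lookup; the execution provider names are standard ONNX Runtime identifiers.

```python
# Ordered hints: more specific substrings come before the generic "npu".
EP_HINTS = (
    ("vitisai", "VitisAIExecutionProvider"),
    ("vitia", "VitisAIExecutionProvider"),
    ("qnn", "QNNExecutionProvider"),
    ("npu", "QNNExecutionProvider"),
)

# Hypothetical mapping from provider_options aliases to ORT EP names.
ALIAS_TO_EP = {
    "vitisai": "VitisAIExecutionProvider",
    "qnn": "QNNExecutionProvider",
    "cuda": "CUDAExecutionProvider",
}


def ep_from_genai(genai_config):
    """Return the EP named by the first recognised provider_options alias, if any."""
    for section in genai_config.get("model", {}).values():
        if not isinstance(section, dict):
            continue
        options = section.get("session_options", {}).get("provider_options", [])
        for entry in options:
            for alias in entry:
                ep = ALIAS_TO_EP.get(alias.lower())
                if ep:
                    return ep
    return None


def derive_ep(variant_name, genai_config=None):
    """Prefer the source genai alias, then fall back to the variant-name hints."""
    ep = ep_from_genai(genai_config or {})
    if ep:
        return ep
    lowered = variant_name.lower()
    for hint, name in EP_HINTS:
        if hint in lowered:
            return name
    # Assumption: variants with no hint are treated as CPU.
    return "CPUExecutionProvider"


vitis_config = {
    "model": {"decoder": {"session_options": {"provider_options": [{"VitisAI": {}}]}}}
}
```

Because the hint table is ordered, `vitia_npu` matches `vitia` before it ever reaches `npu`, and a config alias overrides the directory name entirely.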

Validated end-to-end on Phi-4-mini-reasoning by packaging:
- qnn_npu  → 2.8 GB pipeline package (4 ONNX + 4 .bin + per-stage
  overlay options preserved)
- vitia_npu → 3.2 GB flat package (single ONNX + VitisAI overlay)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make `olive generate-model-package` purely genai_config-driven. The
source's `genai_config.json` is now the only declarative input the
packager reads: it names the role(s), their ONNX filename(s) (flat) or
pipeline stages (multi-stage), provider_options (and thus the ORT EP),
and the model-level scalars (context_length etc.) that legitimately
diverge per variant.

Behavior changes:

- `--source` now requires `genai_config.json`. `model_config.json` and
  the older ONNXModel/CompositeModel synthesis paths are removed; HF
  Hub task lookup is dropped too.
- Multi-role flat sources (e.g. VLMs with vision + embedding + decoder
  ONNXs in one dir) now have EVERY role's `filename` and
  `session_options` lifted into the per-variant
  `genai_config_overlay.json`, not just the primary role. Previously
  the loader would lose vision/embedding filenames at load time.
- Base `configs/genai_config.json` injects `component=<comp>` markers
  for every role found in any variant's source genai_config.
- `device` is no longer emitted in variant metadata (it was sourced
  from `model_attributes` which is no longer read).

Test changes:

- `_create_source_dir` now writes a minimal genai_config.json (no
  model_config.json) parameterised by EP.
- Removed TestMixedSourceTypes / TestCompositeBuild /
  TestUnsupportedModelType (composite + model_config-only behaviors).
- Added TestVLMMultiRoleOverlay covering the multi-role overlay
  restoration + per-role component-marker injection.
- TestPipelineSources tightened: error message updated, the stale
  model_config-repointing test dropped.

Validated end-to-end by repacking 7 Phi-4-mini-reasoning variants (CPU,
CUDA, OpenVINO GPU/NPU, QNN NPU, VitisAI NPU, WebGPU) and 2
qwen3-vl-2b-instruct variants (CPU, CUDA) from genai_config-only
sources. QNN pipeline retains all 4 stages and .bin context binaries;
each EP overlay carries its correct provider_options; qwen3-vl overlays
carry vision + embedding + decoder filenames.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xiaoyu-work xiaoyu-work merged commit 120972c into main Jun 5, 2026
13 checks passed
@xiaoyu-work xiaoyu-work deleted the modelpkg branch June 5, 2026 22:13
4 participants