Skip to content

Validate model task in config.#723

Merged
chinazhangchao merged 21 commits into
mainfrom
chao/validatetask
May 29, 2026
Merged

Validate model task in config.#723
chinazhangchao merged 21 commits into
mainfrom
chao/validatetask

Conversation

@chinazhangchao
Copy link
Copy Markdown
Contributor

No description provided.

@chinazhangchao chinazhangchao changed the title Chao/validatetask Validate model task in config. May 25, 2026
@chinazhangchao chinazhangchao marked this pull request as ready for review May 25, 2026 07:56
@chinazhangchao chinazhangchao requested a review from a team as a code owner May 25, 2026 07:56
timenick

This comment was marked as outdated.

Copy link
Copy Markdown
Collaborator

@timenick timenick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three substantive findings. The preflight check works, but two of them are tied to recent changes elsewhere in the repo (#719 dedupe, CLAUDE.md import rules), and one is a coverage-asymmetry question that applies across CLI entrypoints.

Comment thread src/winml/modelkit/loader/config.py Outdated
Comment thread src/winml/modelkit/commands/build.py Outdated
Comment thread src/winml/modelkit/commands/build.py
@chinazhangchao chinazhangchao requested a review from timenick May 26, 2026 05:48
Copy link
Copy Markdown
Member

@zhenchaoni zhenchaoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few concerns with the current acceptance semantics that I think are worth addressing before merge, because they interact with how the rest of the CLI resolves tasks.

1. normalize_task on both sides collapses modality

The check is:

normalized_supported = {normalize_task(t) for t in supported_tasks}
if normalize_task(task) in normalized_supported:
    return hf_config

normalize_task wraps TasksManager.map_from_synonym, which treats image-feature-extraction as a synonym of feature-extraction. Collapsing both sides through it means the gate cannot distinguish text vs image FE. Two consequences:

  • --task image-feature-extraction against a text-only arch (e.g. bert-base-uncased) passes — the normalized form feature-extraction is in BERT's supported list.
  • --task feature-extraction against a vision-only arch (e.g. dinov2) passes — vision arches register feature-extraction (Optimum-canonical).

The PR title says "Validate model task" but the validator can't catch cross-modality mismatches. Worse, the second case silently produces wrong downstream behavior: our four task registries (_EVALUATOR_REGISTRY, TASK_DATASET_MAPPING, TASK_SCHEMAS, TASK_TO_WINML_CLASS) are keyed by HF-pipeline task IDs, where feature-extraction is text-only (FeatureExtractionPipeline requires a PreTrainedTokenizer) and image-feature-extraction is a separate pipeline (ImageFeatureExtractionPipeline). So a vision model with loader.task = "feature-extraction" routes through text logic in eval/quantize.

Suggested change: compare without normalizing, falling back to normalized only when the verbatim check fails. Something like:

if task in supported_tasks:
    return hf_config
if normalize_task(task) in {normalize_task(t) for t in supported_tasks}:
    # Synonym match — log a hint about the canonical spelling instead of silently accepting
    logger.warning("Task %r matches via synonym; consider using canonical %r", task, normalize_task(task))
    return hf_config

2. HF-pipeline-only task names are falsely rejected

normalize_task doesn't consult TASK_SYNONYM_EXTENSIONS in io.py. Names handled there (next-sentence-predictiontext-classification, mask-generation preserved) won't be in supported_tasks and won't normalize to something in it. Same for sentence-similarity (HF-only, not in Optimum's synonym map). The if not supported_tasks: return hf_config escape only fires when Optimum knows nothing about the arch — it doesn't help for mainstream text models.

Repro: winml build -m bert-base-uncased --task next-sentence-prediction works pre-PR (handled in export/io.py) and will be rejected post-PR.

Suggested change: short-circuit on names that the rest of the codebase knows about:

from ..loader.task import KNOWN_TASKS
from ..export.io import TASK_SYNONYM_EXTENSIONS

if task in TASK_SYNONYM_EXTENSIONS or task in KNOWN_TASKS:
    # Accept names that other CLI commands accept; let downstream resolution
    # raise ONNXConfigNotFoundError if the arch can't actually export it.
    return hf_config

3. Acceptance set now diverges from winml config / winml export / winml perf

The docstring notes that the other commands rely on resolve_cfgONNXConfigNotFoundError. Those paths use resolve_task_and_model_class + KNOWN_TASKS, which is broader and lists image-feature-extraction as a distinct entry. Result: a config.json produced by winml config (and accepted by winml export) can be rejected by winml build. That asymmetry will surface as bug reports.

Suggested change: either widen the acceptance set in this PR to match the rest of the CLI (combination of fixes 1 + 2), or move the validation up into resolve_task_and_model_class so all commands share one definition.

4. Test coverage

The current test test_rejects_incompatible_config_task_and_model only exercises the rejection path. Suggest adding:

  • Vision-arch + --task feature-extraction → currently passes the gate (documenting the limitation).
  • Text-arch + --task image-feature-extraction → currently passes (documenting the limitation, or fixing).
  • Text-arch + --task next-sentence-prediction → currently rejected (regression vs. existing behavior).
  • Same task accepted by winml config -m ... --task ... should also be accepted by winml build.

Summary

The PR is solving a real problem, but the gate's notion of "valid" is Optimum's collapse, which is the wrong granularity for the four HF-pipeline-keyed registries that consume loader.task downstream. As-is, this introduces false rejections for some HF-pipeline names and silently accepts modality-mismatched inputs. Happy to pair on a follow-up if useful — there's overlapping work on the quantize/eval side where we're hitting the same canonical-name boundary.

@chinazhangchao chinazhangchao requested a review from zhenchaoni May 28, 2026 06:39
@chinazhangchao chinazhangchao merged commit bb1a67c into main May 29, 2026
9 checks passed
@chinazhangchao chinazhangchao deleted the chao/validatetask branch May 29, 2026 07:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants