Validate model task in config.#723
Conversation
…to chao/validatetask
…to chao/validatetask
timenick
left a comment
There was a problem hiding this comment.
Three substantive findings. The preflight check works, but two of them are tied to recent changes elsewhere in the repo (#719 dedupe, CLAUDE.md import rules), and one is a coverage-asymmetry question that applies across CLI entrypoints.
zhenchaoni
left a comment
There was a problem hiding this comment.
A few concerns with the current acceptance semantics that I think are worth addressing before merge, because they interact with how the rest of the CLI resolves tasks.
1. normalize_task on both sides collapses modality
The check is:
normalized_supported = {normalize_task(t) for t in supported_tasks}
if normalize_task(task) in normalized_supported:
return hf_confignormalize_task wraps TasksManager.map_from_synonym, which treats image-feature-extraction as a synonym of feature-extraction. Collapsing both sides through it means the gate cannot distinguish text vs image FE. Two consequences:
--task image-feature-extractionagainst a text-only arch (e.g.bert-base-uncased) passes — the normalized formfeature-extractionis in BERT's supported list.--task feature-extractionagainst a vision-only arch (e.g. dinov2) passes — vision arches registerfeature-extraction(Optimum-canonical).
The PR title says "Validate model task" but the validator can't catch cross-modality mismatches. Worse, the second case silently produces wrong downstream behavior: our four task registries (_EVALUATOR_REGISTRY, TASK_DATASET_MAPPING, TASK_SCHEMAS, TASK_TO_WINML_CLASS) are keyed by HF-pipeline task IDs, where feature-extraction is text-only (FeatureExtractionPipeline requires a PreTrainedTokenizer) and image-feature-extraction is a separate pipeline (ImageFeatureExtractionPipeline). So a vision model with loader.task = "feature-extraction" routes through text logic in eval/quantize.
Suggested change: compare without normalizing, falling back to normalized only when the verbatim check fails. Something like:
if task in supported_tasks:
return hf_config
if normalize_task(task) in {normalize_task(t) for t in supported_tasks}:
# Synonym match — log a hint about the canonical spelling instead of silently accepting
logger.warning("Task %r matches via synonym; consider using canonical %r", task, normalize_task(task))
return hf_config2. HF-pipeline-only task names are falsely rejected
normalize_task doesn't consult TASK_SYNONYM_EXTENSIONS in io.py. Names handled there (next-sentence-prediction → text-classification, mask-generation preserved) won't be in supported_tasks and won't normalize to something in it. Same for sentence-similarity (HF-only, not in Optimum's synonym map). The if not supported_tasks: return hf_config escape only fires when Optimum knows nothing about the arch — it doesn't help for mainstream text models.
Repro: winml build -m bert-base-uncased --task next-sentence-prediction works pre-PR (handled in export/io.py) and will be rejected post-PR.
Suggested change: short-circuit on names that the rest of the codebase knows about:
from ..loader.task import KNOWN_TASKS
from ..export.io import TASK_SYNONYM_EXTENSIONS
if task in TASK_SYNONYM_EXTENSIONS or task in KNOWN_TASKS:
# Accept names that other CLI commands accept; let downstream resolution
# raise ONNXConfigNotFoundError if the arch can't actually export it.
return hf_config3. Acceptance set now diverges from winml config / winml export / winml perf
The docstring notes that the other commands rely on resolve_cfg → ONNXConfigNotFoundError. Those paths use resolve_task_and_model_class + KNOWN_TASKS, which is broader and lists image-feature-extraction as a distinct entry. Result: a config.json produced by winml config (and accepted by winml export) can be rejected by winml build. That asymmetry will surface as bug reports.
Suggested change: either widen the acceptance set in this PR to match the rest of the CLI (combination of fixes 1 + 2), or move the validation up into resolve_task_and_model_class so all commands share one definition.
4. Test coverage
The current test test_rejects_incompatible_config_task_and_model only exercises the rejection path. Suggest adding:
- Vision-arch +
--task feature-extraction→ currently passes the gate (documenting the limitation). - Text-arch +
--task image-feature-extraction→ currently passes (documenting the limitation, or fixing). - Text-arch +
--task next-sentence-prediction→ currently rejected (regression vs. existing behavior). - Same task accepted by
winml config -m ... --task ...should also be accepted bywinml build.
Summary
The PR is solving a real problem, but the gate's notion of "valid" is Optimum's collapse, which is the wrong granularity for the four HF-pipeline-keyed registries that consume loader.task downstream. As-is, this introduces false rejections for some HF-pipeline names and silently accepts modality-mismatched inputs. Happy to pair on a follow-up if useful — there's overlapping work on the quantize/eval side where we're hitting the same canonical-name boundary.
…to chao/validatetask
No description provided.