perf(inspect): skip redundant Auto* calls for text-only models by timenick · Pull Request #746 · microsoft/winml-cli

timenick · 2026-05-26T06:27:55Z

Summary

Two extra Strategy-2 gating fixes on top of #719. Profile on cardiffnlp/twitter-roberta-base-sentiment-latest (text-only) showed the inspect command was still taking ~16 s warm — most of it spent on Auto* calls that didn't need to run.

Targeting release/v0.1.0 since #717 / #718 / #719 are already on that release; this is the natural follow-up.

Profile (warm cache, `cardiffnlp/twitter-roberta-base-sentiment-latest`)

Before this PR:

[0]  AutoConfig             0.74s  (parent_hf_config — already deduped by #719)
[1]  AutoProcessor          4.27s  returns RobertaTokenizerFast
[7]  AutoTokenizer          2.22s  ← redundant, AutoProcessor already returned the tokenizer
[11] AutoImageProcessor     1.39s  FAIL (text model has no preprocessor_config.json)
[12] AutoFeatureExtractor   0.64s  FAIL (same)
─────
~16 s total

After this PR:

AutoConfig             5 calls (vs 8)
AutoProcessor          1 call
AutoTokenizer          1 call   (vs 2 — the redundant standalone load is gone)
AutoImageProcessor     0 calls  (skipped — no preprocessor_config.json)
AutoFeatureExtractor   0 calls  (skipped — same)
─────
~12 s total  (~25% faster)

Change

1. Detect when `AutoProcessor` returns a leaf class

For single-modality models, AutoProcessor.from_pretrained returns the leaf class directly — e.g. RoBERTa → RobertaTokenizerFast. Such a return has no .tokenizer wrapper attribute, so the old code couldn't populate tokenizer_class and fell through to a redundant AutoTokenizer.from_pretrained (~2 s warm).

Pattern-match the returned class name (*Tokenizer / *TokenizerFast, *ImageProcessor / *ImageProcessorFast, *FeatureExtractor) and populate the corresponding field. The .tokenizer / .image_processor / .feature_extractor attribute path still wins for genuine multimodal ProcessorMixin returns (CLIP, etc.) — see the test_autoprocessor_with_wrapped_pieces_uses_attributes regression test.

2. `preprocessor_config.json` absence is authoritative

_resolve_processor_from_hub_configs already tries to download preprocessor_config.json. When the hub returns 404, the model has no image processor or feature extractor, period. Surface this as a has_preprocessor_config bool from the helper so the caller can skip the AutoImageProcessor / AutoFeatureExtractor round-trips (~2 s total wasted confirming 404s).

Tests

tests/unit/inspect/test_resolve_processor_gating.py:

test_autoprocessor_returns_tokenizer_fills_tokenizer_class — leaf-class detection populates tokenizer_class from class-name suffix and skips standalone AutoTokenizer
test_autoprocessor_returns_image_processor_fills_image_class — same for *ImageProcessor
test_autoprocessor_returns_feature_extractor_fills_feature_class — same for *FeatureExtractor
test_autoprocessor_with_wrapped_pieces_uses_attributes — multimodal ProcessorMixin with .tokenizer attribute wins over name suffix
test_missing_preprocessor_config_skips_image_and_feature — has_preprocessor_config=False skips AutoImageProcessor / AutoFeatureExtractor

55 targeted tests pass.

Two additional Strategy-2 gating fixes on top of #719: 1. AutoProcessor returns a leaf class for single-modality models. For text-only models (RoBERTa, BERT, ...) AutoProcessor.from_pretrained returns the tokenizer directly — e.g. RobertaTokenizerFast — which has no .tokenizer wrapper attribute. The previous code only set tokenizer_class via the .tokenizer attribute, so the standalone AutoTokenizer.from_pretrained then re-loaded the same tokenizer (~2s warm). Add class-name-suffix detection (Tokenizer / TokenizerFast, ImageProcessor / ImageProcessorFast, FeatureExtractor) so leaf-class results populate the corresponding field directly. The .tokenizer /.image_processor / .feature_extractor attribute path still wins for genuine multimodal processors. 2. preprocessor_config.json absence is authoritative. When the hub returns 404 for preprocessor_config.json the model has neither an image processor nor a feature extractor. Surface this from _resolve_processor_from_hub_configs as a has_preprocessor_config bool and skip the AutoImageProcessor / AutoFeatureExtractor round- trips on text-only models (~2s saved confirming 404s). Measurements (warm cache, cardiffnlp/twitter-roberta-base-sentiment-latest): before this PR: ~16 s wall, AutoConfig x8, AutoTokenizer x2, AutoImageProcessor x1, AutoFeatureExtractor x1 this PR: ~12 s wall, AutoConfig x5, AutoTokenizer x1, AutoImageProcessor x0, AutoFeatureExtractor x0 Tests: 4 new cases in test_resolve_processor_gating.py — three for the leaf-class suffix detection (Tokenizer/ImageProcessor/FeatureExtractor) plus one verifying the multimodal .tokenizer-attribute path still wins over a non-leaf class name; and one new case for the missing-preprocessor-config gate.

DingmaomaoBJTU

Review: `perf(inspect): skip redundant Auto* calls for text-only models`

Overall this is a well-motivated performance PR with solid profiling data. Both optimizations are sensible and the test coverage is good. A few items worth discussing:

Design

1. Growing positional tuple → consider a dataclass/NamedTuple

_resolve_processor_from_hub_configs now returns a 6-element positional tuple. Every caller must destructure it by position, and future additions will make this increasingly fragile. A NamedTuple (or @dataclass) would make the return self-documenting and order-independent:

class HubConfigResult(NamedTuple):
    processor_class: str | None
    tokenizer_class: str | None
    image_processor_class: str | None
    feature_extractor_class: str | None
    has_preprocessor_config: bool
    has_tokenizer_config: bool

Not blocking, but worth doing now while the change is fresh — the tuple was already borderline at 4 elements.

2. has_tokenizer_config is returned but unused

The caller destructures it as _has_tokenizer_config and discards it. If there's no planned consumer, consider not returning it yet (YAGNI). If it is planned, a TODO comment would help.

Correctness

3. has_preprocessor_config set True even on json.JSONDecodeError

In _resolve_processor_from_hub_configs, the flag is set to True right after the hf_hub_download call but before json.load. If preprocessor_config.json exists on the hub but contains invalid JSON, has_preprocessor_config will be True even though no useful data was extracted. This is defensively fine (we don't skip Auto* calls when the file exists), but worth a comment noting this is intentional — a corrupt config file on the hub doesn't mean the model has no image processor.

4. Leaf-class detection: ambiguous class names

The _is_tokenizer_class_name heuristic checks name.endswith(("Tokenizer", "TokenizerFast")). Today this works because HuggingFace naming is well-disciplined, but note that endswith("Tokenizer") already matches *TokenizerFast since "TokenizerFast" doesn't end with "Tokenizer". The tuple is correct — just flagging that the "TokenizerFast" entry is the one doing the real work for fast tokenizers, while "Tokenizer" covers the slow variant. This is fine.

Tests

5. Leaf-class tests don't verify the standalone Auto calls are actually gated*

test_autoprocessor_returns_tokenizer_fills_tokenizer_class patches AutoTokenizer.from_pretrained and asserts call_count == 0, but it also passes try_image_processor=False, try_feature_extractor=False, so those Auto* classes are pre-gated. Consider adding a case where try_tokenizer=True and the leaf class is a tokenizer, but the model actually has no .tokenizer attr — this is the exact real-world scenario (text-only model) and is what the test claims to cover. The current tests do cover this via _make_leaf_instance (no attrs), so this is fine on closer reading — just wanted to confirm.

6. Missing negative test: class name that doesn't match any heuristic

What happens when AutoProcessor returns a leaf class with an unrecognized suffix (e.g. SpeechT5Processor)? The current code silently leaves tokenizer_class/image_processor_class/feature_extractor_class as None and the standalone Auto* calls below fill them in. This is the correct fallback, but a test for it would document the behavior.

Nits

The PR description says "~25% faster" but the numbers show 16s → 12s which is exactly 25%. Might as well say "25% faster" without the tilde.

Overall: solid perf improvement with good test coverage. The tuple→NamedTuple suggestion is the main structural feedback.

…ack test Address PR review feedback: - Drop unused has_tokenizer_config (YAGNI; no consumer). - Replace 5-element tuple return with `_HubConfigResult` NamedTuple so the boolean cannot be silently swapped at the call site. - Document why `has_preprocessor_config` is set before JSON parsing (corrupt config != "no image processor"). - Add negative-path test: AutoProcessor returns a leaf class with an unrecognized suffix (SpeechT5Processor) and the standalone Auto* calls below fill in the missing fields.

…757) ## Summary - Folds the seven fixes merged after the v0.1.0 release notes into the existing v0.1.0 changelog section: #745, #746 (inspect), #747, #753 (perf), #750 (eval), #744/#752, #754 (analyze/compile e2e). - Trims the v0.1.0 section to focus on user-visible changes: - Removed telemetry mentions (#728 preview bullet and #693 fix line). - Removed pure-internal CI/test items (CI display name rename, ADO pipeline UI plumbing, mock fixes, branch back-merge).

timenick requested a review from a team as a code owner May 26, 2026 06:27

DingmaomaoBJTU reviewed May 26, 2026

View reviewed changes

Comment thread src/winml/modelkit/inspect/resolver.py Outdated

DingmaomaoBJTU reviewed May 26, 2026

View reviewed changes

Comment thread src/winml/modelkit/inspect/resolver.py Outdated

DingmaomaoBJTU reviewed May 26, 2026

View reviewed changes

Comment thread src/winml/modelkit/inspect/resolver.py

DingmaomaoBJTU reviewed May 26, 2026

View reviewed changes

Comment thread tests/unit/inspect/test_resolve_processor_gating.py

DingmaomaoBJTU approved these changes May 26, 2026

View reviewed changes

timenick merged commit 69c8bab into release/v0.1.0 May 26, 2026
9 checks passed

timenick deleted the zhiwang/inspect-skip-redundant-auto-calls branch May 26, 2026 08:30

timenick mentioned this pull request May 27, 2026

docs(changelog): roll up post-release fixes and trim v0.1.0 section #757

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(inspect): skip redundant Auto* calls for text-only models#746

perf(inspect): skip redundant Auto* calls for text-only models#746
timenick merged 2 commits into
release/v0.1.0from
zhiwang/inspect-skip-redundant-auto-calls

timenick commented May 26, 2026

Uh oh!

DingmaomaoBJTU left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

timenick commented May 26, 2026

Summary

Profile (warm cache, cardiffnlp/twitter-roberta-base-sentiment-latest)

Change

1. Detect when AutoProcessor returns a leaf class

2. preprocessor_config.json absence is authoritative

Tests

Uh oh!

DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

Review: perf(inspect): skip redundant Auto* calls for text-only models

Design

Correctness

Tests

Nits

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Profile (warm cache, `cardiffnlp/twitter-roberta-base-sentiment-latest`)

1. Detect when `AutoProcessor` returns a leaf class

2. `preprocessor_config.json` absence is authoritative

Review: `perf(inspect): skip redundant Auto* calls for text-only models`