Skip to content

perf(inspect): skip redundant Auto* calls for text-only models#746

Merged
timenick merged 2 commits into
release/v0.1.0from
zhiwang/inspect-skip-redundant-auto-calls
May 26, 2026
Merged

perf(inspect): skip redundant Auto* calls for text-only models#746
timenick merged 2 commits into
release/v0.1.0from
zhiwang/inspect-skip-redundant-auto-calls

Conversation

@timenick
Copy link
Copy Markdown
Collaborator

Summary

Two extra Strategy-2 gating fixes on top of #719. Profile on cardiffnlp/twitter-roberta-base-sentiment-latest (text-only) showed the inspect command was still taking ~16 s warm — most of it spent on Auto* calls that didn't need to run.

Targeting release/v0.1.0 since #717 / #718 / #719 are already on that release; this is the natural follow-up.

Profile (warm cache, cardiffnlp/twitter-roberta-base-sentiment-latest)

Before this PR:

[0]  AutoConfig             0.74s  (parent_hf_config — already deduped by #719)
[1]  AutoProcessor          4.27s  returns RobertaTokenizerFast
[7]  AutoTokenizer          2.22s  ← redundant, AutoProcessor already returned the tokenizer
[11] AutoImageProcessor     1.39s  FAIL (text model has no preprocessor_config.json)
[12] AutoFeatureExtractor   0.64s  FAIL (same)
─────
~16 s total

After this PR:

AutoConfig             5 calls (vs 8)
AutoProcessor          1 call
AutoTokenizer          1 call   (vs 2 — the redundant standalone load is gone)
AutoImageProcessor     0 calls  (skipped — no preprocessor_config.json)
AutoFeatureExtractor   0 calls  (skipped — same)
─────
~12 s total  (~25% faster)

Change

1. Detect when AutoProcessor returns a leaf class

For single-modality models, AutoProcessor.from_pretrained returns the leaf class directly — e.g. RoBERTa → RobertaTokenizerFast. Such a return has no .tokenizer wrapper attribute, so the old code couldn't populate tokenizer_class and fell through to a redundant AutoTokenizer.from_pretrained (~2 s warm).

Pattern-match the returned class name (*Tokenizer / *TokenizerFast, *ImageProcessor / *ImageProcessorFast, *FeatureExtractor) and populate the corresponding field. The .tokenizer / .image_processor / .feature_extractor attribute path still wins for genuine multimodal ProcessorMixin returns (CLIP, etc.) — see the test_autoprocessor_with_wrapped_pieces_uses_attributes regression test.

2. preprocessor_config.json absence is authoritative

_resolve_processor_from_hub_configs already tries to download preprocessor_config.json. When the hub returns 404, the model has no image processor or feature extractor, period. Surface this as a has_preprocessor_config bool from the helper so the caller can skip the AutoImageProcessor / AutoFeatureExtractor round-trips (~2 s total wasted confirming 404s).

Tests

tests/unit/inspect/test_resolve_processor_gating.py:

  • test_autoprocessor_returns_tokenizer_fills_tokenizer_class — leaf-class detection populates tokenizer_class from class-name suffix and skips standalone AutoTokenizer
  • test_autoprocessor_returns_image_processor_fills_image_class — same for *ImageProcessor
  • test_autoprocessor_returns_feature_extractor_fills_feature_class — same for *FeatureExtractor
  • test_autoprocessor_with_wrapped_pieces_uses_attributes — multimodal ProcessorMixin with .tokenizer attribute wins over name suffix
  • test_missing_preprocessor_config_skips_image_and_featurehas_preprocessor_config=False skips AutoImageProcessor / AutoFeatureExtractor

55 targeted tests pass.

Two additional Strategy-2 gating fixes on top of #719:

1. AutoProcessor returns a leaf class for single-modality models.
   For text-only models (RoBERTa, BERT, ...) AutoProcessor.from_pretrained
   returns the tokenizer directly — e.g. RobertaTokenizerFast — which
   has no .tokenizer wrapper attribute. The previous code only set
   tokenizer_class via the .tokenizer attribute, so the standalone
   AutoTokenizer.from_pretrained then re-loaded the same tokenizer
   (~2s warm). Add class-name-suffix detection (Tokenizer / TokenizerFast,
   ImageProcessor / ImageProcessorFast, FeatureExtractor) so leaf-class
   results populate the corresponding field directly. The .tokenizer
   /.image_processor / .feature_extractor attribute path still wins for
   genuine multimodal processors.

2. preprocessor_config.json absence is authoritative.
   When the hub returns 404 for preprocessor_config.json the model has
   neither an image processor nor a feature extractor. Surface this
   from _resolve_processor_from_hub_configs as a has_preprocessor_config
   bool and skip the AutoImageProcessor / AutoFeatureExtractor round-
   trips on text-only models (~2s saved confirming 404s).

Measurements (warm cache, cardiffnlp/twitter-roberta-base-sentiment-latest):
  before this PR: ~16 s wall, AutoConfig x8, AutoTokenizer x2,
                  AutoImageProcessor x1, AutoFeatureExtractor x1
  this PR:       ~12 s wall, AutoConfig x5, AutoTokenizer x1,
                  AutoImageProcessor x0, AutoFeatureExtractor x0

Tests: 4 new cases in test_resolve_processor_gating.py — three for the
leaf-class suffix detection (Tokenizer/ImageProcessor/FeatureExtractor)
plus one verifying the multimodal .tokenizer-attribute path still wins
over a non-leaf class name; and one new case for the
missing-preprocessor-config gate.
@timenick timenick requested a review from a team as a code owner May 26, 2026 06:27
Copy link
Copy Markdown
Collaborator

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: perf(inspect): skip redundant Auto* calls for text-only models

Overall this is a well-motivated performance PR with solid profiling data. Both optimizations are sensible and the test coverage is good. A few items worth discussing:

Design

1. Growing positional tuple → consider a dataclass/NamedTuple

_resolve_processor_from_hub_configs now returns a 6-element positional tuple. Every caller must destructure it by position, and future additions will make this increasingly fragile. A NamedTuple (or @dataclass) would make the return self-documenting and order-independent:

class HubConfigResult(NamedTuple):
    processor_class: str | None
    tokenizer_class: str | None
    image_processor_class: str | None
    feature_extractor_class: str | None
    has_preprocessor_config: bool
    has_tokenizer_config: bool

Not blocking, but worth doing now while the change is fresh — the tuple was already borderline at 4 elements.

2. has_tokenizer_config is returned but unused

The caller destructures it as _has_tokenizer_config and discards it. If there's no planned consumer, consider not returning it yet (YAGNI). If it is planned, a TODO comment would help.

Correctness

3. has_preprocessor_config set True even on json.JSONDecodeError

In _resolve_processor_from_hub_configs, the flag is set to True right after the hf_hub_download call but before json.load. If preprocessor_config.json exists on the hub but contains invalid JSON, has_preprocessor_config will be True even though no useful data was extracted. This is defensively fine (we don't skip Auto* calls when the file exists), but worth a comment noting this is intentional — a corrupt config file on the hub doesn't mean the model has no image processor.

4. Leaf-class detection: ambiguous class names

The _is_tokenizer_class_name heuristic checks name.endswith(("Tokenizer", "TokenizerFast")). Today this works because HuggingFace naming is well-disciplined, but note that endswith("Tokenizer") already matches *TokenizerFast since "TokenizerFast" doesn't end with "Tokenizer". The tuple is correct — just flagging that the "TokenizerFast" entry is the one doing the real work for fast tokenizers, while "Tokenizer" covers the slow variant. This is fine.

Tests

5. Leaf-class tests don't verify the standalone Auto calls are actually gated*

test_autoprocessor_returns_tokenizer_fills_tokenizer_class patches AutoTokenizer.from_pretrained and asserts call_count == 0, but it also passes try_image_processor=False, try_feature_extractor=False, so those Auto* classes are pre-gated. Consider adding a case where try_tokenizer=True and the leaf class is a tokenizer, but the model actually has no .tokenizer attr — this is the exact real-world scenario (text-only model) and is what the test claims to cover. The current tests do cover this via _make_leaf_instance (no attrs), so this is fine on closer reading — just wanted to confirm.

6. Missing negative test: class name that doesn't match any heuristic

What happens when AutoProcessor returns a leaf class with an unrecognized suffix (e.g. SpeechT5Processor)? The current code silently leaves tokenizer_class/image_processor_class/feature_extractor_class as None and the standalone Auto* calls below fill them in. This is the correct fallback, but a test for it would document the behavior.

Nits

  • The PR description says "~25% faster" but the numbers show 16s → 12s which is exactly 25%. Might as well say "25% faster" without the tilde.

Overall: solid perf improvement with good test coverage. The tuple→NamedTuple suggestion is the main structural feedback.

Comment thread src/winml/modelkit/inspect/resolver.py Outdated
Comment thread src/winml/modelkit/inspect/resolver.py Outdated
Comment thread src/winml/modelkit/inspect/resolver.py
Comment thread tests/unit/inspect/test_resolve_processor_gating.py
…ack test

Address PR review feedback:
- Drop unused has_tokenizer_config (YAGNI; no consumer).
- Replace 5-element tuple return with `_HubConfigResult` NamedTuple so
  the boolean cannot be silently swapped at the call site.
- Document why `has_preprocessor_config` is set before JSON parsing
  (corrupt config != "no image processor").
- Add negative-path test: AutoProcessor returns a leaf class with an
  unrecognized suffix (SpeechT5Processor) and the standalone Auto*
  calls below fill in the missing fields.
@timenick timenick merged commit 69c8bab into release/v0.1.0 May 26, 2026
9 checks passed
@timenick timenick deleted the zhiwang/inspect-skip-redundant-auto-calls branch May 26, 2026 08:30
timenick added a commit that referenced this pull request May 27, 2026
…757)

## Summary

- Folds the seven fixes merged after the v0.1.0 release notes into the
existing v0.1.0 changelog section: #745, #746 (inspect), #747, #753
(perf), #750 (eval), #744/#752, #754 (analyze/compile e2e).
- Trims the v0.1.0 section to focus on user-visible changes:
  - Removed telemetry mentions (#728 preview bullet and #693 fix line).
- Removed pure-internal CI/test items (CI display name rename, ADO
pipeline UI plumbing, mock fixes, branch back-merge).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants