perf(inspect): skip redundant Auto* calls for text-only models#746
Conversation
Two additional Strategy-2 gating fixes on top of #719: 1. AutoProcessor returns a leaf class for single-modality models. For text-only models (RoBERTa, BERT, ...) AutoProcessor.from_pretrained returns the tokenizer directly — e.g. RobertaTokenizerFast — which has no .tokenizer wrapper attribute. The previous code only set tokenizer_class via the .tokenizer attribute, so the standalone AutoTokenizer.from_pretrained then re-loaded the same tokenizer (~2s warm). Add class-name-suffix detection (Tokenizer / TokenizerFast, ImageProcessor / ImageProcessorFast, FeatureExtractor) so leaf-class results populate the corresponding field directly. The .tokenizer /.image_processor / .feature_extractor attribute path still wins for genuine multimodal processors. 2. preprocessor_config.json absence is authoritative. When the hub returns 404 for preprocessor_config.json the model has neither an image processor nor a feature extractor. Surface this from _resolve_processor_from_hub_configs as a has_preprocessor_config bool and skip the AutoImageProcessor / AutoFeatureExtractor round- trips on text-only models (~2s saved confirming 404s). Measurements (warm cache, cardiffnlp/twitter-roberta-base-sentiment-latest): before this PR: ~16 s wall, AutoConfig x8, AutoTokenizer x2, AutoImageProcessor x1, AutoFeatureExtractor x1 this PR: ~12 s wall, AutoConfig x5, AutoTokenizer x1, AutoImageProcessor x0, AutoFeatureExtractor x0 Tests: 4 new cases in test_resolve_processor_gating.py — three for the leaf-class suffix detection (Tokenizer/ImageProcessor/FeatureExtractor) plus one verifying the multimodal .tokenizer-attribute path still wins over a non-leaf class name; and one new case for the missing-preprocessor-config gate.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Review: perf(inspect): skip redundant Auto* calls for text-only models
Overall this is a well-motivated performance PR with solid profiling data. Both optimizations are sensible and the test coverage is good. A few items worth discussing:
Design
1. Growing positional tuple → consider a dataclass/NamedTuple
_resolve_processor_from_hub_configs now returns a 6-element positional tuple. Every caller must destructure it by position, and future additions will make this increasingly fragile. A NamedTuple (or @dataclass) would make the return self-documenting and order-independent:
class HubConfigResult(NamedTuple):
processor_class: str | None
tokenizer_class: str | None
image_processor_class: str | None
feature_extractor_class: str | None
has_preprocessor_config: bool
has_tokenizer_config: boolNot blocking, but worth doing now while the change is fresh — the tuple was already borderline at 4 elements.
2. has_tokenizer_config is returned but unused
The caller destructures it as _has_tokenizer_config and discards it. If there's no planned consumer, consider not returning it yet (YAGNI). If it is planned, a TODO comment would help.
Correctness
3. has_preprocessor_config set True even on json.JSONDecodeError
In _resolve_processor_from_hub_configs, the flag is set to True right after the hf_hub_download call but before json.load. If preprocessor_config.json exists on the hub but contains invalid JSON, has_preprocessor_config will be True even though no useful data was extracted. This is defensively fine (we don't skip Auto* calls when the file exists), but worth a comment noting this is intentional — a corrupt config file on the hub doesn't mean the model has no image processor.
4. Leaf-class detection: ambiguous class names
The _is_tokenizer_class_name heuristic checks name.endswith(("Tokenizer", "TokenizerFast")). Today this works because HuggingFace naming is well-disciplined, but note that endswith("Tokenizer") already matches *TokenizerFast since "TokenizerFast" doesn't end with "Tokenizer". The tuple is correct — just flagging that the "TokenizerFast" entry is the one doing the real work for fast tokenizers, while "Tokenizer" covers the slow variant. This is fine.
Tests
5. Leaf-class tests don't verify the standalone Auto calls are actually gated*
test_autoprocessor_returns_tokenizer_fills_tokenizer_class patches AutoTokenizer.from_pretrained and asserts call_count == 0, but it also passes try_image_processor=False, try_feature_extractor=False, so those Auto* classes are pre-gated. Consider adding a case where try_tokenizer=True and the leaf class is a tokenizer, but the model actually has no .tokenizer attr — this is the exact real-world scenario (text-only model) and is what the test claims to cover. The current tests do cover this via _make_leaf_instance (no attrs), so this is fine on closer reading — just wanted to confirm.
6. Missing negative test: class name that doesn't match any heuristic
What happens when AutoProcessor returns a leaf class with an unrecognized suffix (e.g. SpeechT5Processor)? The current code silently leaves tokenizer_class/image_processor_class/feature_extractor_class as None and the standalone Auto* calls below fill them in. This is the correct fallback, but a test for it would document the behavior.
Nits
- The PR description says "~25% faster" but the numbers show 16s → 12s which is exactly 25%. Might as well say "25% faster" without the tilde.
Overall: solid perf improvement with good test coverage. The tuple→NamedTuple suggestion is the main structural feedback.
…ack test Address PR review feedback: - Drop unused has_tokenizer_config (YAGNI; no consumer). - Replace 5-element tuple return with `_HubConfigResult` NamedTuple so the boolean cannot be silently swapped at the call site. - Document why `has_preprocessor_config` is set before JSON parsing (corrupt config != "no image processor"). - Add negative-path test: AutoProcessor returns a leaf class with an unrecognized suffix (SpeechT5Processor) and the standalone Auto* calls below fill in the missing fields.
…757) ## Summary - Folds the seven fixes merged after the v0.1.0 release notes into the existing v0.1.0 changelog section: #745, #746 (inspect), #747, #753 (perf), #750 (eval), #744/#752, #754 (analyze/compile e2e). - Trims the v0.1.0 section to focus on user-visible changes: - Removed telemetry mentions (#728 preview bullet and #693 fix line). - Removed pure-internal CI/test items (CI display name rename, ADO pipeline UI plumbing, mock fixes, branch back-merge).
Summary
Two extra Strategy-2 gating fixes on top of #719. Profile on
cardiffnlp/twitter-roberta-base-sentiment-latest(text-only) showed the inspect command was still taking ~16 s warm — most of it spent on Auto* calls that didn't need to run.Targeting
release/v0.1.0since #717 / #718 / #719 are already on that release; this is the natural follow-up.Profile (warm cache,
cardiffnlp/twitter-roberta-base-sentiment-latest)Before this PR:
After this PR:
Change
1. Detect when
AutoProcessorreturns a leaf classFor single-modality models,
AutoProcessor.from_pretrainedreturns the leaf class directly — e.g. RoBERTa →RobertaTokenizerFast. Such a return has no.tokenizerwrapper attribute, so the old code couldn't populatetokenizer_classand fell through to a redundantAutoTokenizer.from_pretrained(~2 s warm).Pattern-match the returned class name (
*Tokenizer/*TokenizerFast,*ImageProcessor/*ImageProcessorFast,*FeatureExtractor) and populate the corresponding field. The.tokenizer/.image_processor/.feature_extractorattribute path still wins for genuine multimodalProcessorMixinreturns (CLIP, etc.) — see thetest_autoprocessor_with_wrapped_pieces_uses_attributesregression test.2.
preprocessor_config.jsonabsence is authoritative_resolve_processor_from_hub_configsalready tries to downloadpreprocessor_config.json. When the hub returns 404, the model has no image processor or feature extractor, period. Surface this as ahas_preprocessor_configbool from the helper so the caller can skip theAutoImageProcessor/AutoFeatureExtractorround-trips (~2 s total wasted confirming 404s).Tests
tests/unit/inspect/test_resolve_processor_gating.py:test_autoprocessor_returns_tokenizer_fills_tokenizer_class— leaf-class detection populatestokenizer_classfrom class-name suffix and skips standaloneAutoTokenizertest_autoprocessor_returns_image_processor_fills_image_class— same for*ImageProcessortest_autoprocessor_returns_feature_extractor_fills_feature_class— same for*FeatureExtractortest_autoprocessor_with_wrapped_pieces_uses_attributes— multimodalProcessorMixinwith.tokenizerattribute wins over name suffixtest_missing_preprocessor_config_skips_image_and_feature—has_preprocessor_config=FalseskipsAutoImageProcessor/AutoFeatureExtractor55 targeted tests pass.