fix(export): resolve timm image_size from pretrained_cfg #806
vortex-captain wants to merge 6 commits into
Optimum's `DummyVisionInputGenerator` falls back to 64x64 when `normalized_config` lacks `image_size`/`input_size`. For timm models loaded via `TimmWrapperConfig`, `input_size` lives inside `pretrained_cfg`, a plain dict that Optimum's `NormalizedConfig` does not traverse, and timm repos ship no `preprocessor_config.json` on the hub. As a result, `winml config` emitted `[1, 3, 64, 64]` instead of `[1, 3, 224, 224]`.

This PR synthesizes a `preprocessor_config`-style dict from `pretrained_cfg.input_size` when the hub fetch misses, keeping the existing size-parsing block intact. Added `timm/mobilenetv3_small_100.lamb_in1k` and `timm/repghostnet_200.in1k` to the e2e registry; both PASS perf on CPU with the correct `[1, 3, 224, 224]` shape.
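A minimal sketch of the fallback, assuming `pretrained_cfg` is a plain dict whose `input_size` follows timm's `(channels, height, width)` convention. `synthesize_preprocessor_dict` and `get_preprocessor_dict` are hypothetical stand-ins for winml's helpers (their real signatures aren't shown in this PR), and a `SimpleNamespace` plays the role of `TimmWrapperConfig`.

```python
from types import SimpleNamespace
from typing import Any, Mapping, Optional


def synthesize_preprocessor_dict(pretrained_cfg: Optional[Mapping[str, Any]]) -> Optional[dict]:
    """Hypothetical helper: build a preprocessor_config-style dict from a timm pretrained_cfg.

    Returns None when no usable input_size is present.
    """
    if not pretrained_cfg:
        return None
    input_size = pretrained_cfg.get("input_size")
    # timm convention: input_size is (channels, height, width), e.g. (3, 224, 224).
    if input_size is None or len(input_size) != 3:
        return None
    _, height, width = (int(v) for v in input_size)
    # Emit the height/width dict form so the existing size parser can consume it unchanged.
    return {"size": {"height": height, "width": width}}


def get_preprocessor_dict(hub_dict: Optional[dict], hf_config: Any) -> Optional[dict]:
    """Hypothetical fetch boundary: prefer the hub's preprocessor_config.json, synthesize only on a miss."""
    if hub_dict is not None:
        return hub_dict
    return synthesize_preprocessor_dict(getattr(hf_config, "pretrained_cfg", None))


# Stand-in for a TimmWrapperConfig: only the pretrained_cfg attribute matters here.
timm_like_config = SimpleNamespace(pretrained_cfg={"input_size": [3, 224, 224]})
print(get_preprocessor_dict(None, timm_like_config))  # {'size': {'height': 224, 'width': 224}}
```

Returning the `height`/`width` form means the downstream parser needs no timm-specific branch, which matches the intent of keeping the size-parsing block intact.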
CI failure here is unrelated to this PR: it's network-dependent unit tests

The failing job (run 26930137132) is erroring on

Root cause: two module-scoped fixtures pull configs from the HF Hub at collection time:

These fixtures and the T5/Qwen test classes were introduced in #334.

A unit test shouldn't require network access: it makes the suite fail on offline runners, behind firewalls, or whenever the Hub is unavailable, which is exactly what hit this PR.

Suggested fix: build these configs synthetically, the way the Marian/BART tests in the same file already do.

Generated with Claude Code
Summary
- `winml config` emitted `[1, 3, 64, 64]` instead of `[1, 3, 224, 224]` for timm image-classification models (e.g. `timm/repghostnet_200.in1k`).
- `TimmWrapperConfig` stores shape info in `pretrained_cfg["input_size"]`, where `pretrained_cfg` is a plain dict. Optimum's `NormalizedConfig` only walks `PretrainedConfig` children, so the value is invisible to `DummyVisionInputGenerator`, which then defaults to 64x64. timm models also have no `preprocessor_config.json` on the hub, so winml's existing fallback misses too.
- When `preprocessor_config.json` is unavailable, synthesize a preprocessor-style dict from `hf_config.pretrained_cfg.input_size`. The existing size-parsing block (`size` as an int, a dict with `height`/`width`, or `shortest_edge`) is unchanged; the timm-specific handling is isolated at the data-fetch boundary in `_get_preprocessor_dict` / `_synthesize_preprocessor_dict`.
- Added `timm/mobilenetv3_small_100.lamb_in1k` and `timm/repghostnet_200.in1k` to `models_all.json`; both PASS perf on CPU. Perf output confirms `pixel_values [1, 3, 224, 224] float32`.

🤖 Generated with Claude Code
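For reference, here is a sketch of the kind of size-parsing block the summary leaves unchanged, consuming either a hub `preprocessor_config.json` or a synthesized dict. The three `size` forms come from the summary. The function names, the square treatment of `shortest_edge`, and the 64 fallback constant are assumptions for illustration, not winml's actual code.

```python
from typing import Any, List, Mapping, Optional, Tuple

# Assumed fallback side length, mirroring the 64x64 default Optimum uses when no size is found.
FALLBACK_SIDE = 64


def parse_image_size(preprocessor: Optional[Mapping[str, Any]]) -> Tuple[int, int]:
    """Hypothetical parser: return (height, width) from a preprocessor_config-style dict."""
    size = (preprocessor or {}).get("size")
    if isinstance(size, int):
        # size: 224 -> square input
        return size, size
    if isinstance(size, Mapping):
        if "height" in size and "width" in size:
            return int(size["height"]), int(size["width"])
        if "shortest_edge" in size:
            # Assumption: treat shortest_edge as the side of a square dummy input.
            edge = int(size["shortest_edge"])
            return edge, edge
    return FALLBACK_SIDE, FALLBACK_SIDE


def dummy_pixel_values_shape(
    preprocessor: Optional[Mapping[str, Any]], batch_size: int = 1, num_channels: int = 3
) -> List[int]:
    """Hypothetical helper: NCHW shape for the dummy pixel_values input."""
    height, width = parse_image_size(preprocessor)
    return [batch_size, num_channels, height, width]


print(dummy_pixel_values_shape({"size": {"height": 224, "width": 224}}))  # [1, 3, 224, 224]
print(dummy_pixel_values_shape(None))  # [1, 3, 64, 64]
```

A synthesized `{"size": {"height": 224, "width": 224}}` takes the same path as a hub config with that form, so the timm case needs no special handling at this stage.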