Fix AutoImageProcessor.from_pretrained failing on URL input#44892
Fix AutoImageProcessor.from_pretrained failing on URL input#44892he-yufeng wants to merge 2 commits intohuggingface:mainfrom
Conversation
The elif branch for URL detection (is_remote_url + download_url) was accidentally removed in huggingface#43514 during the image processor refactor. This restores URL support with a local download_url helper using httpx, since the old utils.hub.download_url was intentionally dropped in v5. Fixes huggingface#44821
| elif pretrained_model_name_or_path.startswith("http://") or pretrained_model_name_or_path.startswith( | ||
| "https://" | ||
| ): | ||
| image_processor_file = pretrained_model_name_or_path | ||
| resolved_image_processor_file = download_url(pretrained_model_name_or_path) | ||
| resolved_processor_file = None |
There was a problem hiding this comment.
there are utilities already available which we can import and use. Can you instead revert how it was prev and also check video processing and just processing files (it was copied from image and might have same issue)
transformers/src/transformers/image_processing_base.py
Lines 326 to 329 in 753d611
There was a problem hiding this comment.
Got it, thanks for digging into that. Makes sense that it was removed on purpose — no point bringing back deprecated stuff.
| def test_image_processor_from_pretrained_url(self): | ||
| # Regression test: loading from a URL should work (see #44821) | ||
| config_data = { | ||
| "image_processor_type": "ViTImageProcessor", | ||
| "size": {"height": 224, "width": 224}, | ||
| "do_resize": True, | ||
| } | ||
| # Write a fake config to a temp file, then have download_url return that path | ||
| with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f: | ||
| json.dump(config_data, f) | ||
| tmp_path = f.name | ||
|
|
||
| with mock.patch("transformers.image_processing_base.download_url", return_value=tmp_path): | ||
| processor = AutoImageProcessor.from_pretrained("https://example.com/preprocessor_config.json") | ||
|
|
||
| self.assertIsInstance(processor, ViTImageProcessor) | ||
| self.assertEqual(processor.size, {"height": 224, "width": 224}) |
There was a problem hiding this comment.
thanks for a test! We could do without mock if we take a random checkpoint from the hub, e.g. https://huggingface.co/google/vit-base-patch16-224-in21k/blob/main/preprocessor_config.json
AutoImageProcessor.from_pretrained("https://huggingface.co/google/vit-base-patch16-224-in21k/blob/main/preprocessor_config.json") and compare it will result in the same internal dict as if we loaded by repo id
There was a problem hiding this comment.
Agreed, the real checkpoint approach is cleaner. Already switched to that in the latest push before seeing the closure.
Addressed review feedback: - Reverted inline URL detection to use is_remote_url + download_url from utils/hub.py (restored these helpers that were dropped in the refactor) - Applied the same URL handling fix to processing_utils.py - Replaced mock-based test with a real HF checkpoint comparison
|
Updated — switched to Test rewritten to use |
|
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44892&sha=599cdd |
zucchini-nlp
left a comment
There was a problem hiding this comment.
I'm sorry, I just tracked down why download_url was deletde and that seems to be on purpose. It was deprectaed and we don't want to load from a URL anymore
transformers/src/transformers/utils/hub.py
Lines 610 to 616 in 753d611
Let's close this PR, sorry again 🥲
|
No worries at all — thanks for tracking down the history on |
Fixes #44821
The
elif is_remote_url(...)/download_url(...)branch inget_image_processor_dictwas accidentally removed during the image processor refactor in #43514. This causedAutoImageProcessor.from_pretrained(url)to break with anOSErrorabout invalid repo id format.The old
is_remote_urlanddownload_urlutilities were intentionally removed in v5, so this restores URL support with a small localdownload_urlhelper usinghttpx(already a project dependency), paired with a straightforwardstartswith("http")check — consistent with how URLs are detected elsewhere in the codebase (e.g.video_utils.py,audio_utils.py).Added a regression test that mocks the download and verifies the processor loads correctly from a URL.
@zucchini-nlp