Summary
This PR adds a memory-safe OWLv2 image preprocessing path for the new `inference_models` OWLv2/Roboflow Instant implementation. Instead of handing target images directly to the Hugging Face OWLv2 image processor, we now resize the image to the model input envelope first, then pad to the configured OWLv2 input size and normalize using the processor config. This preserves the bounded model input size while avoiding the very large square intermediate that HF can allocate for extreme-aspect-ratio images.
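A minimal numpy-only sketch of the resize-then-pad idea (the function name and nearest-neighbor resize are illustrative, not the actual implementation, which uses the processor's resize and normalization config):

```python
import numpy as np

def resize_then_pad(image: np.ndarray, target: int = 1008) -> np.ndarray:
    """Fit `image` inside a target x target square, then pad.

    Hypothetical sketch: scale so the longest side equals `target`,
    never allocating a full-resolution square, then pad the short side
    at the fixed target resolution. Nearest-neighbor sampling keeps
    this numpy-only; a real path would use a proper resizer.
    """
    h, w = image.shape[:2]
    scale = target / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))

    # Nearest-neighbor resize: pick a source row/col for each output pixel.
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]

    # Pad bottom/right up to the fixed model input size.
    padded = np.zeros((target, target) + image.shape[2:], dtype=image.dtype)
    padded[:new_h, :new_w] = resized
    return padded
```

The key property is that the largest buffer ever allocated is the fixed `target x target` output, not a square sized by the input's longest side.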
It also updates OWLv2 image hashing to hash a byte view for contiguous arrays instead of calling `.tobytes()`, avoiding an extra full-resolution copy before preprocessing.

Why
We have seen hosted serverless inference containers get OOM-killed when Roboflow Instant receives very high-resolution or extreme-aspect-ratio images. One concrete example was an 800x15078 image.

The issue is not that final OWLv2 embeddings become enormous: OWLv2 ultimately runs on a fixed-size input. The problem is the preprocessing path. The HF OWLv2 processor pads to a square based on the longest side before resizing. For 800x15078, that can transiently create a 15078x15078 image-like intermediate before the image is resized down to the model input size. That intermediate is hundreds of MiB as uint8 and can become multiple GiB once float/intermediate processor buffers are involved. In serverless, that can kill the container before Python has a chance to return a controlled error.

By resizing before padding, the same 800x15078 image is reduced into the OWLv2 input envelope first, then padded at the fixed target resolution. For the default 1008x1008 OWLv2 size, the content becomes roughly 53x1008 before fixed-size padding, rather than allocating a 15078x15078 square.

Notes
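As a sanity check on the sizes discussed above (illustrative arithmetic only, assuming the default 1008x1008 input and a 3-channel uint8 intermediate):

```python
# Scale an 800x15078 image so its longest side fits a 1008px target.
h, w, target = 800, 15078, 1008
scale = target / max(h, w)                     # ~0.0669
new_h, new_w = round(h * scale), round(w * scale)
print(new_h, new_w)                            # 53 1008

# Rough memory for a 3-channel uint8 buffer, in MiB:
square_mib = 15078 * 15078 * 3 / 2**20         # square-pad-first intermediate
padded_mib = 1008 * 1008 * 3 / 2**20           # resize-first fixed input
print(round(square_mib), round(padded_mib))    # 650 3
```

So even before any float conversion, the square intermediate alone is on the order of 650 MiB, versus about 3 MiB for the fixed-size padded input.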
Validation
- `python -B -m py_compile inference_models/inference_models/models/owlv2/owlv2_hf.py inference_models/inference_models/models/owlv2/reference_dataset.py inference_models/tests/unit_tests/models/owlv2/__init__.py inference_models/tests/unit_tests/models/owlv2/test_memory_safe_preprocessing.py`
- `git diff --cached --check`

Could not run pytest in this local shell because `pytest` and the model dependency stack are not installed here (`numpy`, `torch`, `cv2`, and `transformers` are unavailable).