
[codex] Add memory-safe OWLv2 preprocessing #2304

Draft

hansent wants to merge 2 commits into main from codex/memory-safe-owlv2-preprocessing

Conversation


hansent commented Apr 30, 2026

Summary

This PR adds a memory-safe OWLv2 image preprocessing path for the new inference_models OWLv2/Roboflow Instant implementation.

Instead of handing target images directly to the Hugging Face OWLv2 image processor, we now resize the image to the model input envelope first, then pad to the configured OWLv2 input size and normalize using the processor config. This preserves the bounded model input size while avoiding the very large square intermediate that HF can allocate for extreme-aspect-ratio images.
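A minimal sketch of the resize-then-pad order of operations is below. The function name, interpolation choice, and normalization constants are illustrative only; the actual code reads the target size and mean/std values from the HF processor config.

```python
import cv2
import numpy as np

def resize_then_pad(
    image: np.ndarray,                    # HWC uint8, RGB
    target_size: int = 1008,              # OWLv2 input side (from processor config)
    mean: tuple = (0.485, 0.456, 0.406),  # placeholder; real values come from config
    std: tuple = (0.229, 0.224, 0.225),   # placeholder; real values come from config
) -> np.ndarray:
    h, w = image.shape[:2]
    scale = target_size / max(h, w)
    new_w = max(1, round(w * scale))
    new_h = max(1, round(h * scale))
    # Resize first: the largest buffer we ever allocate is target_size x target_size.
    resized = cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_LINEAR)

    # Pad at the fixed target resolution, never at the original resolution.
    canvas = np.zeros((target_size, target_size, 3), dtype=np.float32)
    canvas[:new_h, :new_w] = resized.astype(np.float32) / 255.0

    # Normalize with the processor's mean/std, then go to CHW for the model.
    canvas = (canvas - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)
    return canvas.transpose(2, 0, 1)
```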

It also updates OWLv2 image hashing to hash a byte view for contiguous arrays instead of calling .tobytes(), avoiding an extra full-resolution copy before preprocessing.
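Roughly, the hashing change looks like the sketch below; the helper name and hash algorithm are illustrative, not the exact code in this PR.

```python
import hashlib
import numpy as np

def image_hash(image: np.ndarray) -> str:
    """Hash image bytes without forcing a full-resolution copy when possible."""
    if image.flags["C_CONTIGUOUS"]:
        # Zero-copy byte view over the existing buffer.
        buf = memoryview(image).cast("B")
    else:
        # Non-contiguous views still need a materialized copy.
        buf = image.tobytes()
    return hashlib.sha256(buf).hexdigest()
```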

Why

We have seen hosted serverless inference containers get OOM-killed when Roboflow Instant receives very high-resolution or extreme-aspect-ratio images. One concrete example was an 800x15078 image.

The issue is not that final OWLv2 embeddings become enormous: OWLv2 ultimately runs on a fixed-size input. The problem is the preprocessing path. The HF OWLv2 processor pads to a square based on the longest side before resizing. For 800x15078, that can transiently create a 15078x15078 image-like intermediate before the image is resized down to the model input size. That intermediate is hundreds of MiB as uint8 and can become multiple GiB once float/intermediate processor buffers are involved. In serverless, that can kill the container before Python has a chance to return a controlled error.
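For concreteness, the back-of-envelope footprint of that square intermediate:

```python
side = 15078
uint8_mib = side * side * 3 / 2**20        # ~650 MiB for one uint8 HWC square
float32_gib = side * side * 3 * 4 / 2**30  # ~2.5 GiB for a single float32 copy of it
```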

By resizing before padding, the same 800x15078 image is reduced into the OWLv2 input envelope first, then padded at the fixed target resolution. For the default 1008x1008 OWLv2 size, the content becomes roughly 53x1008 before fixed-size padding, rather than allocating a 15078x15078 square.
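The corresponding scale math for the example image:

```python
target = 1008
h, w = 15078, 800          # the problem image, tall orientation
scale = target / max(h, w) # ~0.0669
new_h, new_w = round(h * scale), round(w * scale)  # (1008, 53)
```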

Notes

  • This intentionally does not change Roboflow Instant target-image embedding cache behavior.
  • This still assumes the request image has already been decoded by the adapter. A separate pre-decode byte/pixel guard would be the next hard protection for truly oversized request payloads (a rough sketch of that idea follows this list).
  • Text processing and OWLv2 post-processing continue to use the HF processor.
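
For illustration only, a pre-decode guard could read header metadata without decoding pixel data (PIL opens images lazily). This is a sketch of the idea, including a made-up threshold, not code from this PR:

```python
from io import BytesIO
from PIL import Image

MAX_PIXELS = 50_000_000  # illustrative threshold, not a value from this PR

def guard_payload(payload: bytes) -> None:
    """Reject oversized images from header metadata before full decode."""
    with Image.open(BytesIO(payload)) as im:  # lazy: reads header only
        w, h = im.size
    if w * h > MAX_PIXELS:
        raise ValueError(f"refusing {w}x{h} image ({w * h} pixels)")
```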

Validation

  • python -B -m py_compile inference_models/inference_models/models/owlv2/owlv2_hf.py inference_models/inference_models/models/owlv2/reference_dataset.py inference_models/tests/unit_tests/models/owlv2/__init__.py inference_models/tests/unit_tests/models/owlv2/test_memory_safe_preprocessing.py
  • git diff --cached --check

Could not run pytest in this shell: neither pytest nor the model dependency stack (numpy, torch, cv2, transformers) is installed here.
