v1.2.4
Added
RLE Mask Representation for Instance Segmentation
Instance segmentation models served by inference, inference-models and inference server can now return masks in COCO RLE (Run-Length Encoding) format as a memory-efficient alternative to polygon contours. This applies to all instance segmentation models in the library and:
- significantly reduces memory consumption both server-side and on the wire
- ensures correct representation of non-continuous shapes (which the old default, the polygon format, could not support)
To use the feature, pass response_mask_format=rle as a query parameter in your request. With the inference-sdk client, the following snippet is enough:
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
CLIENT = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key="XXX",
).configure(InferenceConfiguration(response_mask_format="rle"))
with CLIENT.use_api_v0():
    results = CLIENT.infer(
        "/path/to/image.jpg",
        model_id="asl-instance-seg/155",
    )
If you choose the RLE format, predictions must be parsed differently from the old polygon responses: each detection is delivered with an rle key instead of points, plus a new mask_format field (added to help build client-side parsers) set accordingly:
{
    "x": 58.5, "y": 72.5, "width": 71.0, "height": 95.0,
    "confidence": 0.94,
    "class": "N", "class_id": 13,
    "detection_id": "...",
    "rle": {
        "size": [200, 150],  # [height, width]
        "counts": "Ub46h5<L5K4M3M..."  # ASCII-encoded compressed RLE
    },
    "mask_format": "rle"
}
What is RLE?
Run-Length Encoding is a lossless compression scheme for binary data. Instead of storing every pixel of a mask, you store the lengths of consecutive runs of identical values.
Imagine a single row of a mask: 0 0 0 0 1 1 1 0 0 1 1 1 1 1. Pixel-by-pixel that's 14 values. As RLE: [4, 3, 2, 5], meaning "4 zeros, 3 ones, 2 zeros, 5 ones". The format always starts with a run of zeros (use 0 as the first count if the mask starts with 1), and alternates from there.
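The row example above can be reproduced in a few lines of Python. This is a minimal sketch of plain run-length encoding only (no column-major traversal, no string compression); the helper names row_to_runs and runs_to_row are illustrative and not part of inference or pycocotools.

```python
from typing import List


def row_to_runs(row: List[int]) -> List[int]:
    """Encode 0/1 values as alternating run lengths, starting with zeros.

    Illustrative helper, not an inference or pycocotools API.
    """
    runs: List[int] = []
    current = 0  # the format always starts with a run of zeros
    length = 0
    for value in row:
        if value == current:
            length += 1
        else:
            runs.append(length)  # emits 0 first if the row starts with 1
            current = value
            length = 1
    runs.append(length)
    return runs


def runs_to_row(runs: List[int]) -> List[int]:
    """Expand alternating run lengths back into 0/1 values."""
    row: List[int] = []
    value = 0
    for length in runs:
        row.extend([value] * length)
        value = 1 - value  # runs alternate between 0 and 1
    return row


row = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
runs = row_to_runs(row)
print(runs)  # [4, 3, 2, 5]
print(runs_to_row(runs) == row)  # True: the encoding is lossless
```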
COCO RLE (the variant used here, via pycocotools) adds two specifics:
- Column-major traversal. The image is flattened in Fortran order (top-to-bottom, then left-to-right), not row-major like you might expect. This is a historical quirk of the COCO format that you generally don't need to think about as long as you decode through pycocotools.
- ASCII-encoded counts. Rather than transmitting the count integers as JSON numbers, COCO packs them with a custom variable-length encoding (similar in spirit to LEB128) and then base64-style-encodes the bytes into a printable ASCII string. This is what makes the counts field look like garbage characters, for example Ub46h5<L5K4M3M202N.... The encoding is much more compact than a JSON integer array, especially for masks with many short runs.
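For clients that cannot depend on pycocotools, the two specifics above can be handled in pure Python. The sketch below follows the string decoding in pycocotools as we understand it (5 data bits per character, characters offset by 48, and every count after the third stored as a difference from the count two positions earlier), then unflattens in column-major order. The function names are illustrative, the code is far slower than the C implementation, and decoding through pycocotools.mask.decode remains the recommended path.

```python
from typing import Dict, List


def decode_counts(encoded: str) -> List[int]:
    """Unpack a compressed COCO `counts` string into run lengths.

    Illustrative re-implementation (an assumption about the format,
    not an inference or pycocotools API).
    """
    counts: List[int] = []
    pos = 0
    while pos < len(encoded):
        value, shift, more = 0, 0, True
        while more:
            chunk = ord(encoded[pos]) - 48    # characters are offset by 48
            value |= (chunk & 0x1F) << shift  # low 5 bits carry data
            more = bool(chunk & 0x20)         # bit 5 means "more chunks follow"
            pos += 1
            shift += 5
            if not more and (chunk & 0x10):   # sign-extend negative values
                value |= -1 << shift
        if len(counts) > 2:                   # later counts are stored as deltas
            value += counts[-2]
        counts.append(value)
    return counts


def rle_to_mask(rle: Dict) -> List[List[int]]:
    """Rebuild a height x width 0/1 mask (list of rows) from an `rle` dict."""
    height, width = rle["size"]
    flat: List[int] = []
    value = 0  # runs alternate, starting with zeros
    for run in decode_counts(rle["counts"]):
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("RLE counts do not match the declared size")
    # Column-major: pixel (row, col) lives at index col * height + row.
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


def detection_mask(detection: Dict) -> List[List[int]]:
    """Dispatch on `mask_format`; only the RLE branch is sketched here."""
    if detection.get("mask_format") == "rle":
        return rle_to_mask(detection["rle"])
    raise NotImplementedError("polygon responses carry `points` instead of `rle`")


# Tiny hand-made detection: a 2x3 mask with rows [0, 1, 1] and [0, 0, 1].
detection = {"mask_format": "rle", "rle": {"size": [2, 3], "counts": "2111"}}
for mask_row in detection_mask(detection):
    print(mask_row)  # prints [0, 1, 1] then [0, 0, 1]
```

Checking mask_format before touching rle or points keeps one parser usable for both response shapes.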
Why use it?
For a 1080p mask with one object, a dense binary representation is ~2 MB (1920 x 1080 pixels at one byte per pixel). Polygons help, but for masks with holes, ragged edges, or many small components, the polygon list grows quickly. RLE typically lands in the single-digit kilobytes for a single object, regardless of shape complexity. For an inference server returning dozens of predictions per request, the savings compound quickly.
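The numbers above can be sanity-checked with a synthetic mask. The sketch below builds a 1080p mask containing one filled disc, counts its column-major runs, and packs them with an encoder written to mirror the pycocotools string format as we understand it. The disc, its radius and all function names are assumptions chosen for illustration; real masks will produce different sizes.

```python
from typing import List, Tuple


def column_major_runs(mask: List[List[int]]) -> List[int]:
    """Run lengths of a 0/1 mask traversed column by column, zeros first."""
    height, width = len(mask), len(mask[0])
    runs: List[int] = []
    current, length = 0, 0
    for col in range(width):
        for row in range(height):
            if mask[row][col] == current:
                length += 1
            else:
                runs.append(length)
                current, length = mask[row][col], 1
    runs.append(length)
    return runs


def encode_counts(counts: List[int]) -> str:
    """Pack run lengths into a printable string (illustrative, see lead-in)."""
    chars: List[str] = []
    for i, count in enumerate(counts):
        value = count - counts[i - 2] if i > 2 else count  # delta after the third
        more = True
        while more:
            chunk = value & 0x1F  # 5 data bits per character
            value >>= 5
            more = (value != -1) if (chunk & 0x10) else (value != 0)
            if more:
                chunk |= 0x20     # continuation flag
            chars.append(chr(chunk + 48))
    return "".join(chars)


def compare_sizes(height: int, width: int, radius: int) -> Tuple[int, int, int]:
    """Return (dense bytes, number of runs, encoded string length) for a disc."""
    cy, cx = height // 2, width // 2
    mask = [
        [1 if (r - cy) ** 2 + (c - cx) ** 2 <= radius ** 2 else 0
         for c in range(width)]
        for r in range(height)
    ]
    runs = column_major_runs(mask)
    return height * width, len(runs), len(encode_counts(runs))


dense_bytes, n_runs, rle_chars = compare_sizes(1080, 1920, 300)
print(f"dense: {dense_bytes} bytes, runs: {n_runs}, RLE string: {rle_chars} chars")
```

For this disc the dense mask takes 2,073,600 bytes, while the RLE needs only 1203 runs (two per intersected column plus a trailing run of zeros), each packed into a handful of characters.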
Important
This feature works only with the new inference-models backend. Make sure the environment variable USE_INFERENCE_MODELS=True is set (this is currently the default value).
Note
Simply replacing the old polygon format with RLE would have been a breaking change, so we added RLE as an opt-in alternative. We still believe this format is the better choice for our clients.
Gemma-4 added to inference-models
Thanks to @dkosowski87, the Gemma-4 model is now available in inference-models (no support in the old inference yet).
import cv2
from inference_models import AutoModel
model = AutoModel.from_pretrained("gemma-4-e2b-it")
image = cv2.imread("path/to/image.jpg")
answers = model.prompt(
    images=image,
    prompt="What is the main subject in this image?",
    max_new_tokens=256,
    do_sample=False,
)
print(answers[0])
PR: #2227
See also inference-models documentation for more details.
Maintenance
- Add OTel spans to model artifact loading by @ecarrara in #2244
- Fix docs build: skip empty block sections in SUMMARY.md by @Erol444 in #2243
- Fix download page to resolve latest release assets dynamically by @Erol444 in #2250
- Try to fix owlv2 model in new transformers by @PawelPeczek-Roboflow in #2257
- Support optimal confidence from model-eval by @leeclemnet in #2242
- New roboflow model block versions with confidence dropdown by @leeclemnet in #2246
- Route all API calls through Secure Gateway proxy and rename LICENSE_SERVER by @alexnorell in #2254
- Fix classification confidence shape handling by @hansent in #2262
- Add add-inference-model Claude skill by @hansent in #2247
- Add SAM2 streaming video tracker (inference_models + workflow block) by @hansent in #2245
- Re-land remote cold start workflow headers by @hansent in #2224
- feat(cicd): switch to per-repo SA by @bigbitbus in #2259
- Add structured logging to inference compiler by @alexnorell in #2248
- fix(dynamic-blocks): tee stdout/stderr so print() still reaches Docke… by @rafel-roboflow in #2258
- Update GHA service account for docker build workflows by @ecarrara in #2264
- chore: update GHA service account for docker build workflows by @ecarrara in #2265
- Fix flash-attention compatibility issue for GLM-OCR model by @PawelPeczek-Roboflow in #2266
Full Changelog: v1.2.3...v1.2.4