💪 Added

🧑‍🎨 RLE Mask Representation for Instance Segmentation

Instance segmentation models served by inference, inference-models and inference server can now return masks in COCO RLE (Run-Length Encoding) format as a memory-efficient alternative to polygon contours. This applies to all instance segmentation models in the library and:

significantly reduces memory consumption both server-side and on the wire
ensures correct representation of non-contiguous shapes (which the old default polygon format could not support)

To use the feature, pass response_mask_format=rle as a query parameter in your request. With the inference-sdk client, you can use the following snippet:

from inference_sdk import InferenceHTTPClient, InferenceConfiguration

CLIENT = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key="XXX",
).configure(InferenceConfiguration(response_mask_format="rle"))

with CLIENT.use_api_v0():
    results = CLIENT.infer(
        "/path/to/image.jpg",
        model_id="asl-instance-seg/155",
    )

If you choose the RLE format, your predictions need to be parsed differently from the old polygon responses: each detection is delivered with an rle key instead of points, plus a new mask_format field (added to help build client-side parsers) set accordingly:

{
    "x": 58.5, "y": 72.5, "width": 71.0, "height": 95.0,
    "confidence": 0.94,
    "class": "N", "class_id": 13,
    "detection_id": "...",
    "rle": {
        "size": [200, 150],         # [height, width]
        "counts": "Ub46h5<L5K4M3M..." # ASCII-encoded compressed RLE
    },
    "mask_format": "rle"
}
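A client-side parser could branch on mask_format roughly as sketched below. This is a minimal illustration, not the SDK's own parser: the helper name extract_mask is hypothetical, and treating any value other than "rle" (or a missing field) as a polygon response is an assumption. The counts string is the truncated sample from above, so it is not decodable as-is.

```python
from typing import Any, Dict, Tuple


def extract_mask(detection: Dict[str, Any]) -> Tuple[str, Any]:
    """Return (format, payload) for a single detection dict.

    Hypothetical helper: assumes anything that is not explicitly
    marked as "rle" is a legacy polygon response carrying "points".
    """
    if detection.get("mask_format") == "rle":
        rle = detection["rle"]
        height, width = rle["size"]  # size is [height, width]
        return "rle", {"height": height, "width": width, "counts": rle["counts"]}
    return "polygon", detection.get("points", [])


rle_detection = {
    "class": "N",
    "rle": {"size": [200, 150], "counts": "Ub46h5<L5K4M3M..."},
    "mask_format": "rle",
}
kind, payload = extract_mask(rle_detection)
print(kind, payload["height"], payload["width"])
```

Once the branch is known, the rle dict would typically be handed to pycocotools for decoding rather than decoded by hand.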

👉 What is RLE?

Run-Length Encoding is a lossless compression scheme for binary data. Instead of storing every pixel of a mask, you store the lengths of consecutive runs of identical values.

Imagine a single row of a mask: 0 0 0 0 1 1 1 0 0 1 1 1 1 1. Pixel-by-pixel that's 14 values. As RLE: [4, 3, 2, 5] — meaning "4 zeros, 3 ones, 2 zeros, 5 ones". The format always starts with a run of zeros (use 0 as the first count if the mask starts with 1), and alternates from there.
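The row example above can be reproduced in a few lines of plain Python. This is only a sketch of the uncompressed counting step (the function names are illustrative), not the COCO wire format:

```python
from itertools import groupby
from typing import List


def rle_encode_row(pixels: List[int]) -> List[int]:
    """Collapse a sequence of 0/1 pixels into alternating run lengths."""
    counts = [len(list(group)) for _, group in groupby(pixels)]
    # The format always starts with a run of zeros, so prepend an
    # empty zero-run when the data starts with 1.
    if pixels and pixels[0] == 1:
        counts.insert(0, 0)
    return counts


def rle_decode_row(counts: List[int]) -> List[int]:
    """Expand alternating run lengths (zeros first) back into pixels."""
    pixels: List[int] = []
    value = 0
    for run in counts:
        pixels.extend([value] * run)
        value = 1 - value
    return pixels


row = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
print(rle_encode_row(row))  # [4, 3, 2, 5]
```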

COCO RLE (the variant used here, via pycocotools) adds two specifics:

Column-major traversal. The image is flattened in Fortran order — top-to-bottom, then left-to-right — not row-major like you might expect. This is a historical quirk of the COCO format that you generally don't need to think about as long as you decode through pycocotools.
ASCII-encoded counts. Rather than transmitting the count integers as JSON numbers, COCO packs them with a custom variable-length encoding (similar in spirit to LEB128) whose output is a printable ASCII string. This is what makes the counts field look like garbage characters, for example Ub46h5<L5K4M3M202N.... The encoding is much more compact than a JSON integer array, especially for masks with many short runs.
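To make both specifics concrete, here is a pure-Python sketch modeled on how the pycocotools C routines appear to handle the counts string: 5 payload bits per character plus a continuation bit, offset by 48, with counts from the fourth onward stored as a difference from the count two positions earlier. Treat it as an illustration for understanding only; the function names are made up here, and real code should decode through pycocotools.

```python
from typing import List


def encode_counts(counts: List[int]) -> str:
    """Pack run lengths into a printable ASCII string (illustrative)."""
    chars = []
    for i, count in enumerate(counts):
        # From the fourth count on, store the delta to counts[i - 2].
        x = count - counts[i - 2] if i > 2 else count
        more = True
        while more:
            c = x & 0x1F          # low 5 bits
            x >>= 5               # arithmetic shift keeps the sign
            more = (x != -1) if (c & 0x10) else (x != 0)
            if more:
                c |= 0x20         # continuation bit
            chars.append(chr(c + 48))
    return "".join(chars)


def decode_counts(s: str) -> List[int]:
    """Inverse of encode_counts."""
    counts: List[int] = []
    p = 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            c = ord(s[p]) - 48
            x |= (c & 0x1F) << (5 * k)
            more = bool(c & 0x20)
            p += 1
            k += 1
            if not more and (c & 0x10):
                x |= -1 << (5 * k)  # sign-extend negative deltas
        if len(counts) > 2:
            x += counts[-2]
        counts.append(x)
    return counts


def rle_to_mask(counts: List[int], height: int, width: int) -> List[List[int]]:
    """Expand runs and un-flatten them in column-major order."""
    flat: List[int] = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    return [[flat[col * height + row] for col in range(width)]
            for row in range(height)]


# A 2x3 mask whose columns read (0,0), (1,0), (1,1) top-to-bottom.
encoded = encode_counts([2, 1, 1, 2])
mask = rle_to_mask(decode_counts(encoded), height=2, width=3)
for mask_row in mask:
    print(mask_row)
```

Note how the flat index is col * height + row: consecutive values walk down a column before moving right, which is the column-major order described above.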

Why use it?

For a 1080p mask with one object, a dense binary representation is ~2 MB (1920 × 1080 pixels at one byte each). Polygons help, but for masks with holes, ragged edges, or many small components, the polygon list grows quickly. RLE typically lands in the low kilobytes for a single object, regardless of shape complexity. For an inference server returning dozens of predictions per request, the savings compound quickly.
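A rough back-of-the-envelope check of the dense size, plus the number of runs needed for a simple shape, might look like this. The rectangle coordinates are arbitrary illustration values, and the run count is only a proxy for the encoded size, since the final string length depends on the ASCII packing:

```python
from itertools import groupby
from typing import Iterator

HEIGHT, WIDTH = 1080, 1920  # 1080p frame


def column_major_pixels(top: int, bottom: int, left: int, right: int) -> Iterator[int]:
    """Yield a filled rectangle mask pixel by pixel, column by column."""
    for col in range(WIDTH):
        for row in range(HEIGHT):
            yield 1 if (left <= col < right and top <= row < bottom) else 0


# Dense mask at one byte per pixel.
dense_bytes = HEIGHT * WIDTH

# Number of alternating runs for a 400-wide, 300-tall rectangle.
run_count = sum(1 for _ in groupby(column_major_pixels(100, 400, 200, 600)))

print(f"dense: {dense_bytes} bytes, RLE: {run_count} runs")
```

Each of the 400 occupied columns contributes one run of ones and one run of zeros, so the number of counts scales with the object's width rather than with the frame area.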

Important

This feature works only with the new inference-models backend. Make sure the environment variable USE_INFERENCE_MODELS=True is set (currently, that's the default value).

Note

Replacing the old polygon format outright would have been a breaking change, so we added RLE as an opt-in alternative. We still believe RLE is the better choice for our clients.

🧠 Gemma-4 added to `inference-models`

Thanks to @dkosowski87, the Gemma-4 model is now available in inference-models (no support in the old inference yet).

import cv2
from inference_models import AutoModel

model = AutoModel.from_pretrained("gemma-4-e2b-it")
image = cv2.imread("path/to/image.jpg")

answers = model.prompt(
    images=image,
    prompt="What is the main subject in this image?",
    max_new_tokens=256,
    do_sample=False,
)
print(answers[0])

PR: #2227
See also inference-models documentation for more details.

🚧 Maintenance

Add OTel spans to model artifact loading by @ecarrara in #2244
Fix docs build: skip empty block sections in SUMMARY.md by @Erol444 in #2243
Fix download page to resolve latest release assets dynamically by @Erol444 in #2250
Try to fix owlv2 model in new transformers by @PawelPeczek-Roboflow in #2257
Support optimal confidence from model-eval by @leeclemnet in #2242
New roboflow model block versions with confidence dropdown by @leeclemnet in #2246
Route all API calls through Secure Gateway proxy and rename LICENSE_SERVER by @alexnorell in #2254
Fix classification confidence shape handling by @hansent in #2262
Add add-inference-model Claude skill by @hansent in #2247
Add SAM2 streaming video tracker (inference_models + workflow block) by @hansent in #2245
Re-land remote cold start workflow headers by @hansent in #2224
feat(cicd): switch to per-repo SA by @bigbitbus in #2259
Add structured logging to inference compiler by @alexnorell in #2248
fix(dynamic-blocks): tee stdout/stderr so print() still reaches Docke… by @rafel-roboflow in #2258
Update GHA service account for docker build workflows by @ecarrara in #2264
chore: update GHA service account for docker build workflows by @ecarrara in #2265
Fix flash-attention compatibility issue for GLM-OCR model by @PawelPeczek-Roboflow in #2266

Full Changelog: v1.2.3...v1.2.4
