v1.2.4

@PawelPeczek-Roboflow PawelPeczek-Roboflow released this 24 Apr 16:52
· 134 commits to main since this release
6584f8e

💪 Added

🧑‍🎨 RLE Mask Representation for Instance Segmentation

Instance segmentation models served by inference, inference-models and inference server can now return masks in COCO RLE (Run-Length Encoding) format as a memory-efficient alternative to polygon contours. This applies to all instance segmentation models in the library and:

  • significantly reduces memory consumption both server-side and on the wire
  • ensures correct representation of non-contiguous shapes (which the old default polygon format could not represent)

To use the feature, pass response_mask_format=rle as a query parameter in your request. With the inference-sdk client, you can use the following snippet:

from inference_sdk import InferenceHTTPClient, InferenceConfiguration

CLIENT = InferenceHTTPClient(
    api_url="https://serverless.roboflow.com",
    api_key="XXX",
).configure(InferenceConfiguration(response_mask_format="rle"))

with CLIENT.use_api_v0():
    results = CLIENT.infer(
        "/path/to/image.jpg",
        model_id="asl-instance-seg/155",
    )

If you choose the RLE format, your predictions need to be parsed differently from the old polygon responses: each detection is delivered with the rle key instead of points, plus a new mask_format field (added to aid building client-side parsers) set accordingly:

{
    "x": 58.5, "y": 72.5, "width": 71.0, "height": 95.0,
    "confidence": 0.94,
    "class": "N", "class_id": 13,
    "detection_id": "...",
    "rle": {
        "size": [200, 150],         # [height, width]
        "counts": "Ub46h5<L5K4M3M..." # ASCII-encoded compressed RLE
    },
    "mask_format": "rle"
}
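
A client-side parser can branch on mask_format. The snippet below is a minimal sketch, not part of inference-sdk: the helper name split_mask_payload is hypothetical, and it assumes that a detection without a mask_format field is a legacy polygon response carrying points.

```python
from typing import Any, Dict, Tuple


def split_mask_payload(detection: Dict[str, Any]) -> Tuple[str, Any]:
    """Return (format, payload) for a single detection dict."""
    # Assumption: a missing "mask_format" means a legacy polygon response.
    mask_format = detection.get("mask_format", "polygon")
    if mask_format == "rle":
        rle = detection["rle"]
        height, width = rle["size"]  # size is [height, width]
        return "rle", {"height": height, "width": width, "counts": rle["counts"]}
    return "polygon", detection.get("points", [])


detection = {
    "class": "N",
    "rle": {
        "size": [200, 150],
        "counts": "Ub46h5<L5K4M3M...",  # truncated, as in the example above
    },
    "mask_format": "rle",
}
kind, payload = split_mask_payload(detection)
print(kind, payload["height"], payload["width"])
```

To turn the RLE payload into a binary mask, the rle dict (with its full, untruncated counts string) can be passed to pycocotools.mask.decode.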
👉 What is RLE?

Run-Length Encoding is a lossless compression scheme for binary data. Instead of storing every pixel of a mask, you store the lengths of consecutive runs of identical values.

Imagine a single row of a mask: 0 0 0 0 1 1 1 0 0 1 1 1 1 1. Pixel by pixel, that's 14 values. As RLE it becomes [4, 3, 2, 5], meaning "4 zeros, 3 ones, 2 zeros, 5 ones". The format always starts with a run of zeros (use 0 as the first count if the mask starts with 1) and alternates from there.
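
The row example can be reproduced with a few lines of plain Python. This is an illustrative sketch of uncompressed run-length encoding only (the function names are ours), not the code used by inference:

```python
from typing import List, Sequence


def rle_encode(bits: Sequence[int]) -> List[int]:
    """Encode a sequence of 0/1 values as alternating run lengths, zeros first."""
    counts: List[int] = []
    current = 0  # runs always start with zeros
    run = 0
    for bit in bits:
        if bit == current:
            run += 1
        else:
            counts.append(run)  # close the current run (may be 0 at the start)
            current = bit
            run = 1
    counts.append(run)
    return counts


def rle_decode(counts: Sequence[int]) -> List[int]:
    """Expand alternating run lengths back into 0/1 values."""
    bits: List[int] = []
    value = 0
    for run in counts:
        bits.extend([value] * run)
        value = 1 - value
    return bits


row = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
print(rle_encode(row))       # [4, 3, 2, 5]
print(rle_encode([1, 1, 0])) # [0, 2, 1] (leading 0 because the row starts with 1)
```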

COCO RLE (the variant used here, via pycocotools) adds two specifics:

  1. Column-major traversal. The image is flattened in Fortran order (top-to-bottom within each column, then left-to-right across columns), not row-major as you might expect. This is a historical quirk of the COCO format that you generally don't need to think about as long as you decode through pycocotools.

  2. ASCII-encoded counts. Rather than transmitting the count integers as JSON numbers, COCO packs them with a custom variable-length encoding (similar in spirit to LEB128) and then base64-style-encodes the bytes into a printable ASCII string. This is what makes the counts field look like garbage characters, e.g. Ub46h5<L5K4M3M202N.... The encoding is much more compact than a JSON integer array, especially for masks with many short runs.
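
For the curious, both specifics can be sketched in pure Python. The code below follows our reading of the string routines in pycocotools (5 data bits plus a continuation bit per character, offset by 48, with counts from the fourth onward stored as a difference from the count two positions earlier). Treat it as an illustration under those assumptions and use pycocotools for real decoding; all function names here are ours.

```python
from typing import List


def counts_to_string(counts: List[int]) -> str:
    """Pack run lengths into a COCO-style ASCII string (assumed scheme)."""
    chars = []
    for i, count in enumerate(counts):
        # From the 4th count on, store the delta to the count two steps back.
        x = count - counts[i - 2] if i > 2 else count
        more = True
        while more:
            c = x & 0x1F          # low 5 bits
            x >>= 5               # arithmetic shift keeps the sign
            more = (x != -1) if (c & 0x10) else (x != 0)
            if more:
                c |= 0x20         # continuation bit
            chars.append(chr(c + 48))
    return "".join(chars)


def string_to_counts(s: str) -> List[int]:
    """Inverse of counts_to_string."""
    counts: List[int] = []
    p = 0
    while p < len(s):
        x, k, more = 0, 0, True
        while more:
            c = ord(s[p]) - 48
            x |= (c & 0x1F) << (5 * k)
            more = bool(c & 0x20)
            p += 1
            k += 1
            if not more and (c & 0x10):
                x |= -1 << (5 * k)  # sign-extend negative deltas
        if len(counts) > 2:
            x += counts[-2]
        counts.append(x)
    return counts


def counts_to_mask(counts: List[int], height: int, width: int) -> List[List[int]]:
    """Expand run lengths into a height x width mask, filling column by column."""
    flat: List[int] = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value
    # Column-major: flat index = column * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


encoded = counts_to_string([2, 1, 1, 2])
mask = counts_to_mask(string_to_counts(encoded), height=2, width=3)
for mask_row in mask:
    print(mask_row)
```

Note how the first two runs fill the first column and part of the second, rather than the first row.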

Why use it?

For a 1080p mask with one object, a dense binary representation is ~2 MB (1920 × 1080 pixels at one byte per pixel). Polygons help, but for masks with holes, ragged edges, or many small components, the polygon list grows quickly. RLE typically lands in the single-digit kilobytes for a single object, regardless of shape complexity. For an inference server returning dozens of predictions per request, the savings add up quickly.

Important

This feature works only with the new inference-models backend. Make sure the environment variable USE_INFERENCE_MODELS=True is set (currently, that's the default value).

Note

Simply replacing the old polygon format with RLE would have been a breaking change, so we added RLE as an opt-in alternative. We still believe RLE is the better choice for our clients.

🧠 Gemma-4 added to inference-models

Thanks to @dkosowski87, the Gemma-4 model is now available in inference-models (no support in the old inference yet).

import cv2
from inference_models import AutoModel

model = AutoModel.from_pretrained("gemma-4-e2b-it")
image = cv2.imread("path/to/image.jpg")

answers = model.prompt(
    images=image,
    prompt="What is the main subject in this image?",
    max_new_tokens=256,
    do_sample=False,
)
print(answers[0])

PR: #2227
See also inference-models documentation for more details.

🚧 Maintenance

Full Changelog: v1.2.3...v1.2.4