Impl/2026 05 22 workflows tensor data representation implementation by PawelPeczek-Roboflow · Pull Request #2371 · roboflow/inference

PawelPeczek-Roboflow · 2026-05-26T10:26:03Z

What does this PR do?

Related Issue(s):

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactoring (no functional changes)
Other:

Testing

I have tested this change locally
I have added/updated tests for this change

Test details:

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code where necessary, particularly in hard-to-understand areas
My changes generate no new warnings or errors
I have updated the documentation accordingly (if applicable)

Additional Context

Add an optional torch.Tensor representation to WorkflowImageData with lazy BGR<->RGB conversion between the two backings.

Layout contract: numpy is HWC uint8 BGR (cv2 native), tensor is HWC uint8 RGB (inference-models / torch convention). dtype is preserved; no implicit device moves.

- __init__ accepts tensor_image; "empty" check covers the new field.
- numpy_image property: if only tensor is set, materialize via detach().to("cpu").numpy() with a channel flip and cache the result.
- tensor_image property: mirror fallback from numpy with channel flip.
- copy_and_replace propagates tensor_image.
- create_crop_from_tensor: tensor-native sibling of create_crop with identical metadata math.
- _read_shape_without_materialization avoids forcing device->host just to fill parent_metadata / workflow_root_ancestor_metadata origin coordinates when only the tensor representation is set.

Public surface is unchanged; the field is opt-in. Serialization (via base64_image -> numpy_image -> JPEG) continues to work transparently.
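The lazy two-backing behaviour described above can be illustrated with a minimal sketch. It is not the WorkflowImageData implementation: `DualBackedImage`, `flip_channels`, and the property names are hypothetical, and nested Python lists stand in for np.ndarray / torch.Tensor so the example runs without dependencies. It shows only the idea of deriving the missing representation once, with a channel flip, and caching it.

```python
from typing import List, Optional

Pixel = List[int]          # one pixel: three channel values
Image = List[List[Pixel]]  # HWC layout: rows -> columns -> channels


def flip_channels(image: Image) -> Image:
    """Reverse the channel order of every pixel (BGR <-> RGB)."""
    return [[pixel[::-1] for pixel in row] for row in image]


class DualBackedImage:
    """Hypothetical stand-in for an image holder with two lazy backings."""

    def __init__(
        self,
        bgr_image: Optional[Image] = None,
        rgb_image: Optional[Image] = None,
    ) -> None:
        if bgr_image is None and rgb_image is None:
            raise ValueError("at least one representation is required")
        self._bgr = bgr_image
        self._rgb = rgb_image

    @property
    def bgr_image(self) -> Image:
        # Derive from the RGB backing on first access, then reuse the cache.
        if self._bgr is None:
            self._bgr = flip_channels(self._rgb)
        return self._bgr

    @property
    def rgb_image(self) -> Image:
        # Mirror fallback in the other direction.
        if self._rgb is None:
            self._rgb = flip_channels(self._bgr)
        return self._rgb


image = DualBackedImage(rgb_image=[[[255, 0, 0], [0, 128, 64]]])
```

Reading `image.bgr_image` twice returns the same cached object, so the flip is paid once per instance.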

New common/deserializers_tensor.py and common/serializers_tensor.py. The numpy files are untouched per the plan's locked [ITERATE 4.A] decision; the tensor file's deserialize_image_kind handles raw torch.Tensor input and the dict-shape {"type": "tensor", "value": ...} input, and delegates all other inputs (np.ndarray, base64, URL, dict with type=base64/url) to the numpy implementation. The serializer sibling currently re-exports the numpy functions because the lazy tensor->numpy fallback in WorkflowImageData makes serialise_image correct in both modes; the file exists to keep the loader's import swap symmetric and to give future tensor-aware optimisations a landing spot without touching the loader contract.

Mirror module per the plan's Step 5b. attach_parents_coordinates_* helpers go through ImageParentMetadata.origin_coordinates, which is populated by WorkflowImageData.parent_metadata / workflow_root_ancestor_metadata via _read_shape_without_materialization, so the tensor mirrors currently delegate to the numpy implementations. The module exists so future tensor-specific divergence has a landing spot the loader can swap to without touching the numpy file.

…path map_inference_kwargs unconditionally sets input_color_format="bgr", which is the right default for the cv2-derived numpy paths (preprocess / predict / postprocess) but breaks the new run_tensor_native_inference entry points: workflows tensor blocks pass RGB tensors per the workflows tensor-data-representation plan and need a way to opt out of the BGR override. In each of the four adapter classes that implement run_tensor_native_inference (object detection, instance segmentation, classification, semantic segmentation), pop input_color_format from kwargs before map_inference_kwargs (default None), then restore it afterwards. map_inference_kwargs itself is untouched, so every old execution path keeps the BGR default it has today. Effect: callers of run_tensor_native_inference can pass input_color_format="rgb" (or "bgr", or leave it None) and the value travels through to the underlying model unchanged.
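A minimal sketch of the pop-then-restore pattern described above, assuming a simplified mapper. `ColorFormatAdapter` is hypothetical and models only the kwargs handling; the real adapters forward the mapped kwargs to a model.

```python
from typing import Any, Dict, Optional


class ColorFormatAdapter:
    """Hypothetical adapter; only the kwargs handling is modelled."""

    def map_inference_kwargs(self, kwargs: Dict[str, Any]) -> Dict[str, Any]:
        # Stand-in for the existing mapper: always forces the BGR default.
        mapped = dict(kwargs)
        mapped["input_color_format"] = "bgr"
        return mapped

    def run_tensor_native_inference(self, **kwargs: Any) -> Dict[str, Any]:
        # Take the caller's value out before the mapper can overwrite it.
        color_format: Optional[str] = kwargs.pop("input_color_format", None)
        mapped = self.map_inference_kwargs(kwargs)
        # Restore the caller's value (possibly None) over the forced default.
        mapped["input_color_format"] = color_format
        return mapped  # a real adapter would pass these kwargs to the model
```

Because the mapper is untouched, any other caller of `map_inference_kwargs` still sees the BGR default.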

New v3_tensor.py next to v3.py. Per the plan's locked decisions:

- Manifest is verbatim (same type literal, name, version, description, fields, outputs, ui_manifest). Class name unchanged so the loader's if/else swap binds the same identifier in both branches.
- run_locally calls model_manager.run_tensor_native_inference with the per-image torch tensors from WorkflowImageData.tensor_image and passes input_color_format="rgb" (adapter now respects caller's value). Skips convert_inference_detections_batch_to_sv_detections because the adapter returns sv.Detections directly.
- attach_parents_coordinates_to_batch_of_sv_detections_tensor used on the local path; numpy mirror reused on the remote path per [ITERATE 6.A].
- run_remotely materialises base64_image (which lazily goes tensor -> numpy -> JPEG) and hits the same HTTP API as v3.py. Remote response is dict-shaped so the numpy converter applies.
- inference_id read from sv.Detections.data when present; uuid4 fallback per image per [ITERATE 6.B].

Wire the three pieces the plan's Step 3 calls for, gated on ENABLE_TENSOR_DATA_REPRESENTATION:

- Import env flag.
- Move serialise_image / serialise_sv_detections / serialise_rle_sv_detections and deserialize_image_kind / deserialize_detections_kind / deserialize_rle_detections_kind into an if/else swap that picks the _tensor module when the flag is on. All other (de)serializer functions stay imported from the numpy file (the tensor file only mirrors the image/detection trio that has tensor-aware behaviour).
- Object-detection V3 block import becomes an if/else that picks v3_tensor.py when the flag is on. Class name unchanged, so the blocks = [...] list and load_blocks() are untouched.

load_kinds() and KINDS_SERIALIZERS / KINDS_DESERIALIZERS dict construction are unchanged per the plan ("no new kinds").
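The flag-gated swap can be sketched as follows. This is an illustration only: the two serialiser functions are placeholders, and `str_to_bool` and `pick_serialisers` are hypothetical helpers; how the loader actually parses the environment variable is an assumption.

```python
import os
from typing import Callable, Dict


def serialise_image_numpy(image: object) -> str:
    # Placeholder for the numpy-module implementation.
    return "numpy-serialised"


def serialise_image_tensor(image: object) -> str:
    # Placeholder for the _tensor-module implementation.
    return "tensor-serialised"


def str_to_bool(value: str) -> bool:
    # Assumed parsing rule; the real env helper may accept other spellings.
    return value.strip().lower() in {"1", "true", "yes"}


def pick_serialisers(tensor_mode: bool) -> Dict[str, Callable[[object], str]]:
    # The same key is bound in both branches, so code that looks up
    # "serialise_image" does not need to know which mode is active.
    if tensor_mode:
        return {"serialise_image": serialise_image_tensor}
    return {"serialise_image": serialise_image_numpy}


ENABLE_TENSOR_DATA_REPRESENTATION = str_to_bool(
    os.getenv("ENABLE_TENSOR_DATA_REPRESENTATION", "False")
)
SERIALISERS = pick_serialisers(ENABLE_TENSOR_DATA_REPRESENTATION)
```

Binding one name in both branches is what lets the rest of the loader stay unchanged.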

…ponse conversion

Two utilities the per-block tensor producer siblings will share, per plan v3.1 step A:

- `core_steps/common/tensor_prediction_metadata.py`: `attach_prediction_metadata` populates `image_metadata` on `inference_models.{Detections,InstanceDetections, MultiLabelClassificationPrediction,SemanticSegmentationResult}` from a `WorkflowImageData`'s parent / root-parent metadata. One dict per prediction (no per-detection replication). `inference_id` resolved as existing-metadata > explicit-arg > minted uuid4. `class_names=None` signals "global list unavailable" (remote path); the key is omitted in that case. `ClassificationPrediction` rejected (plural `images_metadata` needs a dedicated helper).
- `core_steps/common/remote_response_converters.py`: `dict_response_to_object_detections` builds an `inference_models.Detections` from one HTTP-API response dict. Mirrors `convert_inference_detections_batch_to_sv_detections` semantics (`utils.py:105`) but emits tensor-native instead of `sv.Detections`. Per-detection `detection_id`/`parent_id`/`class` go to `bboxes_metadata`. Top-level `inference_id` / image dims go to `image_metadata`.

Unit tests cover key population, inference_id resolution precedence, empty-predictions, class-name handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
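The `inference_id` precedence (existing-metadata > explicit-arg > minted uuid4) and the "omit `class_names` when None" rule can be sketched like this. `resolve_inference_id` and `build_image_metadata` are hypothetical names, and a plain dict stands in for the prediction's `image_metadata`.

```python
import uuid
from typing import Any, Dict, Optional


def resolve_inference_id(
    image_metadata: Optional[Dict[str, Any]],
    explicit_inference_id: Optional[str] = None,
) -> str:
    # 1. An id already present in the metadata wins.
    existing = (image_metadata or {}).get("inference_id")
    if existing is not None:
        return existing
    # 2. Otherwise use the id passed explicitly by the caller.
    if explicit_inference_id is not None:
        return explicit_inference_id
    # 3. Otherwise mint a fresh one.
    return str(uuid.uuid4())


def build_image_metadata(
    existing: Optional[Dict[str, Any]],
    explicit_inference_id: Optional[str] = None,
    class_names: Optional[Any] = None,
) -> Dict[str, Any]:
    metadata = dict(existing or {})
    metadata["inference_id"] = resolve_inference_id(existing, explicit_inference_id)
    # None means "not available": the key is left out rather than set to None.
    if class_names is not None:
        metadata["class_names"] = class_names
    return metadata
```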

…ions natively

The previous pilot (`3c0ff2d10`) returned `List[inference_models.Detections]` from `run_tensor_native_inference` but then ran sv.Detections-shaped post-processing on it (`attach_prediction_type_info_to_sv_detections_batch`, `filter_out_unwanted_classes_from_sv_detections_batch`, `attach_parents_coordinates_to_batch_of_sv_detections_tensor`), which was broken because `inference_models.Detections` has no `.data` dict. User review during the first /implement run identified the architectural cause: predictions should also type-swap under the flag (not just images), and metadata belongs in `image_metadata`/`bboxes_metadata` on the tensor-native types, not replicated across `sv.Detections.data` arrays.

This rewrite, per plan v3.1:

- `run_locally`: pass tensor inputs + `input_color_format="rgb"` to the adapter, take its `List[Detections]` return value, attach metadata via `attach_prediction_metadata` once per prediction. Drop NMS / class-filter / coordinate-attach functions: the model already applies NMS / class filter / max_detections, and `attach_prediction_metadata` records parent / root-parent coordinates in `image_metadata`. Pull `class_names` from `model_manager.get_class_names(model_id)`.
- `run_remotely`: materialise base64 via the existing lazy fallback, call the same HTTP path as v3.py, then convert each response dict into `inference_models.Detections` via the new `dict_response_to_object_detections` converter. `class_names=None` on the attach call, since the global list is not available remotely; per-detection class strings are preserved in `bboxes_metadata` by the converter.

Manifest verbatim from v3.py. Class name unchanged. The loader's existing `if/else` swap (`93299327e`) picks this file when `ENABLE_TENSOR_DATA_REPRESENTATION=True`.

Unit tests with MagicMock'd model_manager + InferenceHTTPClient cover metadata attach, color-format passthrough, class_filter passthrough, empty predictions, inference_id resolution, dimensions writing, singleton-response wrapping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nsor paths

Previous shape was `List[str]` with `None` to signal "not available remotely". User reshaped: a `Dict[int, str]` mapping is constructible in either mode and lets downstream consumers do `class_names[class_id]` without index alignment.

- `tensor_prediction_metadata.attach_prediction_metadata` accepts `class_names: Optional[Dict[int, str]]`. Defensive copy via `dict(class_names)`. `None` still omits the key entirely.
- `remote_response_converters.class_id_to_name_from_responses`: new utility that merges a sparse `class_id -> class_name` mapping across a batch of response dicts. First-seen wins on duplicates. Coerces class_id to int (some APIs hand back string-numbers). Skips entries missing either id or name.
- `remote_response_converters.dict_response_to_object_detections` no longer puts per-detection `class` into `bboxes_metadata`, which is redundant with the dict mapping carried at the prediction level.
- OD pilot `v3_tensor.py`:
  - Local: `dict(enumerate(model_manager.get_class_names(...)))` builds the full mapping.
  - Remote: `class_id_to_name_from_responses(responses)` builds the sparse mapping from labels actually seen in the batch.

Tests updated to assert dict shapes; new tests cover sparse construction, defensive-copy semantics, missing-id/name skipping, int-coercion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
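The merge rules listed above (first-seen wins, int coercion, skip incomplete entries) can be sketched as below. The function name `merge_class_names` is hypothetical, and the response layout (`predictions` as a list of dicts with `class_id` and `class` keys) is an assumption about the HTTP response shape, not something this commit message specifies.

```python
from typing import Any, Dict, Iterable


def merge_class_names(responses: Iterable[Dict[str, Any]]) -> Dict[int, str]:
    """Build a sparse class_id -> class_name mapping across a batch."""
    mapping: Dict[int, str] = {}
    for response in responses:
        for detection in response.get("predictions", []):
            class_id = detection.get("class_id")
            class_name = detection.get("class")
            # Skip entries that lack either the id or the name.
            if class_id is None or class_name is None:
                continue
            # int() handles ids delivered as numeric strings;
            # setdefault keeps the first name seen for a given id.
            mapping.setdefault(int(class_id), class_name)
    return mapping


batch = [
    {"predictions": [
        {"class_id": "2", "class": "car"},
        {"class_id": 0, "class": "person"},
        {"class": "no-id"},
    ]},
    {"predictions": [
        {"class_id": 2, "class": "truck"},
        {"class_id": 5},
    ]},
]
sparse_names = merge_class_names(batch)
```

With this input `sparse_names` holds only ids 2 and 0; the later `"truck"` label for id 2 is ignored because the first name seen is kept.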

New helper in remote_response_converters.py that mirrors dict_response_to_object_detections but builds inference_models InstanceDetections with the mask field set to InstancesRLEMasks — no polygon-to-mask rasterization on the path. Reads `response.image.width/height` for InstancesRLEMasks.image_size (preferred) with fallback to per-detection `rle.size`. Per-detection `rle.counts` go into masks list-of-bytes. Raises ValueError if any detection is missing its `rle` field — caller must configure the HTTP request with InferenceConfiguration(response_mask_format="rle"). Tests cover RLE construction, image-dim resolution precedence, empty-predictions case, error on missing rle, inference_id propagation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
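A rough sketch of the image-size precedence and the missing-`rle` error described above. `collect_rle_masks` is a hypothetical helper that returns plain Python values rather than an InstancesRLEMasks object; the (height, width) ordering, the handling of string counts, and raising on an empty response without image dims are all assumptions.

```python
from typing import Any, Dict, List, Tuple


def collect_rle_masks(response: Dict[str, Any]) -> Tuple[Tuple[int, int], List[bytes]]:
    predictions = response.get("predictions", [])
    counts: List[bytes] = []
    for index, detection in enumerate(predictions):
        rle = detection.get("rle")
        if rle is None:
            # The caller is expected to request RLE masks from the API.
            raise ValueError(f"detection {index} has no 'rle' field")
        raw = rle["counts"]
        counts.append(raw.encode("ascii") if isinstance(raw, str) else raw)

    image = response.get("image") or {}
    if "width" in image and "height" in image:
        # Preferred source: top-level image dimensions.
        size = (image["height"], image["width"])
    elif predictions:
        # Fallback: size carried by the first detection's RLE.
        first_size = predictions[0]["rle"]["size"]
        size = (first_size[0], first_size[1])
    else:
        raise ValueError("image size cannot be determined")
    return size, counts
```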

Mirrors the OD v3 tensor pilot pattern for instance segmentation:

- v3_tensor.py: manifest verbatim from v3.py (same type literal, name, version, ui_manifest, fields including mask_decode_mode + tradeoff_factor). Class name unchanged.
- run_locally: passes tensor inputs + input_color_format="rgb" to run_tensor_native_inference. Adapter's map_inference_kwargs auto-selects mask_format="rle" when the model supports it (inference_models_adapters.py:304), so InstanceDetections.mask comes back as InstancesRLEMasks for free on local: RLE preservation end-to-end with no caller action.
- run_remotely: configures InferenceConfiguration with response_mask_format="rle". Each response dict goes through dict_response_to_instance_detections which builds InstancesRLEMasks directly, with no polygon-to-mask rasterization on the path.
- Both paths attach metadata via attach_prediction_metadata; local mode gets a full class_names dict from model_manager.get_class_names, remote mode gets the sparse dict from class_id_to_name_from_responses.

Loader if/else swap added at loader.py:380 (was unconditional import of the numpy v3 block). Same pattern as the OD v3 swap at line 416.

Tests cover RLE preservation (local + remote), dense-mask passthrough (model without RLE support), empty predictions, response_mask_format configuration, sparse class_names construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ce call to map_inference_kwargs All 5 `run_tensor_native_inference` methods (OD, IS, KP, Classification, SemSeg) called `self.map_inference_kwargs(**kwargs)` (unpacked) while the method signature is `def map_inference_kwargs(self, kwargs: dict)` (positional). Python raises TypeError on every real invocation — either "got an unexpected keyword argument" when kwargs has entries or "missing 1 required positional argument" when empty. The bug was masked in workflow-block unit tests because they MagicMock model_manager.run_tensor_native_inference; the adapter code path never ran. Surfaced when designing the IS v3 tensor sibling — passing mask_format="rle" through the call chain forced a closer read. Fix: change the 5 sites to positional `self.map_inference_kwargs(kwargs)` matching the 12 other internal callers (preprocess / predict / postprocess on each adapter). Originally introduced by 948c4de ("Add support for tensor-native interface for models"). No semantic change; only makes the code actually executable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
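The failure mode is easy to reproduce in isolation. In this sketch `KwargsAdapter`, `broken_call`, `fixed_call`, and `error_message` are hypothetical; only the signature `map_inference_kwargs(self, kwargs: dict)` is taken from the description above.

```python
from typing import Any, Callable, Dict


class KwargsAdapter:
    def map_inference_kwargs(self, kwargs: dict) -> dict:
        return dict(kwargs)

    def broken_call(self, **kwargs: Any) -> Dict[str, Any]:
        # Unpacks the dict into keyword arguments the method does not accept.
        return self.map_inference_kwargs(**kwargs)

    def fixed_call(self, **kwargs: Any) -> Dict[str, Any]:
        # Passes the dict itself as the single positional argument.
        return self.map_inference_kwargs(kwargs)


def error_message(function: Callable[..., Any], **kwargs: Any) -> str:
    """Return the TypeError text raised by the call, or '' if it succeeds."""
    try:
        function(**kwargs)
    except TypeError as error:
        return str(error)
    return ""


adapter = KwargsAdapter()
with_entries = error_message(adapter.broken_call, confidence=0.5)
when_empty = error_message(adapter.broken_call)
```

Both `with_entries` and `when_empty` are non-empty, matching the two error variants quoted above, while `adapter.fixed_call(confidence=0.5)` returns the kwargs dict.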

Pass mask_format="rle" explicitly to run_tensor_native_inference rather than relying on the adapter's auto-selection in map_inference_kwargs.

Rationale: the model's post_process defaults mask_format="dense" (yolov8_instance_segmentation_onnx.py:224 et al). The adapter's auto-selection works only when the model supports RLE; that's a silent side effect of the adapter layer that the workflow tensor block shouldn't depend on. By declaring the intent at the call site:

- The block reads as obviously compact-aware
- If the underlying model doesn't list "rle" in supported_mask_formats, the model raises ModelInputError loudly instead of returning dense
- Behaviour matches the remote path which also enforces response_mask_format="rle" via InferenceConfiguration

Test added asserting mask_format="rle" lands in the adapter kwargs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirror the v3_tensor pilot pattern for the older OD block versions. Manifests verbatim from v1.py/v2.py — v1 keeps its legacy type alias list ("RoboflowObjectDetectionModel", "ObjectDetectionModel") and outputs only {inference_id, predictions} (no model_id field). v2 keeps the v3-shape output set including model_id. Loader entries (loader.py:410-419) wrapped in if/else swap. Smoke tests assert output dict shape (model_id presence/absence) and that predictions are inference_models.Detections in both paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirror v3_tensor IS pattern for older versions. v1 keeps legacy type aliases (RoboflowInstanceSegmentationModel, InstanceSegmentationModel) and outputs only {inference_id, predictions}. v2 keeps the v3-shape outputs including model_id. Both pass mask_format="rle" to the adapter and response_mask_format="rle" to InferenceConfiguration per the locked PRED.13 explicit-enforcement decision. Loader if/else now groups all 3 IS versions in a single swap block. Smoke tests assert output shape per version and mask_format/response_mask_format enforcement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- common/tensor_prediction_metadata.py: KeyPoints added to PredictionWithSingularMetadata union (same image_metadata slot pattern as Detections/InstanceDetections).
- common/remote_response_converters.py: new dict_response_to_key_points helper. Pads keypoint xy/confidence per instance to max_kps (same convention as numpy add_inference_keypoints_to_sv_detections). Per-instance bbox info lands in key_points_metadata as {bbox_xyxy, bbox_confidence, detection_id, parent_id}.
- KP v1/v2/v3 tensor siblings: adapter returns Tuple[List[KeyPoints], Optional[List[Detections]]]; block unpacks to KeyPoints as the canonical predictions output. Standard attach_prediction_metadata for image_metadata population. Pass key_points_threshold (matches model signature) on local path, keypoint_confidence_threshold on remote (matches SDK InferenceConfiguration).
- Loader if/else now groups all 3 KP versions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
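Padding per-instance keypoints to a common length can be sketched as follows. `pad_keypoints` is a hypothetical helper working on plain lists; padding with zero coordinates and zero confidence is an assumption about the convention, not something stated in this commit message.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pad_keypoints(
    instances: Sequence[Sequence[Point]],
    confidences: Sequence[Sequence[float]],
) -> Tuple[List[List[Point]], List[List[float]]]:
    # max_kps is the largest keypoint count among the instances of one image.
    max_kps = max((len(points) for points in instances), default=0)
    padded_xy: List[List[Point]] = []
    padded_conf: List[List[float]] = []
    for points, scores in zip(instances, confidences):
        missing = max_kps - len(points)
        # Assumed padding values: (0.0, 0.0) coordinates, 0.0 confidence.
        padded_xy.append(list(points) + [(0.0, 0.0)] * missing)
        padded_conf.append(list(scores) + [0.0] * missing)
    return padded_xy, padded_conf
```

After padding, every instance has the same number of keypoints, so the result can be stacked into rectangular (instances, max_kps, 2) and (instances, max_kps) arrays.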

…mentation_result

- common/remote_response_converters.py: new dict_response_to_semantic_segmentation_result. Decodes base64 PNG segmentation_mask -> torch.int64 tensor (H, W) where pixel value = class_id. Decodes optional confidence_mask -> float32 normalised. class_map (intensity -> label) goes into image_metadata.
- SS v1: minimal manifest, no confidence_mode. v2: confidence_mode + custom_confidence. Both pass through to run_tensor_native_inference with input_color_format="rgb".
- Remote path: class_names dict built from response's class_map by walking responses and coercing intensity strings to int.
- Loader if/else groups both SS versions.

Smoke tests on v2 cover local return type and remote base64 PNG decode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ta helper

- common/tensor_prediction_metadata.py: new attach_classification_prediction_metadata for the batch-shaped ClassificationPrediction. Single-label classification returns ONE prediction object for the whole batch with class_id/confidence tensors of shape (bs,) and a plural images_metadata: List[dict] of length bs. The helper writes one metadata dict per image and returns the list of resolved inference_ids.
- common/remote_response_converters.py: new dict_responses_to_classification_prediction. Takes the full List[response] for a batch and builds a single ClassificationPrediction by reading each response's top class name and matching it back to class_id from the response's predictions list.
- MC v3 / v2 / v1 siblings: each block calls the adapter once (batch in, batch out), attaches metadata via the plural helper, then slices the ClassificationPrediction per image into BlockResult rows so downstream consumers receive a one-length view per image without needing to know the batch index. v3 has confidence_mode/custom_confidence, v2 has direct confidence + ALOps fields, v1 minimal + legacy type aliases + STRING_KIND inference_id + no model_id output.
- Loader if/else groups all 3 MC versions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Multi-label classification returns List[MultiLabelClassificationPrediction] from the adapter (one per image, not batch-shaped like single-label). Each prediction object has singular image_metadata, so the standard attach_prediction_metadata helper applies directly; no need for the plural images_metadata helper used for single-label.

Remote response shape is dict-keyed:

    predictions: Dict[class_name, {class_id, confidence}]
    predicted_classes: [class_name, ...]

dict_response_to_multi_label_classification walks predicted_classes (only the classes over threshold) and pulls class_id/confidence from the predictions dict, building per-image tensors. class_names dict is harvested by walking each response's predictions dict, which preserves the model's full known class table when seen across the batch (different from OD/IS where the class list is sparse).

v3 has best/default/custom confidence_mode. v2: direct confidence + AL. v1: minimal + legacy type aliases + STRING_KIND inference_id + no model_id output.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
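The dict-keyed remote response can be walked as in the sketch below. Both function names are hypothetical, and plain lists stand in for the per-image tensors; the response keys (`predictions`, `predicted_classes`, `class_id`, `confidence`) are the ones named above.

```python
from typing import Any, Dict, Iterable, List, Tuple


def selected_classes(response: Dict[str, Any]) -> Tuple[List[int], List[float]]:
    """Collect class_id / confidence for the classes listed in predicted_classes."""
    predictions = response.get("predictions", {})
    class_ids: List[int] = []
    confidences: List[float] = []
    for class_name in response.get("predicted_classes", []):
        entry = predictions[class_name]
        class_ids.append(int(entry["class_id"]))
        confidences.append(float(entry["confidence"]))
    return class_ids, confidences


def harvest_class_names(responses: Iterable[Dict[str, Any]]) -> Dict[int, str]:
    """Collect every class present in the predictions dicts, selected or not."""
    names: Dict[int, str] = {}
    for response in responses:
        for class_name, entry in response.get("predictions", {}).items():
            names.setdefault(int(entry["class_id"]), class_name)
    return names


response = {
    "predictions": {
        "cat": {"class_id": 0, "confidence": 0.91},
        "dog": {"class_id": 1, "confidence": 0.12},
        "bird": {"class_id": 2, "confidence": 0.77},
    },
    "predicted_classes": ["cat", "bird"],
}
```

Here `selected_classes(response)` yields only the two classes over threshold, while `harvest_class_names([response])` recovers all three names.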

attach_classification_prediction_metadata uses List[str] in its return annotation and inference_ids parameter. The List import was previously removed when cleaning up the unused class_names: List[str] annotation in attach_prediction_metadata, but never restored when the plural-metadata helper was added later in the session, causing NameError at module import time, which broke test collection across every test that transitively imports this module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bridge between inference_models native prediction types and sv.Detections / sv.KeyPoints / dict, with image_metadata broadcast and bboxes_metadata folded into per-detection arrays.

Used by Phase 5 consumer block tensor siblings that wrap an existing sv-shaped implementation: convert at input boundary, run numpy logic, optionally convert back via sv_detections_to_inference_models_detections when downstream is tensor-aware.

For InstanceDetections this is the materialisation point: RLE masks are converted to dense numpy here, via the same coco_rle_masks_to_numpy_mask used by inference_models.to_supervision(). Per [ITERATE PRED.4]: RLE stays compact until a dense consumer asks for it; this is that boundary.

Also exposes:

- key_points_to_supervision_with_metadata for sv.KeyPoints
- classification_prediction_to_dict_per_image / multi_label_classification_to_dict for sv-less classification consumers
- sv_detections_to_inference_models_detections (reverse direction)
- to_supervision_with_metadata (generic dispatch by isinstance)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ment

Two implementation flavours kicked off:

**Tensor-native** (Phase 5a, crop hot-path):

- absolute_static_crop/v1_tensor.py
- relative_static_crop/v1_tensor.py

Both slice `WorkflowImageData.tensor_image` directly (no numpy materialisation) and build the cropped child via `create_crop_from_tensor`. Dimensions resolved through `_read_shape_without_materialization`.

**Wrap-and-delegate** (Phase 5c sample, mutator):

- fusion/detections_classes_replacement/v1_tensor.py

The classes-replacement logic is heavily sv.Detections-shaped (matches detection_id ↔ parent_id, replaces class arrays in .data, regenerates detection_ids). Reimplementing it natively would be a substantial port. The tensor sibling instead:

1. Converts the tensor inputs at the boundary via `to_supervision_with_metadata` and the classification lowering helpers (single/multi-label → dict shape the numpy block already understands).
2. Delegates to the numpy DetectionsClassesReplacementBlockV1.
3. Converts the sv.Detections output back to inference_models.Detections via `sv_detections_to_inference_models_detections`, preserving the upstream image_metadata so downstream tensor consumers see consistent inference_id/model_id/class_names.

This wrap-and-delegate pattern is the pragmatic Phase 5 default for consumers whose internals are too sv-shaped to port natively in one session. The materialisation cost is paid by the block, not the engine (no engine coercion per PRED.6). Downstream still receives inference_models native; the round-trip is local.

Dynamic_crop deferred to a follow-up; its per-detection mask slicing needs the same wrap-and-delegate treatment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…or siblings

Most consumer blocks have heavily sv.Detections-shaped internals; reimplementing each natively is orthogonal to the workflow-engine-level tensor contract. New common/wrap_consumer.py factory produces a tensor-mode sibling for any numpy consumer block via subclass-and-override:

    OriginalBlockV1 = make_tensor_wrapper_block(_NumpyImpl)

The wrapper:

- shares the wrapped class's __name__ / __qualname__ / __module__ so the loader if/else binds the same identifier in both branches
- intercepts run() args+kwargs, materialising any inference_models native prediction (Detections, InstanceDetections, KeyPoints, ClassificationPrediction, MultiLabelClassificationPrediction) into sv.Detections / dict via to_supervision.py helpers
- recursively materialises Batch contents (preserving indices) and lists
- delegates the materialised call to super().run()
- returns the result as-is (image / sv-shaped predictions / sink-side action; downstream tensor consumers requiring inference_models native output should use a hand-written sibling like detections_classes_replacement/v1_tensor.py)

Per [ITERATE PRED.6] the materialisation cost is paid by the block, not the engine. No engine coercion.

70 wrapper siblings landed (5 lines each, generated programmatically):

- analytics: data_aggregator, detection_event_log, line_counter v1/v2, overlap, path_deviation v1/v2, time_in_zone v1/v2/v3, velocity
- classical_cv: distance_measurement, mask_area_measurement, mask_edge_snap, size_measurement, template_matching
- formatters: vlm_as_classifier v1/v2, vlm_as_detector v1/v2
- fusion: detections_consensus, detections_list_rollup, detections_stitch
- sinks: onvif_movement, roboflow/{custom_metadata, dataset_upload v1/v2, model_monitoring_inference_aggregator, vision_events}
- transformations: byte_tracker v1/v2/v3, detection_offset, detections_combine, detections_filter, detections_merge, detections_transformation, dynamic_crop, dynamic_zones, bounding_rect, per_class_confidence_filter, perspective_correction, stabilize_detections, stitch_ocr_detections v1/v2
- visualizations: background_color, blur, bounding_box, circle, classification_label, color, corner, crop, dot, ellipse, halo v1/v2, heatmap, icon, keypoint, label, line_zone, mask, model_comparison, pixelate, polygon v1/v2, polygon_zone, trace, triangle

Loader: 78 if/else swap blocks now (1 per producer + 1 per wrapped consumer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nce_models Without output conversion the wrapper leaked sv.Detections to downstream tensor-native consumers. Now the wrapper recursively walks the run() result, converting any sv.Detections back to inference_models.Detections via sv_detections_to_inference_models_detections. Recursion handles BlockResult shape (List[Dict[str, Any]]) and nested dicts. Visualizer / sink outputs (images, status payloads, scalars) pass through unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
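The recursive walk over a run() result can be sketched with stand-in types. `SvLikeDetections`, `NativeLikeDetections`, `to_native`, and `convert_outputs` are all hypothetical; the real conversion is done by sv_detections_to_inference_models_detections on actual sv.Detections objects.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SvLikeDetections:
    """Stand-in for sv.Detections."""
    boxes: List[List[float]]


@dataclass
class NativeLikeDetections:
    """Stand-in for inference_models.Detections."""
    boxes: List[List[float]]


def to_native(detections: SvLikeDetections) -> NativeLikeDetections:
    return NativeLikeDetections(boxes=detections.boxes)


def convert_outputs(value: Any) -> Any:
    # Convert detections wherever they sit in the result structure.
    if isinstance(value, SvLikeDetections):
        return to_native(value)
    if isinstance(value, dict):
        return {key: convert_outputs(item) for key, item in value.items()}
    if isinstance(value, list):
        return [convert_outputs(item) for item in value]
    # Images, status payloads, scalars: passed through unchanged.
    return value


block_result = [
    {
        "predictions": SvLikeDetections(boxes=[[0.0, 0.0, 1.0, 1.0]]),
        "image": "image-placeholder",
        "nested": {"tracked": SvLikeDetections(boxes=[])},
    }
]
converted = convert_outputs(block_result)
```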

Earlier survey of inference_models_adapters.py found only 5 adapter classes with run_tensor_native_inference (one per core-CV task family), which led the plan to flag Phase 3.B as gated on "adapter extension work". Re-checking against the broader inference/models/ tree (the old inference layer's per-model adapter files) showed adapters with run_tensor_native_inference already exist in inference/models/<family>/<family>_inference_models.py for 20+ foundation models. The precondition was already met. The user confirmed: adapters exist for everything except yolo_world.

24 foundation workflow blocks wrapped via the existing make_tensor_wrapper_block factory (same 5-line shape as Phase 5 fan-out):

- clip/v1
- clip_comparison/v1, v2
- depth_estimation/v1
- easy_ocr/v1
- florence2/v1, v2
- gaze/v1
- glm_ocr/v1
- moondream2/v1
- ocr/v1 (DocTR)
- perception_encoder/v1
- qwen/v1 (Qwen25VL)
- qwen3_5vl/v1, v2
- qwen3vl/v1
- seg_preview/v1
- segment_anything2/v1
- segment_anything2_video/v1
- segment_anything3/v1, v2, v3
- segment_anything3_3d/v1 (nested under SAM3_3D_OBJECTS_ENABLED)
- smolvlm/v1

Loader: 24 new if/else swaps. segment_anything3_3d nested inside the SAM3_3D_OBJECTS_ENABLED feature flag block.

Out of scope:

- yolo_world (no adapter in inference/models/; separate work)
- lmm (deprecated per feedback_mediapipe_deprecation_scope adjacent)
- external API blocks (anthropic, gemini, openai, etc.; no inference_models implementation)

Branch totals: 117 tensor sibling files, 111 ENABLE_TENSOR_DATA_REPRESENTATION swap blocks in the loader.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… siblings

The factory abstraction was the wrong call: hides what each block does behind a decorator, makes per-block edits awkward, and obscures the type-swap intent. Rewriting straight: every tensor sibling is now a verbatim copy of its numpy v<N>.py source. The loader's if/else swap picks the same class in both modes; the file structures are visible and editable per-block.

Concretely:

- Deleted inference/core/workflows/core_steps/common/wrap_consumer.py.
- 94 wrapper-based tensor siblings rewritten as verbatim copies of their corresponding numpy source files.
- 58 additional tensor siblings created for previously skipped in-scope blocks:
  - third_party/{barcode_detection, qr_code_detection}
  - fusion/{buffer, dimension_collapse, image_stack}
  - flow_control/{inner_workflow, continue_if, delta_filter, rate_limiter}
  - transformations/{image_slicer v1/v2, qr_code_generator, camera_calibration, stitch_images}
  - cache/{cache_set, cache_get}
  - sinks/{webhook, s3, email_notification v1/v2, local_file, slack/notification, twilio/sms v1/v2}
  - secrets_providers/environment_secrets_store
  - sampling/{identify_changes, identify_outliers}
  - formatters/{first_non_empty_or_default, property_definition, csv, json_parser, expression}
  - trackers/{sort, bytetrack, botsort, ocsort}
  - math/cosine_similarity
  - visualizations/{reference_path, text_display, grid}
  - classical_cv/{contrast_enhancement, sift, morphological_transformation v1/v2, pixel_color_count, contrast_equalization, background_subtraction, threshold, image_blur, contours, camera_focus v1/v2, convert_grayscale, motion_detection, dominant_color, image_preprocessing, sift_comparison v1/v2}

Loader: replaced the wrap_consumer-era swap entries with single-pass top-level rewrites that ignore imports already inside swap blocks.

Out of scope (external APIs / deprecated / no-adapter): anthropic_claude, cog_vlm, google_gemini, google_gemma, google_vision_ocr, kimi_openrouter, llama_vision, lmm_classifier, openai, openai_compatible, openrouter, qwen3_5_openrouter, qwen3_6_openrouter, qwen_vlm, stability_ai, lmm, yolo_world.

Branch totals: 175 tensor sibling files, 163 ENABLE_TENSOR_DATA_REPRESENTATION swap blocks in loader.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ped + tensor source attached)

Earlier producer siblings emitted `inference_models.{Detections, InstanceDetections, KeyPoints, ClassificationPrediction, MultiLabelClassificationPrediction}` directly. Consumer siblings (verbatim copies of numpy) expect `sv.Detections` / dicts, so the data flow broke at every producer→consumer boundary.

Fix: dual representation. Each producer's output is the numpy-mode shape (sv.Detections for OD/IS/KP/SemSeg, dict for classification), with the original inference_models native source preserved in `.data[TENSOR_NATIVE_PREDICTION_KEY]` (or as a dict entry for the classification dicts).

- Consumers (verbatim sv-shaped copies of numpy) work unchanged.
- Tensor-aware consumers read `.data[TENSOR_NATIVE_PREDICTION_KEY]` to recover the inference_models native form and skip re-materialisation.

Added helpers in common/to_supervision.py:

- build_dual_detections (OD)
- build_dual_instance_detections (IS; rasterises RLE to dense at this boundary, while the tensor source in .data still carries RLE for tensor-aware consumers)
- build_dual_key_points (KP)
- build_dual_classification_dicts (single-label; list per image)
- build_dual_multi_label_dict (multi-label; dict per image)
- build_dual_semantic_segmentation (passthrough for now; SemSeg's numpy-mode output is itself a specialised structure and the sv-detections-per-class rendering is non-trivial)

Updated producer siblings (15 files): OD v1/v2/v3, IS v1/v2/v3, KP v1/v2/v3, MC v1/v2/v3 (per-image dict shape replaces sliced ClassificationPrediction), ML v1/v2/v3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
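The dual-representation idea can be sketched with stand-in types. `NativeDetections`, `SvLikeDetections`, `build_dual`, and `recover_native` are hypothetical, and the string value of `TENSOR_NATIVE_PREDICTION_KEY` is a placeholder since the actual constant is not given here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Placeholder value; the real constant's value is an assumption.
TENSOR_NATIVE_PREDICTION_KEY = "tensor_native_prediction"


@dataclass
class NativeDetections:
    """Stand-in for an inference_models native prediction."""
    xyxy: List[List[float]]
    image_metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SvLikeDetections:
    """Stand-in for the sv.Detections shape that consumers expect."""
    xyxy: List[List[float]]
    data: Dict[str, Any] = field(default_factory=dict)


def build_dual(native: NativeDetections) -> SvLikeDetections:
    # Emit the consumer-facing shape and keep the native source alongside it.
    dual = SvLikeDetections(xyxy=[list(box) for box in native.xyxy])
    dual.data[TENSOR_NATIVE_PREDICTION_KEY] = native
    return dual


def recover_native(prediction: SvLikeDetections) -> Optional[NativeDetections]:
    # Tensor-aware consumers read the native form back; others ignore the key.
    return prediction.data.get(TENSOR_NATIVE_PREDICTION_KEY)
```

A consumer written against the sv shape reads `dual.xyxy` as before, while a tensor-aware consumer calls `recover_native(dual)` and gets the original object back without conversion.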

PawelPeczek-Roboflow and others added 30 commits May 20, 2026 15:10

Try to build jetson-utils for JP 5.1

e8f4139

Adjust JP 5.1 build

b59a98e

Fix

c7362b3

Fake opencv being installed

0ea2a84

draft of changes

4a2127e

Try to fix build

0719780

Add support for tensor-native interface for models

948c4de

Compress build steps

0d5e48f

Try to fix build

d46021d

Try to fix build

b33b5cc

WIP - safe commit

24bcbee

PawelPeczek-Roboflow and others added 8 commits May 25, 2026 19:07

PawelPeczek-Roboflow closed this May 26, 2026


PawelPeczek-Roboflow wants to merge 38 commits into main from impl/2026-05-22-workflows-tensor-data-representation-implementation

PawelPeczek-Roboflow commented May 26, 2026
