
Impl/2026 05 22 workflows tensor data representation implementation #2371

Closed
PawelPeczek-Roboflow wants to merge 38 commits into main from impl/2026-05-22-workflows-tensor-data-representation-implementation

Conversation

@PawelPeczek-Roboflow
Collaborator

What does this PR do?

Related Issue(s):

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Other:

Testing

  • I have tested this change locally
  • I have added/updated tests for this change

Test details:

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

PawelPeczek-Roboflow and others added 30 commits May 20, 2026 15:10
Add an optional torch.Tensor representation to WorkflowImageData with
lazy BGR<->RGB conversion between the two backings. Layout contract:
numpy is HWC uint8 BGR (cv2 native), tensor is HWC uint8 RGB
(inference-models / torch convention). dtype is preserved; no implicit
device moves.

- __init__ accepts tensor_image; "empty" check covers the new field.
- numpy_image property: if only tensor is set, materialize via
  detach().to("cpu").numpy() with a channel flip and cache the result.
- tensor_image property: mirror fallback from numpy with channel flip.
- copy_and_replace propagates tensor_image.
- create_crop_from_tensor: tensor-native sibling of create_crop with
  identical metadata math.
- _read_shape_without_materialization avoids forcing device->host just
  to fill parent_metadata / workflow_root_ancestor_metadata origin
  coordinates when only the tensor representation is set.

Public surface is unchanged; the field is opt-in. Serialization (via
base64_image -> numpy_image -> JPEG) continues to work transparently.
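
The lazy two-way fallback described above can be illustrated with a toy class. This is only a sketch under loud assumptions: nested Python lists of pixel tuples stand in for the numpy array and the torch tensor, and `DualBackedImage`, `flip_channels`, and the `conversions` counter are hypothetical names that do not appear in the PR. It shows the caching behaviour only: whichever representation is missing is derived once by a channel flip and then reused.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of pixels, i.e. a toy HWC layout


def flip_channels(image: Image) -> Image:
    """Reverse the channel order of every pixel (BGR <-> RGB)."""
    return [[pixel[::-1] for pixel in row] for row in image]


class DualBackedImage:
    """Toy stand-in for an image holder with two lazily synced backings."""

    def __init__(
        self,
        bgr_image: Optional[Image] = None,
        rgb_image: Optional[Image] = None,
    ) -> None:
        if bgr_image is None and rgb_image is None:
            raise ValueError("at least one representation must be provided")
        self._bgr = bgr_image
        self._rgb = rgb_image
        self.conversions = 0  # how many channel flips actually ran

    @property
    def bgr_image(self) -> Image:
        # Materialise from the RGB backing only on first access, then cache.
        if self._bgr is None:
            self._bgr = flip_channels(self._rgb)
            self.conversions += 1
        return self._bgr

    @property
    def rgb_image(self) -> Image:
        # Mirror fallback in the other direction.
        if self._rgb is None:
            self._rgb = flip_channels(self._bgr)
            self.conversions += 1
        return self._rgb


image = DualBackedImage(rgb_image=[[(255, 0, 0), (0, 255, 0)]])
first = image.bgr_image
second = image.bgr_image  # served from the cache, no second flip
```

In the real class the flip would additionally involve `detach().to("cpu").numpy()` on the tensor side, as the commit message states; that part is omitted here.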

New common/deserializers_tensor.py and common/serializers_tensor.py.
The numpy files are untouched per the plan's locked [ITERATE 4.A]
decision; the tensor file's deserialize_image_kind handles raw
torch.Tensor input and the dict-shape {"type": "tensor", "value": ...}
input, and delegates all other inputs (np.ndarray, base64, URL, dict
with type=base64/url) to the numpy implementation. The serializer
sibling currently re-exports the numpy functions because the lazy
tensor->numpy fallback in WorkflowImageData makes serialise_image
correct in both modes; the file exists to keep the loader's import
swap symmetric and to give future tensor-aware optimisations a
landing spot without touching the loader contract.

Mirror module per the plan's Step 5b. attach_parents_coordinates_*
helpers go through ImageParentMetadata.origin_coordinates, which is
populated by WorkflowImageData.parent_metadata /
workflow_root_ancestor_metadata via _read_shape_without_materialization,
so the tensor mirrors currently delegate to the numpy implementations.
The module exists so future tensor-specific divergence has a landing
spot the loader can swap to without touching the numpy file.
…path

map_inference_kwargs unconditionally sets input_color_format="bgr",
which is the right default for the cv2-derived numpy paths (preprocess
/ predict / postprocess) but breaks the new run_tensor_native_inference
entry points: workflows tensor blocks pass RGB tensors per the
workflows tensor-data-representation plan and need a way to opt out of
the BGR override.

In each of the four adapter classes that implement
run_tensor_native_inference (object detection, instance segmentation,
classification, semantic segmentation), pop input_color_format from
kwargs before map_inference_kwargs (default None), then restore it
afterwards. map_inference_kwargs itself is untouched, so every old
execution path keeps the BGR default it has today.

Effect: callers of run_tensor_native_inference can pass
input_color_format="rgb" (or "bgr", or leave it None) and the value
travels through to the underlying model unchanged.
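
A minimal sketch of the pop-then-restore pattern follows. The `map_inference_kwargs` body here is a stand-in that only reproduces the unconditional BGR override described above; the real helper maps other arguments as well and is a method on the adapter, and the `confidence` keyword is just an illustrative extra argument.

```python
from typing import Any, Dict, Optional


def map_inference_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the adapter helper: always forces the BGR default."""
    mapped = dict(kwargs)
    mapped["input_color_format"] = "bgr"
    return mapped


def run_tensor_native_inference(**kwargs: Any) -> Dict[str, Any]:
    """Return the kwargs that would reach the underlying model."""
    # Take the caller's value out before the helper can overwrite it.
    caller_format: Optional[str] = kwargs.pop("input_color_format", None)
    mapped = map_inference_kwargs(kwargs)
    # Restore the caller's choice, including None when nothing was passed.
    mapped["input_color_format"] = caller_format
    return mapped


forwarded = run_tensor_native_inference(input_color_format="rgb", confidence=0.4)
```

Because the helper itself is left alone, a direct call to `map_inference_kwargs` still yields the BGR default, which is what the older numpy paths rely on.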

New v3_tensor.py next to v3.py. Per the plan's locked decisions:
- Manifest is verbatim (same type literal, name, version, description,
  fields, outputs, ui_manifest). Class name unchanged so the loader's
  if/else swap binds the same identifier in both branches.
- run_locally calls model_manager.run_tensor_native_inference with
  the per-image torch tensors from WorkflowImageData.tensor_image and
  passes input_color_format="rgb" (adapter now respects caller's value).
  Skips convert_inference_detections_batch_to_sv_detections because
  the adapter returns sv.Detections directly.
- attach_parents_coordinates_to_batch_of_sv_detections_tensor used on
  the local path; numpy mirror reused on the remote path per
  [ITERATE 6.A].
- run_remotely materialises base64_image (which lazily goes
  tensor -> numpy -> JPEG) and hits the same HTTP API as v3.py. Remote
  response is dict-shaped so the numpy converter applies.
- inference_id read from sv.Detections.data when present; uuid4 fallback
  per image per [ITERATE 6.B].

Wire the three pieces the plan's Step 3 calls for, gated on
ENABLE_TENSOR_DATA_REPRESENTATION:

- Import env flag.
- Move serialise_image / serialise_sv_detections / serialise_rle_sv_detections
  and deserialize_image_kind / deserialize_detections_kind /
  deserialize_rle_detections_kind into an if/else swap that picks the
  _tensor module when the flag is on. All other (de)serializer
  functions stay imported from the numpy file (the tensor file only
  mirrors the image/detection trio that has tensor-aware behaviour).
- Object-detection V3 block import becomes an if/else that picks
  v3_tensor.py when the flag is on. Class name unchanged, so the
  blocks = [...] list and load_blocks() are untouched.

load_kinds() and KINDS_SERIALIZERS / KINDS_DESERIALIZERS dict
construction are unchanged per the plan ("no new kinds").
…ponse conversion

Two utilities the per-block tensor producer siblings will share, per plan
v3.1 step A:

- `core_steps/common/tensor_prediction_metadata.py` —
  `attach_prediction_metadata` populates `image_metadata` on
  `inference_models.{Detections,InstanceDetections,
  MultiLabelClassificationPrediction,SemanticSegmentationResult}` from a
  `WorkflowImageData`'s parent / root-parent metadata. One dict per
  prediction (no per-detection replication). `inference_id` resolved as
  existing-metadata > explicit-arg > minted uuid4. `class_names=None`
  signals "global list unavailable" (remote path) — the key is omitted
  in that case. `ClassificationPrediction` rejected (plural
  `images_metadata` needs a dedicated helper).

- `core_steps/common/remote_response_converters.py` —
  `dict_response_to_object_detections` builds an
  `inference_models.Detections` from one HTTP-API response dict. Mirrors
  `convert_inference_detections_batch_to_sv_detections` semantics
  (`utils.py:105`) but emits tensor-native instead of `sv.Detections`.
  Per-detection `detection_id`/`parent_id`/`class` go to
  `bboxes_metadata`. Top-level `inference_id` / image dims go to
  `image_metadata`.

Unit tests cover key population, inference_id resolution precedence,
empty-predictions, class-name handling.
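
The `inference_id` precedence stated above (existing metadata, then explicit argument, then a freshly minted uuid4) could be sketched as below. `resolve_inference_id` is a hypothetical helper name used for illustration; the PR only describes the behaviour inside `attach_prediction_metadata`.

```python
import uuid
from typing import Any, Dict, Optional


def resolve_inference_id(
    image_metadata: Optional[Dict[str, Any]],
    explicit_inference_id: Optional[str] = None,
) -> str:
    """Pick an inference id: existing metadata > explicit argument > new uuid4."""
    existing = (image_metadata or {}).get("inference_id")
    if existing is not None:
        return existing
    if explicit_inference_id is not None:
        return explicit_inference_id
    return str(uuid.uuid4())


kept = resolve_inference_id({"inference_id": "from-model"}, "from-caller")
passed = resolve_inference_id({}, "from-caller")
minted = resolve_inference_id(None)
```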

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ions natively

The previous pilot (`3c0ff2d10`) returned `List[inference_models.Detections]`
from `run_tensor_native_inference` but then ran sv.Detections-shaped
post-processing on it (`attach_prediction_type_info_to_sv_detections_batch`,
`filter_out_unwanted_classes_from_sv_detections_batch`,
`attach_parents_coordinates_to_batch_of_sv_detections_tensor`) — broken
because `inference_models.Detections` has no `.data` dict. User review
during the first /implement run identified the architectural cause:
predictions should also type-swap under the flag (not just images), and
metadata belongs in `image_metadata`/`bboxes_metadata` on the
tensor-native types, not replicated across `sv.Detections.data` arrays.

This rewrite, per plan v3.1:

- `run_locally`: pass tensor inputs + `input_color_format="rgb"` to the
  adapter, take its `List[Detections]` return value, attach metadata
  via `attach_prediction_metadata` once per prediction. Drop NMS /
  class-filter / coordinate-attach functions — model already applies
  NMS / class filter / max_detections; `attach_prediction_metadata`
  records parent / root-parent coordinates in `image_metadata`. Pull
  `class_names` from `model_manager.get_class_names(model_id)`.

- `run_remotely`: materialise base64 via the existing lazy fallback,
  call the same HTTP path as v3.py, then convert each response dict
  into `inference_models.Detections` via the new
  `dict_response_to_object_detections` converter. `class_names=None`
  on the attach call — the global list is not available remotely;
  per-detection class strings are preserved in `bboxes_metadata` by
  the converter.

Manifest verbatim from v3.py. Class name unchanged. The loader's
existing `if/else` swap (`93299327e`) picks this file when
`ENABLE_TENSOR_DATA_REPRESENTATION=True`.

Unit tests with MagicMock'd model_manager + InferenceHTTPClient cover
metadata attach, color-format passthrough, class_filter passthrough,
empty predictions, inference_id resolution, dimensions writing,
singleton-response wrapping.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nsor paths

The previous shape was `List[str]`, with `None` signalling "not available
remotely". At the user's request it is now a `Dict[int, str]` mapping,
which can be built in either mode and lets downstream consumers do
`class_names[class_id]` without relying on index alignment.

- `tensor_prediction_metadata.attach_prediction_metadata` accepts
  `class_names: Optional[Dict[int, str]]`. Defensive copy via
  `dict(class_names)`. `None` still omits the key entirely.

- `remote_response_converters.class_id_to_name_from_responses` —
  new utility that merges a sparse `class_id -> class_name` mapping
  across a batch of response dicts. First-seen wins on duplicates.
  Coerces class_id to int (some APIs hand back string-numbers).
  Skips entries missing either id or name.

- `remote_response_converters.dict_response_to_object_detections`
  no longer puts per-detection `class` into `bboxes_metadata` —
  redundant with the dict mapping carried at the prediction level.

- OD pilot `v3_tensor.py`:
  - Local: `dict(enumerate(model_manager.get_class_names(...)))` builds
    the full mapping.
  - Remote: `class_id_to_name_from_responses(responses)` builds the
    sparse mapping from labels actually seen in the batch.

Tests updated to assert dict shapes; new tests cover sparse construction,
defensive-copy semantics, missing-id/name skipping, int-coercion.
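
The sparse-mapping rules above (first-seen wins, int coercion, skipping incomplete entries) can be sketched as follows. The response layout, a `predictions` list whose items carry `class_id` and `class`, is an assumption about the HTTP response shape, and `merge_class_names` is a hypothetical name, not the PR's `class_id_to_name_from_responses` itself.

```python
from typing import Any, Dict, Iterable, List


def merge_class_names(responses: Iterable[Dict[str, Any]]) -> Dict[int, str]:
    """Merge a sparse class_id -> class_name mapping across response dicts."""
    mapping: Dict[int, str] = {}
    for response in responses:
        for detection in response.get("predictions", []):
            class_id = detection.get("class_id")
            class_name = detection.get("class")
            if class_id is None or class_name is None:
                continue  # skip entries missing either id or name
            # int() handles APIs that return "2" instead of 2;
            # setdefault keeps the first name seen for a given id.
            mapping.setdefault(int(class_id), class_name)
    return mapping


def full_class_names(class_list: List[str]) -> Dict[int, str]:
    """Local-mode counterpart: the complete mapping from an ordered list."""
    return dict(enumerate(class_list))


responses = [
    {"predictions": [{"class_id": "2", "class": "dog"}, {"class_id": 0, "class": "cat"}]},
    {"predictions": [{"class_id": 2, "class": "wolf"}, {"class": "orphan"}, {"class_id": 5}]},
]
sparse = merge_class_names(responses)
```

Here the second response's `wolf` label is ignored because id 2 was already seen, and the two incomplete detections contribute nothing.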

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

New helper in remote_response_converters.py that mirrors
dict_response_to_object_detections but builds inference_models
InstanceDetections with the mask field set to InstancesRLEMasks —
no polygon-to-mask rasterization on the path.

Reads `response.image.width/height` for InstancesRLEMasks.image_size
(preferred) with fallback to per-detection `rle.size`. Per-detection
`rle.counts` go into masks list-of-bytes. Raises ValueError if any
detection is missing its `rle` field — caller must configure the
HTTP request with InferenceConfiguration(response_mask_format="rle").

Tests cover RLE construction, image-dim resolution precedence,
empty-predictions case, error on missing rle, inference_id propagation.
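
A rough sketch of the size-resolution precedence and the missing-`rle` check is given below. Several details are assumptions made for illustration: the function name `collect_rle_masks`, the `(height, width)` ordering of the returned size, the COCO-style `[height, width]` ordering of `rle.size`, and encoding string `counts` to bytes as ASCII.

```python
from typing import Any, Dict, List, Optional, Tuple


def collect_rle_masks(
    response: Dict[str, Any],
) -> Tuple[Optional[Tuple[int, int]], List[bytes]]:
    """Return ((height, width) or None, per-detection RLE counts as bytes)."""
    predictions = response.get("predictions", [])
    counts: List[bytes] = []
    for index, detection in enumerate(predictions):
        rle = detection.get("rle")
        if rle is None:
            # The caller is expected to request RLE masks from the API.
            raise ValueError(f"detection {index} has no 'rle' field")
        raw = rle["counts"]
        counts.append(raw.encode("ascii") if isinstance(raw, str) else raw)

    image = response.get("image") or {}
    if "height" in image and "width" in image:
        # Preferred source: top-level image dimensions.
        size: Optional[Tuple[int, int]] = (int(image["height"]), int(image["width"]))
    elif predictions:
        # Fallback: the size stored with the first detection's RLE.
        height, width = predictions[0]["rle"]["size"]
        size = (int(height), int(width))
    else:
        size = None
    return size, counts


size, masks = collect_rle_masks(
    {
        "image": {"width": 640, "height": 480},
        "predictions": [{"rle": {"size": [480, 640], "counts": "abc"}}],
    }
)
```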

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors the OD v3 tensor pilot pattern for instance segmentation:

- v3_tensor.py: manifest verbatim from v3.py (same type literal, name,
  version, ui_manifest, fields including mask_decode_mode +
  tradeoff_factor). Class name unchanged.
- run_locally: passes tensor inputs + input_color_format="rgb" to
  run_tensor_native_inference. Adapter's map_inference_kwargs
  auto-selects mask_format="rle" when the model supports it
  (inference_models_adapters.py:304), so InstanceDetections.mask
  comes back as InstancesRLEMasks for free on local — RLE preservation
  end-to-end with no caller action.
- run_remotely: configures InferenceConfiguration with
  response_mask_format="rle". Each response dict goes through
  dict_response_to_instance_detections which builds InstancesRLEMasks
  directly — no polygon-to-mask rasterization on the path.
- Both paths attach metadata via attach_prediction_metadata; local mode
  gets a full class_names dict from model_manager.get_class_names,
  remote mode gets the sparse dict from class_id_to_name_from_responses.

Loader if/else swap added at loader.py:380 (was unconditional import of
the numpy v3 block). Same pattern as the OD v3 swap at line 416.

Tests cover RLE preservation (local + remote), dense-mask passthrough
(model without RLE support), empty predictions, response_mask_format
configuration, sparse class_names construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ce call to map_inference_kwargs

All 5 `run_tensor_native_inference` methods (OD, IS, KP, Classification,
SemSeg) called `self.map_inference_kwargs(**kwargs)` (unpacked) while
the method signature is `def map_inference_kwargs(self, kwargs: dict)`
(positional). Python raises TypeError on every real invocation —
either "got an unexpected keyword argument" when kwargs has entries
or "missing 1 required positional argument" when empty.

The bug was masked in workflow-block unit tests because they MagicMock
model_manager.run_tensor_native_inference; the adapter code path never
ran. Surfaced when designing the IS v3 tensor sibling — passing
mask_format="rle" through the call chain forced a closer read.

Fix: change the 5 sites to positional `self.map_inference_kwargs(kwargs)`
matching the 12 other internal callers (preprocess / predict /
postprocess on each adapter).

Originally introduced by 948c4de ("Add support for tensor-native
interface for models"). No semantic change; only makes the code
actually executable.
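
The failure mode is easy to reproduce in isolation. The sketch below uses a hypothetical `Adapter` class with the same positional signature; `run_broken` and `run_fixed` are illustrative names for the call before and after the fix.

```python
from typing import Any, Dict, Optional


class Adapter:
    def map_inference_kwargs(self, kwargs: dict) -> Dict[str, Any]:
        """Takes ONE positional dict, as in the signature quoted above."""
        return dict(kwargs)

    def run_broken(self, **kwargs: Any) -> Dict[str, Any]:
        # Unpacking turns each entry into a keyword argument: TypeError.
        return self.map_inference_kwargs(**kwargs)

    def run_fixed(self, **kwargs: Any) -> Dict[str, Any]:
        # Passing the dict positionally matches the signature.
        return self.map_inference_kwargs(kwargs)


def type_error_message(call, **kwargs: Any) -> Optional[str]:
    """Run the call and return the TypeError text, or None if it succeeded."""
    try:
        call(**kwargs)
    except TypeError as error:
        return str(error)
    return None


adapter = Adapter()
with_entries = type_error_message(adapter.run_broken, confidence=0.5)
when_empty = type_error_message(adapter.run_broken)
```

Both error messages named in the commit text show up: one when kwargs has entries, the other when it is empty.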

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pass mask_format="rle" explicitly to run_tensor_native_inference rather
than relying on the adapter's auto-selection in map_inference_kwargs.

Rationale: the model's post_process defaults to mask_format="dense"
(yolov8_instance_segmentation_onnx.py:224 et al). The adapter's
auto-selection works only when the model supports RLE; that is a silent
side effect of the adapter layer that the workflow tensor block
should not depend on. Declaring the intent at the call site means:
- The block reads as obviously compact-aware
- If the underlying model doesn't list "rle" in supported_mask_formats,
  the model raises ModelInputError loudly instead of returning dense
- Behaviour matches the remote path which also enforces
  response_mask_format="rle" via InferenceConfiguration

Test added asserting mask_format="rle" lands in the adapter kwargs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirror the v3_tensor pilot pattern for the older OD block versions.
Manifests verbatim from v1.py/v2.py — v1 keeps its legacy type alias
list ("RoboflowObjectDetectionModel", "ObjectDetectionModel") and
outputs only {inference_id, predictions} (no model_id field). v2 keeps
the v3-shape output set including model_id.

Loader entries (loader.py:410-419) wrapped in if/else swap.

Smoke tests assert output dict shape (model_id presence/absence) and
that predictions are inference_models.Detections in both paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirror v3_tensor IS pattern for older versions. v1 keeps legacy type
aliases (RoboflowInstanceSegmentationModel, InstanceSegmentationModel)
and outputs only {inference_id, predictions}. v2 keeps the v3-shape
outputs including model_id. Both pass mask_format="rle" to the adapter
and response_mask_format="rle" to InferenceConfiguration per the
locked PRED.13 explicit-enforcement decision.

Loader if/else now groups all 3 IS versions in a single swap block.

Smoke tests assert output shape per version and mask_format/
response_mask_format enforcement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- common/tensor_prediction_metadata.py: KeyPoints added to
  PredictionWithSingularMetadata union (same image_metadata slot pattern
  as Detections/InstanceDetections).
- common/remote_response_converters.py: new
  dict_response_to_key_points helper. Pads keypoint xy/confidence per
  instance to max_kps (same convention as numpy
  add_inference_keypoints_to_sv_detections). Per-instance bbox info
  lands in key_points_metadata as {bbox_xyxy, bbox_confidence,
  detection_id, parent_id}.
- KP v1/v2/v3 tensor siblings: adapter returns
  Tuple[List[KeyPoints], Optional[List[Detections]]] — block unpacks
  to KeyPoints as the canonical predictions output. Standard
  attach_prediction_metadata for image_metadata population. Pass
  key_points_threshold (matches model signature) on local path,
  keypoint_confidence_threshold on remote (matches SDK
  InferenceConfiguration).
- Loader if/else now groups all 3 KP versions.
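
The per-instance padding to `max_kps` mentioned above might look roughly like this. The fill values (zero coordinates and zero confidence), the keypoint dict keys `x`, `y`, `confidence`, and the function name `pad_keypoints` are assumptions for illustration; plain lists stand in for tensors.

```python
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]


def pad_keypoints(
    instances: List[List[Dict[str, Any]]],
    pad_xy: Point = (0.0, 0.0),
    pad_confidence: float = 0.0,
) -> Tuple[List[List[Point]], List[List[float]]]:
    """Pad every instance's keypoints to the longest instance in the image."""
    max_kps = max((len(keypoints) for keypoints in instances), default=0)
    xy: List[List[Point]] = []
    confidence: List[List[float]] = []
    for keypoints in instances:
        missing = max_kps - len(keypoints)
        xy.append([(kp["x"], kp["y"]) for kp in keypoints] + [pad_xy] * missing)
        confidence.append(
            [kp["confidence"] for kp in keypoints] + [pad_confidence] * missing
        )
    return xy, confidence


xy, confidence = pad_keypoints(
    [
        [{"x": 1.0, "y": 2.0, "confidence": 0.9}, {"x": 3.0, "y": 4.0, "confidence": 0.8}],
        [{"x": 5.0, "y": 6.0, "confidence": 0.7}],
    ]
)
```

After padding, every row has the same length, so the result can be stacked into rectangular arrays of shape (instances, max_kps, 2) and (instances, max_kps).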

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mentation_result

- common/remote_response_converters.py:
  new dict_response_to_semantic_segmentation_result. Decodes
  base64 PNG segmentation_mask -> torch.int64 tensor (H, W) where
  pixel value = class_id. Decodes optional confidence_mask -> float32
  normalised. class_map (intensity -> label) goes into image_metadata.
- SS v1: minimal manifest, no confidence_mode. v2: confidence_mode +
  custom_confidence. Both pass through to run_tensor_native_inference
  with input_color_format="rgb".
- Remote path: class_names dict built from response's class_map by
  walking responses and coercing intensity strings to int.
- Loader if/else groups both SS versions.

Smoke tests on v2 cover local return type and remote base64 PNG decode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ta helper

- common/tensor_prediction_metadata.py: new
  attach_classification_prediction_metadata for the batch-shaped
  ClassificationPrediction. Single-label classification returns ONE
  prediction object for the whole batch with class_id/confidence
  tensors of shape (bs,) and a plural images_metadata: List[dict] of
  length bs. The helper writes one metadata dict per image and returns
  the list of resolved inference_ids.
- common/remote_response_converters.py:
  new dict_responses_to_classification_prediction. Takes the full
  List[response] for a batch and builds a single ClassificationPrediction
  by reading each response's top class name and matching it back to
  class_id from the response's predictions list.
- MC v3 / v2 / v1 siblings: each block calls the adapter once (batch
  in, batch out), attaches metadata via the plural helper, then slices
  the ClassificationPrediction per image into BlockResult rows so
  downstream consumers receive a one-length view per image without
  needing to know the batch index. v3 has confidence_mode/custom_confidence,
  v2 has direct confidence + ALOps fields, v1 minimal + legacy type
  aliases + STRING_KIND inference_id + no model_id output.
- Loader if/else groups all 3 MC versions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Multi-label classification returns List[MultiLabelClassificationPrediction]
from the adapter (one per image, not batch-shaped like single-label).
Each prediction object has singular image_metadata, so the standard
attach_prediction_metadata helper applies directly — no need for the
plural images_metadata helper used for single-label.

Remote response shape is dict-keyed:
  predictions: Dict[class_name, {class_id, confidence}]
  predicted_classes: [class_name, ...]
dict_response_to_multi_label_classification walks predicted_classes
(only the classes over threshold) and pulls class_id/confidence from
the predictions dict, building per-image tensors.

class_names dict is harvested by walking each response's predictions
dict — preserves the model's full known class table when seen across
the batch (different from OD/IS where the class list is sparse).

v3 has best/default/custom confidence_mode. v2: direct confidence + AL.
v1: minimal + legacy type aliases + STRING_KIND inference_id + no
model_id output.
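
The remote multi-label handling described above can be sketched with plain lists in place of tensors. The response keys follow the shape quoted in the commit message; the function names `multi_label_from_response` and `harvest_class_names` are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def multi_label_from_response(
    response: Dict[str, Any],
) -> Tuple[List[int], List[float]]:
    """Walk predicted_classes and pull class_id / confidence for each."""
    predictions = response.get("predictions", {})
    class_ids: List[int] = []
    confidences: List[float] = []
    for class_name in response.get("predicted_classes", []):
        entry = predictions[class_name]
        class_ids.append(int(entry["class_id"]))
        confidences.append(float(entry["confidence"]))
    return class_ids, confidences


def harvest_class_names(responses: Iterable[Dict[str, Any]]) -> Dict[int, str]:
    """Collect class_id -> name from every key of each predictions dict."""
    names: Dict[int, str] = {}
    for response in responses:
        for class_name, entry in response.get("predictions", {}).items():
            names.setdefault(int(entry["class_id"]), class_name)
    return names


response = {
    "predictions": {
        "cat": {"class_id": 0, "confidence": 0.91},
        "dog": {"class_id": 1, "confidence": 0.12},
        "bird": {"class_id": 2, "confidence": 0.77},
    },
    "predicted_classes": ["cat", "bird"],
}
class_ids, confidences = multi_label_from_response(response)
class_names = harvest_class_names([response])
```

The per-image result only contains the classes over threshold, while the harvested mapping also includes `dog`, which is why the class table here is fuller than in the OD/IS case.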

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PawelPeczek-Roboflow and others added 8 commits May 25, 2026 19:07
attach_classification_prediction_metadata uses List[str] in its return
annotation and inference_ids parameter. The List import was previously
removed when cleaning up the unused class_names: List[str] annotation
in attach_prediction_metadata, but never restored when the plural-
metadata helper was added later in the session — causing NameError at
module import time, which broke test collection across every test
that transitively imports this module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bridge between inference_models native prediction types and
sv.Detections / sv.KeyPoints / dict, with image_metadata broadcast
and bboxes_metadata folded into per-detection arrays.

Used by Phase 5 consumer block tensor siblings that wrap an existing
sv-shaped implementation: convert at input boundary, run numpy logic,
optionally convert back via sv_detections_to_inference_models_detections
when downstream is tensor-aware.

For InstanceDetections this is the materialisation point — RLE masks
are converted to dense numpy here, via the same
coco_rle_masks_to_numpy_mask used by inference_models.to_supervision().
Per [ITERATE PRED.4]: RLE stays compact until a dense consumer asks
for it; this is that boundary.

Also exposes:
- key_points_to_supervision_with_metadata for sv.KeyPoints
- classification_prediction_to_dict_per_image / multi_label_classification_to_dict
  for sv-less classification consumers
- sv_detections_to_inference_models_detections (reverse direction)
- to_supervision_with_metadata (generic dispatch by isinstance)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ment

Two implementation flavours kicked off:

**Tensor-native** (Phase 5a — crop hot-path):
- absolute_static_crop/v1_tensor.py
- relative_static_crop/v1_tensor.py

Both slice `WorkflowImageData.tensor_image` directly (no numpy
materialisation) and build the cropped child via
`create_crop_from_tensor`. Dimensions resolved through
`_read_shape_without_materialization`.

**Wrap-and-delegate** (Phase 5c sample — mutator):
- fusion/detections_classes_replacement/v1_tensor.py

The classes-replacement logic is heavily sv.Detections-shaped
(matches detection_id ↔ parent_id, replaces class arrays in .data,
regenerates detection_ids). Reimplementing it natively would be a
substantial port. The tensor sibling instead:

1. Converts the tensor inputs at the boundary via
   `to_supervision_with_metadata` and the classification lowering
   helpers (single/multi-label → dict shape the numpy block already
   understands).
2. Delegates to the numpy DetectionsClassesReplacementBlockV1.
3. Converts the sv.Detections output back to
   inference_models.Detections via
   `sv_detections_to_inference_models_detections`, preserving the
   upstream image_metadata so downstream tensor consumers see
   consistent inference_id/model_id/class_names.

This wrap-and-delegate pattern is the pragmatic Phase 5 default for
consumers whose internals are too sv-shaped to port natively in one
session. The materialisation cost is paid by the block, not the engine
(no engine coercion per PRED.6). Downstream still receives
inference_models native — the round-trip is local.

Dynamic_crop deferred to a follow-up; its per-detection mask slicing
needs the same wrap-and-delegate treatment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…or siblings

Most consumer blocks have heavily sv.Detections-shaped internals; reimplementing each natively is orthogonal to the workflow-engine-level tensor contract. New common/wrap_consumer.py factory produces a tensor-mode sibling for any numpy consumer block via subclass-and-override:

    OriginalBlockV1 = make_tensor_wrapper_block(_NumpyImpl)

The wrapper:
- shares the wrapped class's __name__ / __qualname__ / __module__ so the loader if/else binds the same identifier in both branches
- intercepts run() args+kwargs, materialising any inference_models native prediction (Detections, InstanceDetections, KeyPoints, ClassificationPrediction, MultiLabelClassificationPrediction) into sv.Detections / dict via to_supervision.py helpers
- recursively materialises Batch contents (preserving indices) and lists
- delegates the materialised call to super().run()
- returns the result as-is (image / sv-shaped predictions / sink-side action — downstream tensor consumers requiring inference_models native output should use a hand-written sibling like detections_classes_replacement/v1_tensor.py)

Per [ITERATE PRED.6] the materialisation cost is paid by the block, not the engine. No engine coercion.

70 wrapper siblings landed (5 lines each, generated programmatically):

analytics: data_aggregator, detection_event_log, line_counter v1/v2, overlap, path_deviation v1/v2, time_in_zone v1/v2/v3, velocity
classical_cv: distance_measurement, mask_area_measurement, mask_edge_snap, size_measurement, template_matching
formatters: vlm_as_classifier v1/v2, vlm_as_detector v1/v2
fusion: detections_consensus, detections_list_rollup, detections_stitch
sinks: onvif_movement, roboflow/{custom_metadata, dataset_upload v1/v2, model_monitoring_inference_aggregator, vision_events}
transformations: byte_tracker v1/v2/v3, detection_offset, detections_combine, detections_filter, detections_merge, detections_transformation, dynamic_crop, dynamic_zones, bounding_rect, per_class_confidence_filter, perspective_correction, stabilize_detections, stitch_ocr_detections v1/v2
visualizations: background_color, blur, bounding_box, circle, classification_label, color, corner, crop, dot, ellipse, halo v1/v2, heatmap, icon, keypoint, label, line_zone, mask, model_comparison, pixelate, polygon v1/v2, polygon_zone, trace, triangle

Loader: 78 if/else swap blocks now (1 per producer + 1 per wrapped consumer).
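
A stripped-down sketch of the subclass-and-override factory idea is shown below. It is not the PR's `wrap_consumer.py`: the second parameter `materialise` is a hypothetical injection point for the conversion logic, Batch handling is omitted, and `NativeDetections` / `CountDetectionsBlock` are toy stand-ins.

```python
from typing import Any, Callable, List


def make_tensor_wrapper_block(
    numpy_block: type, materialise: Callable[[Any], Any]
) -> type:
    """Build a subclass that converts inputs before delegating to run()."""

    class TensorWrapper(numpy_block):
        def run(self, *args: Any, **kwargs: Any) -> Any:
            converted_args = tuple(materialise(arg) for arg in args)
            converted_kwargs = {key: materialise(value) for key, value in kwargs.items()}
            return super().run(*converted_args, **converted_kwargs)

    # Share identity with the wrapped class so both loader branches
    # bind the same class name.
    TensorWrapper.__name__ = numpy_block.__name__
    TensorWrapper.__qualname__ = numpy_block.__qualname__
    TensorWrapper.__module__ = numpy_block.__module__
    return TensorWrapper


class NativeDetections:
    """Toy stand-in for a tensor-native prediction object."""

    def __init__(self, boxes: List[Any]) -> None:
        self.boxes = boxes


class CountDetectionsBlock:
    """Toy numpy-mode block that expects a plain list of boxes."""

    def run(self, predictions: List[Any]) -> int:
        return len(predictions)


def to_plain(value: Any) -> Any:
    return value.boxes if isinstance(value, NativeDetections) else value


Wrapped = make_tensor_wrapper_block(CountDetectionsBlock, to_plain)
count = Wrapped().run(predictions=NativeDetections([1, 2, 3]))
```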

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nce_models

Without output conversion the wrapper leaked sv.Detections to downstream
tensor-native consumers. Now the wrapper recursively walks the run()
result, converting any sv.Detections back to inference_models.Detections
via sv_detections_to_inference_models_detections.

Recursion handles BlockResult shape (List[Dict[str, Any]]) and nested
dicts. Visualizer / sink outputs (images, status payloads, scalars) pass
through unchanged.
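
The recursive output walk can be sketched as follows. `SvLike` and `NativeLike` are toy stand-ins for `sv.Detections` and `inference_models.Detections`, and `convert_outputs` is a hypothetical name; the real wrapper used `sv_detections_to_inference_models_detections` as the converter.

```python
from typing import Any, Callable, List


class SvLike:
    """Toy stand-in for an sv.Detections-shaped object."""

    def __init__(self, boxes: List[Any]) -> None:
        self.boxes = boxes


class NativeLike:
    """Toy stand-in for the tensor-native prediction type."""

    def __init__(self, boxes: List[Any]) -> None:
        self.boxes = boxes


def convert_outputs(value: Any, convert: Callable[[SvLike], Any]) -> Any:
    """Walk lists, tuples, and dicts; convert SvLike leaves, keep the rest."""
    if isinstance(value, SvLike):
        return convert(value)
    if isinstance(value, dict):
        return {key: convert_outputs(item, convert) for key, item in value.items()}
    if isinstance(value, list):
        return [convert_outputs(item, convert) for item in value]
    if isinstance(value, tuple):
        return tuple(convert_outputs(item, convert) for item in value)
    return value  # images, status payloads, scalars pass through


block_result = [
    {"predictions": SvLike([1, 2]), "status": "ok"},
    {"nested": {"predictions": SvLike([3])}, "count": 1},
]
converted = convert_outputs(block_result, lambda sv: NativeLike(sv.boxes))
```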

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Earlier survey of inference_models_adapters.py found only 5 adapter
classes with run_tensor_native_inference (one per core-CV task family),
which led the plan to flag Phase 3.B as gated on "adapter extension
work". Re-checking against the broader inference/models/ tree (the
old inference layer's per-model adapter files) showed adapters with
run_tensor_native_inference already exist in inference/models/<family>/
<family>_inference_models.py for 20+ foundation models. The precondition
was already met.

The user confirmed: adapters exist for everything except yolo_world.

24 foundation workflow blocks wrapped via the existing
make_tensor_wrapper_block factory (same 5-line shape as Phase 5 fan-out):

- clip/v1
- clip_comparison/v1, v2
- depth_estimation/v1
- easy_ocr/v1
- florence2/v1, v2
- gaze/v1
- glm_ocr/v1
- moondream2/v1
- ocr/v1 (DocTR)
- perception_encoder/v1
- qwen/v1 (Qwen25VL)
- qwen3_5vl/v1, v2
- qwen3vl/v1
- seg_preview/v1
- segment_anything2/v1
- segment_anything2_video/v1
- segment_anything3/v1, v2, v3
- segment_anything3_3d/v1 (nested under SAM3_3D_OBJECTS_ENABLED)
- smolvlm/v1

Loader: 24 new if/else swaps. segment_anything3_3d nested inside the
SAM3_3D_OBJECTS_ENABLED feature flag block.

Out of scope:
- yolo_world (no adapter in inference/models/ — separate work)
- lmm (deprecated per feedback_mediapipe_deprecation_scope adjacent)
- external API blocks (anthropic, gemini, openai, etc. — no
  inference_models implementation)

Branch totals: 117 tensor sibling files, 111 ENABLE_TENSOR_DATA_REPRESENTATION
swap blocks in the loader.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… siblings

The factory abstraction was the wrong call: it hides what each block does
behind a factory-generated subclass, makes per-block edits awkward, and
obscures the type-swap intent. This commit takes the direct route instead:
every tensor sibling is now a verbatim copy of its numpy v<N>.py source.
The loader's if/else swap picks the same class name in both modes, and
each file's structure is visible and editable per block.

Concretely:
- Deleted inference/core/workflows/core_steps/common/wrap_consumer.py.
- 94 wrapper-based tensor siblings rewritten as verbatim copies of
  their corresponding numpy source files.
- 58 additional tensor siblings created for previously skipped
  in-scope blocks: third_party/{barcode_detection, qr_code_detection},
  fusion/{buffer, dimension_collapse, image_stack},
  flow_control/{inner_workflow, continue_if, delta_filter, rate_limiter},
  transformations/{image_slicer v1/v2, qr_code_generator,
  camera_calibration, stitch_images},
  cache/{cache_set, cache_get}, sinks/{webhook, s3,
  email_notification v1/v2, local_file, slack/notification,
  twilio/sms v1/v2}, secrets_providers/environment_secrets_store,
  sampling/{identify_changes, identify_outliers},
  formatters/{first_non_empty_or_default, property_definition, csv,
  json_parser, expression}, trackers/{sort, bytetrack, botsort,
  ocsort}, math/cosine_similarity, visualizations/{reference_path,
  text_display, grid}, classical_cv/{contrast_enhancement, sift,
  morphological_transformation v1/v2, pixel_color_count,
  contrast_equalization, background_subtraction, threshold,
  image_blur, contours, camera_focus v1/v2, convert_grayscale,
  motion_detection, dominant_color, image_preprocessing,
  sift_comparison v1/v2}.

Loader: replaced the wrap_consumer-era swap entries with single-pass
top-level rewrites that ignore imports already inside swap blocks.

Out of scope (external APIs / deprecated / no-adapter):
anthropic_claude, cog_vlm, google_gemini, google_gemma,
google_vision_ocr, kimi_openrouter, llama_vision, lmm_classifier,
openai, openai_compatible, openrouter, qwen3_5_openrouter,
qwen3_6_openrouter, qwen_vlm, stability_ai, lmm, yolo_world.

Branch totals: 175 tensor sibling files, 163 ENABLE_TENSOR_DATA_REPRESENTATION
swap blocks in loader.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ped + tensor source attached)

Earlier producer siblings emitted `inference_models.{Detections,
InstanceDetections, KeyPoints, ClassificationPrediction,
MultiLabelClassificationPrediction}` directly. Consumer siblings
(verbatim copies of numpy) expect `sv.Detections` / dicts — the
data flow broke at every producer→consumer boundary.

Fix: dual representation. Each producer's output is the numpy-mode
shape (sv.Detections for OD/IS/KP/SemSeg, dict for classification),
with the original inference_models native source preserved in
`.data[TENSOR_NATIVE_PREDICTION_KEY]` (or as a dict entry for the
classification dicts).

- Consumers (verbatim sv-shaped copies of numpy) work unchanged.
- Tensor-aware consumers read `.data[TENSOR_NATIVE_PREDICTION_KEY]`
  to recover the inference_models native form and skip
  re-materialisation.

Added helpers in common/to_supervision.py:
- build_dual_detections (OD)
- build_dual_instance_detections (IS; rasterises RLE to dense
  at this boundary — tensor source in .data still carries RLE
  for tensor-aware consumers)
- build_dual_key_points (KP)
- build_dual_classification_dicts (single-label; list per image)
- build_dual_multi_label_dict (multi-label; dict per image)
- build_dual_semantic_segmentation (passthrough for now —
  SemSeg's numpy-mode output is itself a specialised structure
  and the sv-detections-per-class rendering is non-trivial)

Updated producer siblings (15 files): OD v1/v2/v3, IS v1/v2/v3,
KP v1/v2/v3, MC v1/v2/v3 (per-image dict shape replaces sliced
ClassificationPrediction), ML v1/v2/v3.
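
The dual-representation idea can be illustrated with toy dataclasses. Everything concrete here is an assumption for the sake of the sketch: the string value of `TENSOR_NATIVE_PREDICTION_KEY`, the `SvLikeDetections` / `NativeDetections` stand-ins, and the helper names `build_dual` and `recover_native`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical value; the PR names the constant but not its contents.
TENSOR_NATIVE_PREDICTION_KEY = "tensor_native_prediction"


@dataclass
class NativeDetections:
    """Toy stand-in for the tensor-native prediction type."""

    xyxy: List[List[float]]
    image_metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SvLikeDetections:
    """Toy stand-in for the sv.Detections-shaped output with a .data dict."""

    xyxy: List[List[float]]
    data: Dict[str, Any] = field(default_factory=dict)


def build_dual(native: NativeDetections) -> SvLikeDetections:
    """Emit the numpy-mode shape and keep the native source alongside it."""
    return SvLikeDetections(
        xyxy=[list(box) for box in native.xyxy],
        data={TENSOR_NATIVE_PREDICTION_KEY: native},
    )


def recover_native(detections: SvLikeDetections) -> Optional[NativeDetections]:
    """Tensor-aware consumers read the native source back if it is present."""
    return detections.data.get(TENSOR_NATIVE_PREDICTION_KEY)


native = NativeDetections(xyxy=[[0.0, 0.0, 10.0, 10.0]], image_metadata={"inference_id": "abc"})
dual = build_dual(native)
```

An sv-shaped consumer only touches `dual.xyxy` and never needs to know the extra key exists, while a tensor-aware consumer calls something like `recover_native(dual)` and gets the original object back without re-materialisation.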

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
