Add SAM2 streaming video tracker (inference_models + workflow block)#2245
Open
Introduces a SAM3 streaming tracker that mirrors the existing `SAM2ForStream` interface (`prompt` / `track` returning `(masks, object_ids, state_dict)`) so both can be used interchangeably by upstream code.

- `inference_models/models/sam3_rt/sam3_pytorch.py`: `SAM3ForStream` backed by HuggingFace transformers' `Sam3VideoModel` / `Sam3VideoProcessor`. The native `sam3` package's video predictor requires a full video resource upfront; the transformers port exposes `init_video_session` + per-frame `model(frame=...)`, which is the shape we need for InferencePipeline-style streaming.
- Accepts bbox and/or text prompts; `state_dict` is opaque (wraps the HF `Sam3VideoInferenceSession`) and must be kept in memory by the caller — it's not serializable across processes.
- Registers (`segment-anything-3-rt`, `INSTANCE_SEGMENTATION_TASK`, `BackendType.HF`) in `models_registry` alongside the existing SAM2-RT entry.

Tests:

- `inference_models/tests/unit_tests/models/test_sam3_rt.py` — 24 unit tests covering helpers (`_normalise_bboxes`, `_unpack_processed_outputs`, etc.) plus class behaviour using MagicMock model/processor. No weights required.
- `inference_models/tests/integration_tests/models/test_sam2_rt_predictions.py` — new integration suite for the existing `SAM2ForStream` (prompt -> track on synthetic frames, centroid-moves assertion, track-without-prompt raises, `torch.Tensor` input).
- `inference_models/tests/integration_tests/models/test_sam3_rt_predictions.py` — analogous suite for `SAM3ForStream`.
- `conftest.py`: `sam2_rt_package` and `sam3_rt_package` fixtures download zips from rf-platform-models; docstrings list the expected file contents for upload.

https://claude.ai/code/session_01T3k4sfbkaV3warwV7MHRZN
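The `prompt` / `track` contract described above can be sketched as a minimal stand-in. All names below are hypothetical illustrations of the shape of the interface, not the actual `SAM3ForStream` implementation:

```python
from typing import Optional

import numpy as np


class StreamingTrackerSketch:
    """Illustrative stand-in for the prompt/track streaming contract:
    both methods return (masks, object_ids, state_dict), and the caller
    must thread the opaque state_dict back in on every later frame."""

    def prompt(self, frame: np.ndarray, bboxes: Optional[list] = None):
        # Seed a new session from bbox prompts (text prompts omitted here).
        num_objects = len(bboxes or [])
        state = {"frame_idx": 0, "num_objects": num_objects}
        masks = np.zeros((num_objects, *frame.shape[:2]), dtype=bool)
        return masks, list(range(num_objects)), state

    def track(self, frame: np.ndarray, state_dict: dict):
        # Propagate existing objects to the next frame; no new prompts.
        if state_dict is None:
            raise ValueError("track() called before prompt()")
        state_dict["frame_idx"] += 1
        n = state_dict["num_objects"]
        masks = np.zeros((n, *frame.shape[:2]), dtype=bool)
        return masks, list(range(n)), state_dict
```

The key point of the contract is that the session lives entirely in the returned `state_dict`, which is why it cannot cross a process boundary.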
Two new LOCAL-only workflow blocks that drive the inference_models streaming trackers (`SAM2ForStream` / `SAM3ForStream`) from workflows powered by InferencePipeline. Both blocks multiplex a single model instance across many videos by keying state_dicts on `video_metadata.video_identifier`, reset sessions when `frame_number` rolls back, and support three prompt modes: `first_frame`, `every_n_frames`, `every_frame`.

- `inference/core/workflows/core_steps/models/foundation/_streaming_video_common.py`: shared helpers (state bookkeeping, prompt-vs-track decision logic, `sv.Detections` assembly with SAM-assigned tracker_ids).
- `segment_anything2_video/v1.py`: `SAM2VideoTrackerBlockV1` (type: `roboflow_core/segment_anything_2_video@v1`, default model_id: `segment-anything-2-rt`).
- `segment_anything3_video/v1.py`: `SAM3VideoTrackerBlockV1` (type: `roboflow_core/sam3_video@v1`, default model_id: `segment-anything-3-rt`). Additionally accepts text prompts via `class_names`; boxes win when both are supplied.
- Both raise `NotImplementedError` on REMOTE step execution — per-video session state cannot survive a remote boundary.
- Models are loaded via `inference_models.AutoModel.from_pretrained` so backend negotiation / package download / caching flow through the standard inference_models pipeline.
- Registered in `core_steps/loader.py`.

Tests (30 total, all passing, no weights required):

- `test_segment_anything2_video.py` — 10 tests covering manifest, REMOTE rejection, first_frame/every_n_frames/every_frame modes, state threading across track calls, multi-stream isolation, stream-restart detection.
- `test_segment_anything3_video.py` — 9 tests with similar coverage plus text-vs-box prompt routing.
- `test_streaming_video_common.py` — 11 tests for the shared helpers.

https://claude.ai/code/session_01T3k4sfbkaV3warwV7MHRZN
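The prompt-vs-track decision and stream-restart handling described above might look roughly like the following. This is a hypothetical sketch of the logic, not the actual `_streaming_video_common` helpers; the function names are made up:

```python
def should_prompt(mode: str, frame_number: int, n: int = 10) -> bool:
    """Return True when this frame should (re-)seed the tracker instead
    of just propagating existing objects. frame_number assumed 0-based."""
    if mode == "first_frame":
        return frame_number == 0
    if mode == "every_frame":
        return True
    if mode == "every_n_frames":
        return frame_number % n == 0
    raise ValueError(f"unknown prompt mode: {mode}")


def resolve_session(sessions: dict, video_id: str, frame_number: int):
    """Fetch per-video state keyed on the video identifier, discarding it
    when the frame counter rolls back (i.e. the stream restarted)."""
    session = sessions.get(video_id)
    if session is not None and frame_number < session.get("last_frame", -1):
        sessions.pop(video_id)  # stale session from before the restart
        session = None
    return session
```

Keying the dict on the video identifier is what lets one model instance serve many concurrent streams without their sessions bleeding into each other.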
Refactors the streaming trackers into a shared HuggingFace transformers base and adds a `SAM2Video` counterpart to the existing `SAM3Video`. The older sam2_rt (`SAM2ForStream` using Meta's `sam2` camera predictor) is kept untouched — per the feedback it hasn't been exercised much in practice.

Model classes
-------------
- `inference_models/models/common/hf_streaming_video.py`: `HFStreamingVideoBase` containing all the HF streaming boilerplate — session init, prompt/track methods, mask/obj_id extraction, opaque state_dict contract.
- `inference_models/models/sam2_video/sam2_video_hf.py`: `SAM2Video` (lazy-imports `transformers.Sam2VideoModel` / `Sam2VideoProcessor`; rejects text prompts).
- `inference_models/models/sam3_video/sam3_video_hf.py`: `SAM3Video`, moved from the previous sam3_rt path; now a thin ~25-line subclass after the shared base absorbed the helpers (lazy-imports `transformers.Sam3VideoModel` / `Sam3VideoProcessor`; accepts both text and box prompts).

Registry
--------
- `sam2video`: (`INSTANCE_SEGMENTATION_TASK`, `BackendType.HF`) -> `SAM2Video`
- `sam3video`: (`INSTANCE_SEGMENTATION_TASK`, `BackendType.HF`) -> `SAM3Video`
- `segment-anything-2-rt` stays registered against `SAM2ForStream`.
- `segment-anything-3-rt` entry dropped (never released).

Workflow blocks now default to these ids:
- `roboflow_core/segment_anything_2_video@v1` -> `"sam2video"`
- `roboflow_core/sam3_video@v1` -> `"sam3video"`

Tests
-----
- Unit tests: added `test_sam2_video.py` (4 SAM2-specific), renamed `test_sam3_rt.py` -> `test_sam3_video.py` and updated imports (24 tests covering helpers on the shared base plus SAM3 class behaviour).
- Integration tests: renamed SAM3 file, added SAM2 counterpart. New fixtures `sam2_video_package` / `sam3_video_package` (expected zips at `rf-platform-models/sam2video.zip` and `rf-platform-models/sam3video.zip`).
- Workflow block tests updated to use `sam2video` / `sam3video` ids.
- All 58 non-integration tests pass locally.

https://claude.ai/code/session_01T3k4sfbkaV3warwV7MHRZN
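The lazy-import pattern mentioned for `SAM2Video` / `SAM3Video` — deferring the heavyweight `transformers` import until the model is actually constructed — can be illustrated with the stdlib. Here `json` stands in for `transformers`, and the helper name is made up:

```python
import importlib


def lazy_class(module_name: str, class_name: str):
    """Return a zero-arg loader that imports the module only when called,
    so merely defining/registering a model class never pays the import
    cost and never fails on machines missing the optional dependency."""
    def loader():
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    return loader


# Nothing is imported until the loader is invoked:
load_decoder = lazy_class("json", "JSONDecoder")
decoder_cls = load_decoder()  # the import happens here
```

This is why the registry can list HF-backed models without forcing every install to pull in transformers.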
Removes the integration tests and fixture I'd added for the existing `SAM2ForStream` (sam2_rt) — keeping them would require uploading a segment-anything-2-rt.zip to the test assets bucket, but the goal for this PR is to leave that untested path alone.

- Deleted `tests/integration_tests/models/test_sam2_rt_predictions.py`
- Removed `SAM2_RT_PACKAGE_URL` + `sam2_rt_package` fixture from `conftest.py`
- Fixed two docstring references that still said `SAM2ForStream` when they now point at the new `SAM2Video`.

The `SAM2ForStream` registry entry itself stays — it's the legacy model that existed before this branch and we're not touching it.

https://claude.ai/code/session_01T3k4sfbkaV3warwV7MHRZN
Captures the state of this branch while session context is fresh — what's where, how to test, what needs uploading, known gotchas, and a sketch of the follow-up "add a model" Claude skill. Doc is temporary and should be deleted before the PR merges. https://claude.ai/code/session_01T3k4sfbkaV3warwV7MHRZN
- Default the workflow block to `sam2video/small`; advertise all four Hiera backbones (tiny / small / base-plus / large) via examples and `get_supported_model_variants`.
- Fix the `Sam2VideoProcessor.add_inputs_to_inference_session` call: the processor expects `input_boxes` with 3 nesting levels (`[image, boxes, coords]`); we were passing 4, which raised `ValueError` on the first real-weights prompt. Unit tests missed it because they mock the processor — surfaced by end-to-end verification against the uploaded sam2video-small.zip.
- Point the inference_models integration fixture URL at sam2video-small.zip (the variant that matches the new default).
- Update sam2video workflow-block unit tests to pass the new `sam2video/small` default through the mocks.
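The nesting fix above comes down to the shape of `input_boxes`. A quick illustration of 3-level versus 4-level nesting (coordinate values are made up; `nesting_depth` is a throwaway helper, not project code):

```python
def nesting_depth(x) -> int:
    """Count how many list levels wrap the innermost scalar."""
    depth = 0
    while isinstance(x, list):
        depth += 1
        x = x[0]
    return depth


# Correct: [images][boxes per image][x1, y1, x2, y2] — 3 levels.
input_boxes = [
    [
        [100, 150, 300, 400],
        [50, 60, 120, 200],
    ]
]

# Wrong: each box wrapped in one extra list — 4 levels, which the
# processor rejected with a ValueError under real weights.
over_nested = [[[[100, 150, 300, 400]]]]
```

Because the unit tests mock the processor, only an end-to-end run with real weights could catch a shape mismatch like this.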
Force-pushed from cb663b6 to d7e6945.
Descope the SAM3 streaming-video work to a follow-up so this PR can ship SAM2 video alone. SAM3's HF port requires the gated `facebook/sam3` checkpoints, which aren't available yet.

Removed:
- inference_models SAM3 model class, registry entry, unit + integration tests, and test fixture
- sam3_video workflow block + loader registration + unit tests
- SAM_VIDEO_HANDOFF.md (served its purpose during the branch work)

Left in place:
- `HFStreamingVideoBase` in `inference_models/models/common` — reusable, SAM2 uses it today and a future SAM3 port can inherit unchanged
- `_streaming_video_common` workflow helpers — still used by the SAM2 video block
- SAM2 video class, registry entry, workflow block, and all SAM2 tests

Comments that referenced `SAM3Video` / test_sam3_video.py have been generalised or trimmed so nothing dangles.
Summary
- `SAM2Video` model in `inference_models/inference_models/models/sam2_video/` — inherits from the new shared `HFStreamingVideoBase` so the streaming (prompt/track) contract is defined once and reused by future HF video trackers.
- Registered as `sam2video` (instance-segmentation, HF backend). Four variants shipped: `sam2video/tiny`, `sam2video/small` (default), `sam2video/base-plus`, `sam2video/large` — all four weight zips uploaded to `rf-platform-models/` and registered on staging via model-registry-sdk#5.
- `roboflow_core/segment_anything_2_video@v1` workflow block (LOCAL-only — raises `NotImplementedError` on remote execution since per-video session state can't cross a process boundary). Multiplexes a single SAM2 predictor across videos by keying state on `WorkflowImageData.video_metadata.video_identifier`.
- Prompt modes: `first_frame` (prompt once, then track), `every_n_frames` (re-seed every N frames), `every_frame` (re-seed every frame).
- Fixed the `input_boxes` nesting (`[image, boxes, coords]` — 3 levels, not 4) that the mocked unit tests missed and only surfaced under real weights.
- The legacy `SegmentAnything2VideoRT` / sam2_rt `SAM2ForStream` (Meta's `sam2` package camera predictor) is untouched by this PR; both remain registered concurrently.

Out of scope for this PR
SAM3 video (streaming) was implemented on this branch alongside SAM2 but stripped in the final commit. The `facebook/sam3` checkpoints are gated on HuggingFace and we're still waiting on access. `HFStreamingVideoBase` is deliberately left in `models/common/` so the SAM3 follow-up can inherit it unchanged.

Test plan
- Unit tests: workflow block (`tests/workflows/unit_tests/core_steps/models/foundation/test_segment_anything2_video.py`, `test_streaming_video_common.py`) + SAM2 HF model (`inference_models/tests/unit_tests/models/test_sam2_video.py`)
- Integration: `sam2video/small` loads via `AutoModel.from_pretrained` + runs `prompt` + `track` end-to-end against api.roboflow.one
- `roboflow/model-registry-sdk` PR "Include `class_id` and `class_list` in inference response" #5
- `InferencePipeline` + a short MP4