# Video Segmentation with SAM 3

[![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/opengeos/segment-geospatial/blob/main/docs/examples/sam3_video_segmentation.ipynb)

This notebook demonstrates how to use SAM 3 for video segmentation and tracking. SAM 3 provides:

- **Text prompts**: Segment objects using natural language (e.g., "person", "car")
- **Point prompts**: Add clicks to segment and refine objects
- **Object tracking**: Track segmented objects across all video frames
- **Time series support**: Process GeoTIFF time series with georeferencing


## Installation

SAM 3 requires a CUDA-capable GPU. Install the package with the `samgeo3` extra:


In [None]:
# Uncomment to install the package with SAM 3 support
# %pip install "segment-geospatial[samgeo3]"

# Optional: check which samgeo installation is in use
# import sys, samgeo
# print("Python executable:", sys.executable)
# print("samgeo file:", getattr(samgeo, "__file__", None))
# print("samgeo version:", getattr(samgeo, "__version__", None))
# print("Has SamGeo3Video?", "SamGeo3Video" in dir(samgeo))

# Optional: prefer a local checkout of the repository over the installed package
# import sys, importlib
# sys.path.insert(0, "/segment-geospatial")
# import samgeo
# importlib.reload(samgeo)
# print("samgeo file:", getattr(samgeo, "__file__", None))
# print("Names with 'SamGeo3':", [n for n in dir(samgeo) if "SamGeo3" in n])
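Because SAM 3 needs a CUDA-capable GPU, it can help to check the environment before loading the model. The following is a minimal sketch, not part of `samgeo`: the helper name `cuda_status` is hypothetical, and it assumes PyTorch is the backend whose CUDA visibility matters. It reports a clear message instead of failing when PyTorch is missing.

```python
import importlib.util


def cuda_status():
    """Return (available, detail) describing CUDA support in this environment.

    Hypothetical helper for illustration; it is not part of samgeo.
    """
    if importlib.util.find_spec("torch") is None:
        return False, "PyTorch is not installed"
    import torch  # imported lazily so the check works without PyTorch

    if not torch.cuda.is_available():
        return False, "PyTorch is installed but no CUDA device is visible"
    return True, f"{torch.cuda.device_count()} CUDA device(s) visible"


available, detail = cuda_status()
print(f"CUDA available: {available} ({detail})")
```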



## Import Libraries


In [1]:
import os
import sys

# Optional: put a local checkout first on the import path so that it takes
# precedence over the installed package
sys.path.insert(0, "/segment-geospatial")
from samgeo import SamGeo3Video, download_file


To use SamGeo 2, install it as:
	pip install segment-geospatial[samgeo2]
>>> LOADED samgeo FROM /segment-geospatial/samgeo/__init__.py


## Initialize Video Predictor

The `SamGeo3Video` class provides a simplified API for video segmentation. It automatically uses all available GPUs.


In [2]:
sam = SamGeo3Video()

INFO 2025-12-09 15:24:31,879 1146200 sam3_video_predictor.py: 299: using the following GPU IDs: [0, 1]


Using GPUs: [0, 1]


INFO 2025-12-09 15:24:32,082 1146200 sam3_video_predictor.py: 315:


	*** START loading model on all ranks ***


INFO 2025-12-09 15:24:32,082 1146200 sam3_video_predictor.py: 317: loading model on rank=0 with world_size=2 -- this could take a while ...
INFO 2025-12-09 15:24:32,082 1146200 sam3_video_predictor.py: 317: loading model on rank=0 with world_size=2 -- this could take a while ...
INFO 2025-12-09 15:24:35,963 1146200 sam3_video_base.py: 124: setting max_num_objects=10000 and num_obj_for_compile=16
INFO 2025-12-09 15:24:37,827 1146200 sam3_video_predictor.py: 319: loading model on rank=0 with world_size=2 -- DONE locally
INFO 2025-12-09 15:24:37,827 1146200 sam3_video_predictor.py: 376: spawning 1 worker processes
INFO 2025-12-09 15:24:39,472 1146667 sam3_video_predictor.py: 460: starting worker process rank=1 with world_size=2
INFO 2025-12-09 15:24:39,575 1146667 sam3_video_pre

## Load a Video

You can load from different sources:
- MP4 video file
- Directory of JPEG frames
- Directory of GeoTIFFs (for remote sensing time series)
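
The three source types listed above can be told apart with a few path checks. The sketch below is only an illustration: `classify_video_source` and the extension sets are hypothetical, and this is not necessarily how `SamGeo3Video.set_video` decides how to load its input.

```python
import tempfile
from pathlib import Path

# Assumed extension sets, chosen for illustration only
VIDEO_EXTS = {".mp4"}
JPEG_EXTS = {".jpg", ".jpeg"}
GEOTIFF_EXTS = {".tif", ".tiff"}


def classify_video_source(path):
    """Guess the source type of `path` from its kind and file extensions."""
    p = Path(path)
    if p.is_file():
        return "video file" if p.suffix.lower() in VIDEO_EXTS else "unknown"
    if p.is_dir():
        suffixes = {f.suffix.lower() for f in p.iterdir() if f.is_file()}
        if suffixes & GEOTIFF_EXTS:
            return "GeoTIFF directory"
        if suffixes & JPEG_EXTS:
            return "JPEG frame directory"
    return "unknown"


# Demonstrate on a temporary directory holding two empty placeholder frames
with tempfile.TemporaryDirectory() as tmp:
    for name in ("00000.jpg", "00001.jpg"):
        (Path(tmp) / name).touch()
    demo_kind = classify_video_source(tmp)

print(demo_kind)
```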


In [None]:
url = "https://github.com/opengeos/datasets/releases/download/videos/cars.mp4"
video_path = download_file(url)

In [3]:
# sam.set_video(video_path)
sam.set_video(
    os.path.abspath("/data/sam3/sources/IMG_4346.MOV"),
    frame_rate=3, image_output_dir="./output/image", image_ext=".png",
)
# sam.set_video("./output/image")  # alternatively, load the extracted frames from their directory

Extracting frames to: /segment-geospatial/tests/output/image
Video FPS: 30
Total Frames: 3403
Saving every 10 frame(s)


frame loading (image folder) [rank=1]:   0%|          | 0/341 [00:00<?, ?it/s]

Finished saving 341 images to /segment-geospatial/tests/output/image


frame loading (image folder) [rank=1]: 100%|██████████| 341/341 [00:30<00:00, 11.14it/s]
frame loading (image folder) [rank=0]: 100%|██████████| 341/341 [00:30<00:00, 11.12it/s]


Loaded 341 frames. Session started.
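
The numbers in the log above are consistent with simple frame-sampling arithmetic. The sketch below reproduces them under the assumption that the extractor keeps one frame out of every `video_fps / frame_rate` frames, starting at frame 0. That rule is inferred from the log, not taken from the `samgeo` source.

```python
import math

video_fps = 30         # "Video FPS" from the log
total_frames = 3403    # "Total Frames" from the log
target_frame_rate = 3  # frame_rate passed to set_video

# Assumed rule: keep every (video_fps / target_frame_rate)-th frame
stride = round(video_fps / target_frame_rate)

# Frames 0, stride, 2 * stride, ... up to the last available frame
saved_frames = math.ceil(total_frames / stride)

# Length of the original clip in seconds
duration_seconds = total_frames / video_fps

print(f"stride={stride}, saved_frames={saved_frames}, duration={duration_seconds:.1f}s")
```

Under this assumption the stride is 10 and the count is 341, which matches "Saving every 10 frame(s)" and "Finished saving 341 images" in the output.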


In [None]:
sam.show_video(video_path)

## Text-Prompted Segmentation

Use natural language to describe objects. SAM 3 finds all instances and tracks them.


In [4]:
# Segment and track all objects matching the text prompt
sam.generate_masks("circular connector")

Session reset.
[Gloo] Rank 1 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
[Gloo] Rank 0 is connected to 1 peer ranks. Expected number of connected peer ranks is : 1
Found 1 object(s) matching 'circular connector' on frame 0.


propagate_in_video:   0%|          | 0/341 [00:00<?, ?it/s]

propagate_in_video: 0it [00:00, ?it/s]

Propagated masks to 341 frames.


## Visualize Results


In [None]:
# Show the first frame with masks
sam.show_frame(0, axis="on")

In [None]:
# Show multiple frames in a grid
sam.show_frames(frame_stride=50, ncols=3)
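
To anticipate how large the grid will be, the frame indices can be worked out by hand. This sketch assumes that `frame_stride` selects every 50th frame starting at index 0 and that the grid is filled row by row; both are assumptions about `show_frames`, not documented behavior.

```python
import math

num_frames = 341   # frames loaded in this session
frame_stride = 50  # value passed to show_frames
ncols = 3          # value passed to show_frames

# Assumed selection rule: every frame_stride-th frame, starting at 0
shown_frames = list(range(0, num_frames, frame_stride))

# Rows needed to lay out the selected frames in ncols columns
nrows = math.ceil(len(shown_frames) / ncols)

print(shown_frames, nrows)
```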

## Remove Objects

Remove specific objects by ID and re-propagate.


In [None]:
# Remove object 2 and re-propagate
sam.remove_object(2)
sam.propagate()
sam.show_frame(0)

## Point Prompts

Add objects back or refine segmentation using point prompts.


In [None]:
# Add back object 2 with a positive point click
sam.add_point_prompts(
    points=[[335, 203]],  # [x, y] coordinates
    labels=[1],  # 1=positive, 0=negative
    obj_id=2,
    frame_idx=0,
)
sam.propagate()
sam.show_frame(0)

## Refine with Multiple Points

Use positive and negative points to refine the mask.


In [None]:
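# Refine object 2: keep the region at the positive point and exclude the region at the negative point
sam.add_point_prompts(
    points=[[335, 195], [335, 220]],  # [x, y] of the positive click, then the negative click
    labels=[1, 0],  # 1=positive, 0=negative, in the same order as points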
    obj_id=2,
    frame_idx=0,
)
sam.propagate()
sam.show_frames(frame_stride=20, ncols=3)
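
When there are several clicks, keeping `points` and `labels` aligned by hand is error-prone. The helper below is a small sketch for building both lists from separate positive and negative clicks. `build_point_prompt` is a hypothetical name and not part of `samgeo`; its output is shaped like the `points` and `labels` arguments used in the cells above.

```python
def build_point_prompt(positive, negative):
    """Combine positive and negative (x, y) clicks into aligned lists.

    Returns (points, labels), where labels use 1 for positive clicks
    and 0 for negative clicks, in the same order as points.
    """
    points = [list(p) for p in positive] + [list(p) for p in negative]
    labels = [1] * len(positive) + [0] * len(negative)
    return points, labels


points, labels = build_point_prompt(
    positive=[(335, 195)],
    negative=[(335, 220)],
)
print(points, labels)
```

The two lists could then be passed as `points=points, labels=labels` to `sam.add_point_prompts`.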

## Save Results

Save masks as images or create an output video.


In [5]:
os.makedirs("output", exist_ok=True)

# Save mask images
sam.save_masks("output/mask", binary=True, prefix="")

Saving masks... Prefix: 


Saving masks: 100%|██████████| 341/341 [00:11<00:00, 30.43it/s]

Saved 341 mask files to output/mask





In [None]:
# Save video with blended masks
sam.save_video("output/segmented.mp4", fps=25)

## Close Session

Close the session to free GPU resources.


In [6]:
sam.close()

INFO 2025-12-09 15:34:03,490 1146667 sam3_video_predictor.py: 250: removed session 0649bb01-d3a7-4395-a753-ce6d74010e9e; live sessions: [], GPU memory: 5094 MiB used and 8568 MiB reserved (max over time: 8108 MiB used and 8568 MiB reserved)
INFO 2025-12-09 15:34:03,544 1146200 sam3_video_predictor.py: 250: removed session 0649bb01-d3a7-4395-a753-ce6d74010e9e; live sessions: [], GPU memory: 5118 MiB used and 12478 MiB reserved (max over time: 11461 MiB used and 12478 MiB reserved)


Session closed.


To shut down the predictor completely and free all resources:

In [7]:
sam.shutdown()

INFO 2025-12-09 15:34:05,799 1146200 sam3_video_predictor.py: 512: shutting down 1 worker processes
INFO 2025-12-09 15:34:05,800 1146667 sam3_video_predictor.py: 484: worker rank=1 shutting down
INFO 2025-12-09 15:34:06,076 1146200 sam3_video_predictor.py: 518: shut down 1 worker processes


Predictor shutdown complete.
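
If a cell fails partway through, `close()` and `shutdown()` may never run and GPU memory can stay reserved. One way to guard against that is a small context manager. The sketch below is an illustration only: `video_session` is a hypothetical wrapper, and `DemoPredictor` is a stand-in that lets the example run without a GPU. The only calls it assumes on the real object are `close()` and `shutdown()`, as used above.

```python
from contextlib import contextmanager


@contextmanager
def video_session(predictor):
    """Yield the predictor, then always close its session and shut it down."""
    try:
        yield predictor
    finally:
        try:
            predictor.close()
        finally:
            # Runs even if close() raises
            predictor.shutdown()


class DemoPredictor:
    """Stand-in used only to show the call order without a GPU."""

    def __init__(self):
        self.calls = []

    def close(self):
        self.calls.append("close")

    def shutdown(self):
        self.calls.append("shutdown")


demo = DemoPredictor()
try:
    with video_session(demo):
        raise RuntimeError("simulated failure during segmentation")
except RuntimeError:
    pass

print(demo.calls)
```

With the real class, the same pattern would read `with video_session(SamGeo3Video()) as sam:` followed by the segmentation calls in the body.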
