[![Roboflow Notebooks](https://media.roboflow.com/notebooks/template/bannertest2-2.png?ik-sdk-version=javascript-1.4.3&updatedAt=1672932710194)](https://github.com/roboflow/notebooks)

# Segment Images with Segment Anything 2 (SAM2)

---

[![GitHub](https://badges.aleen42.com/src/github.svg)](https://github.com/facebookresearch/segment-anything-2)

Segment Anything Model 2 (SAM 2) is a foundation model designed to address promptable visual segmentation in both images and videos. The model extends its functionality to video by treating images as single-frame videos. Its design, a simple transformer architecture with streaming memory, enables real-time video processing. A model-in-the-loop data engine, which enhances the model and data through user interaction, was built to collect the SA-V dataset, the largest video segmentation dataset to date. SAM 2, trained on this extensive dataset, delivers robust performance across diverse tasks and visual domains.

![segment anything model](https://media.roboflow.com/notebooks/examples/segment-anything-model-2-paper.jpg)

This notebook is an extension of the official [notebook](https://github.com/facebookresearch/segment-anything-2/blob/main/notebooks/image_predictor_example.ipynb) prepared by Meta AI.

## Complementary materials

---

[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-segment-images-with-sam-2.ipynb)
[![Roboflow](https://raw.githubusercontent.com/roboflow-ai/notebooks/main/assets/badges/roboflow-blogpost.svg)](https://blog.roboflow.com/what-is-segment-anything-2)

We recommend that you follow along in this notebook while reading the blog post on Segment Anything Model 2 (SAM2).

[![SAM2 blogpost](https://media.roboflow.com/notebooks/examples/blog-what-is-sam-2.png)](https://blog.roboflow.com/what-is-segment-anything-2)

## Setup

### Before you start

Let's make sure that we have access to a GPU. We can use the `nvidia-smi` command to check. If it reports a problem, navigate to `Edit` -> `Notebook settings` -> `Hardware accelerator`, set it to `GPU`, and then click `Save`.

In [71]:
!nvidia-smi

/bin/bash: line 1: nvidia-smi: command not found


**NOTE:** To make it easier for us to manage datasets, images and models we create a `HOME` constant.

In [72]:
import os
HOME = os.getcwd()
print("HOME:", HOME)

HOME: /content/segment-anything-2


### Install SAM2 and dependencies

In [73]:
!git clone https://github.com/facebookresearch/segment-anything-2.git
%cd {HOME}/segment-anything-2
!pip install -e . -q

Cloning into 'segment-anything-2'...
remote: Enumerating objects: 1070, done.
remote: Total 1070 (delta 0), reused 0 (delta 0), pack-reused 1070 (from 1)
Receiving objects: 100% (1070/1070), 128.11 MiB | 26.46 MiB/s, done.
Resolving deltas: 100% (381/381), done.
/content/segment-anything-2/segment-anything-2
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
  Building editable for SAM-2 (pyproject.toml) ... done


In [74]:
!pip install -q supervision jupyter_bbox_widget

### Download SAM2 checkpoints

**NOTE:** SAM2 is available in four model sizes, ranging from the lightweight "sam2_hiera_tiny" (38.9M parameters) to the more powerful "sam2_hiera_large" (224.4M parameters).

In [75]:
!mkdir -p {HOME}/checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt -P {HOME}/checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt -P {HOME}/checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt -P {HOME}/checkpoints
!wget -q https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt -P {HOME}/checkpoints

### Download example data

**NOTE:** Let's download a few example images. Feel free to use your own images or videos.

In [76]:
!mkdir -p {HOME}/data
!wget -q https://media.roboflow.com/notebooks/examples/dog.jpeg -P {HOME}/data
!wget -q https://media.roboflow.com/notebooks/examples/dog-2.jpeg -P {HOME}/data
!wget -q https://media.roboflow.com/notebooks/examples/dog-3.jpeg -P {HOME}/data
!wget -q https://media.roboflow.com/notebooks/examples/dog-4.jpeg -P {HOME}/data

### Imports

In [77]:
import cv2
import torch
import base64

import numpy as np
import supervision as sv

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

**NOTE:** This code enables mixed-precision computing for faster deep learning. It uses bfloat16 for most calculations and, on newer NVIDIA GPUs, leverages TensorFloat-32 (TF32) for certain operations to further boost performance.
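
The cell this note describes does not appear in this export. Below is a minimal sketch of what such a cell usually looks like. As far as we can tell, it follows the pattern used in Meta's official image predictor notebook. The `torch.cuda.is_available()` guard is an addition made for this sketch (an assumption, not part of the original), so the cell does nothing on a CPU-only runtime.

```python
import torch

# Guard added for this sketch: only touch CUDA settings when a GPU is present.
if torch.cuda.is_available():
    # Enter a bfloat16 autocast context that stays active for the rest of the session.
    torch.autocast(device_type="cuda", dtype=torch.bfloat16).__enter__()

    # GPUs with compute capability >= 8 (Ampere and newer) support TF32.
    if torch.cuda.get_device_properties(0).major >= 8:
        torch.backends.cuda.matmul.allow_tf32 = True
        torch.backends.cudnn.allow_tf32 = True
```

Calling `__enter__()` without a matching exit is deliberate here: it keeps autocast on for every later cell rather than for a single `with` block.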

## Load model

In [78]:
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
CHECKPOINT = f"{HOME}/checkpoints/sam2_hiera_large.pt"
CONFIG = "sam2_hiera_l.yaml"

sam2_model = build_sam2(CONFIG, CHECKPOINT, device=DEVICE, apply_postprocessing=False)
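
`CHECKPOINT` and `CONFIG` have to describe the same model size. The sketch below keeps the two paired in one lookup. `MODELS` and `model_paths` are hypothetical names introduced here, and every config filename other than `sam2_hiera_l.yaml` is an assumption based on the repository's naming for the `072824` checkpoints, so check them against your checkout before relying on them.

```python
# Hypothetical lookup: model size -> (checkpoint filename, config filename).
# Only the "large" pair appears in this notebook; the others are assumed.
MODELS = {
    "tiny": ("sam2_hiera_tiny.pt", "sam2_hiera_t.yaml"),
    "small": ("sam2_hiera_small.pt", "sam2_hiera_s.yaml"),
    "base_plus": ("sam2_hiera_base_plus.pt", "sam2_hiera_b+.yaml"),
    "large": ("sam2_hiera_large.pt", "sam2_hiera_l.yaml"),
}

def model_paths(size, home):
    """Return (checkpoint path, config name) for the requested model size."""
    checkpoint_name, config_name = MODELS[size]
    return f"{home}/checkpoints/{checkpoint_name}", config_name

checkpoint, config = model_paths("large", "/content")
print(checkpoint, config)
```

Switching to a smaller model then means changing one string instead of two values that must stay in sync.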

In [79]:
mask_generator = SAM2AutomaticMaskGenerator(sam2_model)

**NOTE:** OpenCV loads images in BGR format by default, so we convert to RGB for compatibility with the mask generator.
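
For a 3-channel image, the BGR to RGB conversion amounts to reversing the channel axis. The sketch below shows this with plain NumPy on a made-up two-pixel image. It illustrates the idea and is not a replacement for `cv2.cvtColor`.

```python
import numpy as np

# A 1x2 "image" in BGR order: one pure-blue pixel and one pure-red pixel.
image_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps B and R; G stays in the middle.
image_rgb = image_bgr[..., ::-1]

print(image_rgb[0, 0].tolist())  # [0, 0, 255] -> blue in RGB order
print(image_rgb[0, 1].tolist())  # [255, 0, 0] -> red in RGB order
```

The slice is a view with a negative stride. If a library needs a contiguous array, wrap it in `np.ascontiguousarray`.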

## Prompting with boxes

In [80]:
predictor = SAM2ImagePredictor(sam2_model)

In [84]:
IMAGE_PATH = f"{HOME}/MR/MR.jpg"

image_bgr = cv2.imread(IMAGE_PATH)
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)

error: OpenCV(4.11.0) /io/opencv/modules/imgproc/src/color.cpp:199: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
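
The assertion above is what `cv2.cvtColor` raises when it receives an empty input. `cv2.imread` does not raise for a missing or unreadable file; it returns `None`, and the failure only surfaces one call later. A small guard makes the problem fail earlier with a clearer message. `ensure_loaded` is a hypothetical helper name, not part of OpenCV.

```python
def ensure_loaded(image, path):
    """Return `image`, or raise if the reader returned None for `path`."""
    if image is None:
        raise FileNotFoundError(f"Could not read image: {path}")
    return image

# Intended use in this notebook (requires OpenCV, so shown as a comment):
# image_bgr = ensure_loaded(cv2.imread(IMAGE_PATH), IMAGE_PATH)
```

With this guard in place, a wrong `IMAGE_PATH` is reported by name instead of as an OpenCV assertion.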


Switch to the next image.

In [None]:
import os
import cv2

folder_path = f"{HOME}/data"

# Sorted list of image files in the folder
image_list = sorted(
    f for f in os.listdir(folder_path)
    if f.lower().endswith(('.jpg', '.jpeg', '.png'))
)

# Find the current image in the list. Start from the first image if the
# current one is not in this folder or no image is selected.
current_image_name = os.path.basename(IMAGE_PATH) if IMAGE_PATH else None
if current_image_name in image_list:
    next_index = image_list.index(current_image_name) + 1
else:
    next_index = 0

if next_index < len(image_list):
    IMAGE_PATH = os.path.join(folder_path, image_list[next_index])
else:
    print("No more images.")
    IMAGE_PATH = None

# Read the image and convert it to RGB
if IMAGE_PATH:
    image_bgr = cv2.imread(IMAGE_PATH)
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
print(IMAGE_PATH)

### Interactive box prompt

In [None]:
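def encode_image(filepath):
    """Read an image file and return it as a base64 data URI for the widget."""
    with open(filepath, 'rb') as f:
        image_bytes = f.read()
    encoded = str(base64.b64encode(image_bytes), 'utf-8')
    # "image/jpeg" is the registered MIME type for JPEG files
    return "data:image/jpeg;base64,"+encoded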
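
`encode_image` labels every file with a single hard-coded MIME type. If you also feed it PNG files, a variant that guesses the type from the filename may be more robust. This is a sketch using only the standard library; `to_data_uri` is a hypothetical name, and it takes the bytes directly so it can be tried without a file on disk.

```python
import base64
import mimetypes

def to_data_uri(image_bytes, filename):
    """Build a data URI, guessing the MIME type from the file extension."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "application/octet-stream"  # fallback for unknown extensions
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

print(to_data_uri(b"abc", "dog.jpeg"))  # data:image/jpeg;base64,YWJj
```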

**NOTE:** Execute the cell below and use your mouse to **draw a bounding box** on the image 👇

In [None]:
IS_COLAB = True

if IS_COLAB:
    from google.colab import output
    output.enable_custom_widget_manager()

from jupyter_bbox_widget import BBoxWidget

widget = BBoxWidget()
widget.image = encode_image(IMAGE_PATH)
widget

In [None]:
widget.bboxes

**NOTE:** The `SAM2ImagePredictor.predict` method takes the `box` argument as an `np.ndarray` in `[x_min, y_min, x_max, y_max]` format.

In [None]:
default_box = [
    {'x': 166, 'y': 835, 'width': 99, 'height': 175, 'label': ''},
    {'x': 472, 'y': 885, 'width': 168, 'height': 249, 'label': ''},
    {'x': 359, 'y': 727, 'width': 27, 'height': 155, 'label': ''},
    {'x': 164, 'y': 1044, 'width': 279, 'height': 163, 'label': ''}
]

boxes = widget.bboxes if widget.bboxes else default_box
boxes = np.array([
    [
        box['x'],
        box['y'],
        box['x'] + box['width'],
        box['y'] + box['height']
    ] for box in boxes
])
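
As a worked example of the conversion above, the first default box has its top-left corner at (166, 835), a width of 99, and a height of 175, so its bottom-right corner is (166 + 99, 835 + 175) = (265, 1010). The sketch below does the same arithmetic in vectorized form; the variable names are illustrative.

```python
import numpy as np

# One widget-style box: top-left corner plus width and height.
widget_boxes = [{'x': 166, 'y': 835, 'width': 99, 'height': 175, 'label': ''}]

xywh = np.array([[b['x'], b['y'], b['width'], b['height']] for b in widget_boxes])

# x_max = x + width, y_max = y + height
xyxy = xywh.copy()
xyxy[:, 2:] += xywh[:, :2]

print(xyxy.tolist())  # [[166, 835, 265, 1010]]
```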

In [None]:
predictor.set_image(image_rgb)

masks, scores, logits = predictor.predict(
    box=boxes,
    multimask_output=False
)

# With one box as input, predictor returns masks of shape (1, H, W);
# with N boxes, it returns (N, 1, H, W).
if boxes.shape[0] != 1:
    masks = np.squeeze(masks)
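
`np.squeeze` without an `axis` argument removes every length-1 axis, which is more than is needed here. A sketch of a more explicit normalization to `(N, H, W)` follows. `to_nhw` is a hypothetical helper, and the shapes it accepts are the ones named in the comment above plus a bare `(H, W)` mask.

```python
import numpy as np

def to_nhw(masks):
    """Return masks as (N, H, W) from (H, W), (N, H, W) or (N, 1, H, W)."""
    masks = np.asarray(masks)
    if masks.ndim == 2:
        return masks[np.newaxis]
    if masks.ndim == 4 and masks.shape[1] == 1:
        return masks[:, 0]
    if masks.ndim == 3:
        return masks
    raise ValueError(f"Unexpected mask shape: {masks.shape}")

print(to_nhw(np.zeros((1, 4, 5))).shape)     # (1, 4, 5)
print(to_nhw(np.zeros((3, 1, 4, 5))).shape)  # (3, 4, 5)
```

Because the helper is idempotent, later cells can call it again on already normalized masks without changing them.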


Tumor (label value 1)

In [None]:
import os
import numpy as np
from PIL import Image

# Get the original image size
height, width = image_rgb.shape[:2]

# Normalize the masks to shape (N, H, W)
if boxes.shape[0] == 1:
    masks = masks[np.newaxis, :, :] if masks.ndim == 2 else masks
else:
    masks = np.squeeze(masks)  # (N, H, W)

# Build the save path
image_name = os.path.splitext(os.path.basename(IMAGE_PATH))[0]
save_path = f"{HOME}/{image_name}_label.png"

# Load the label map if it already exists; otherwise create a new one
if os.path.exists(save_path):
    label_map = np.array(Image.open(save_path))
    print(f"Loaded existing label map: {save_path}")
else:
    label_map = np.zeros((height, width), dtype=np.uint8)
    print(f"Created new label map: {save_path}")

# Set the masked regions to 1
for mask in masks:
    label_map[mask.astype(bool)] = 1

# Save the result
Image.fromarray(label_map).save(save_path)
print(f"Label map saved to: {save_path}")


Lymph node (label value 2)

In [None]:
import os
import numpy as np
from PIL import Image

# Get the original image size
height, width = image_rgb.shape[:2]

# Normalize the masks to shape (N, H, W)
if boxes.shape[0] == 1:
    masks = masks[np.newaxis, :, :] if masks.ndim == 2 else masks
else:
    masks = np.squeeze(masks)  # (N, H, W)

# Build the save path
image_name = os.path.splitext(os.path.basename(IMAGE_PATH))[0]
save_path = f"{HOME}/{image_name}_label.png"

# Load the label map if it already exists; otherwise create a new one
if os.path.exists(save_path):
    label_map = np.array(Image.open(save_path))
    print(f"Loaded existing label map: {save_path}")
else:
    label_map = np.zeros((height, width), dtype=np.uint8)
    print(f"Created new label map: {save_path}")

# Set the masked regions to 2
for mask in masks:
    label_map[mask.astype(bool)] = 2

# Save the result
Image.fromarray(label_map).save(save_path)
print(f"Label map saved to: {save_path}")
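
The two labeling cells above differ only in the value they write into the label map (1 or 2). A sketch of how that shared step could be factored into one function is shown below, using tiny made-up masks instead of model output. `paint_masks` is a hypothetical name.

```python
import numpy as np

def paint_masks(label_map, masks, class_id):
    """Write `class_id` into `label_map` wherever any mask is set (in place)."""
    for mask in masks:
        label_map[mask.astype(bool)] = class_id
    return label_map

label_map = np.zeros((2, 3), dtype=np.uint8)
masks_a = np.array([[[1, 1, 0], [0, 0, 0]]])  # painted with label 1
masks_b = np.array([[[0, 1, 1], [0, 0, 0]]])  # painted with label 2

paint_masks(label_map, masks_a, class_id=1)
paint_masks(label_map, masks_b, class_id=2)
print(label_map.tolist())  # [[1, 2, 2], [0, 0, 0]]
```

Note that where masks of different classes overlap, the class painted last wins, which is also how the cells above behave when run one after the other.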

### Results visualisation

In [None]:
box_annotator = sv.BoxAnnotator(color_lookup=sv.ColorLookup.INDEX)
mask_annotator = sv.MaskAnnotator(color_lookup=sv.ColorLookup.INDEX)

detections = sv.Detections(
    xyxy=sv.mask_to_xyxy(masks=masks),
    mask=masks.astype(bool)
)

source_image = box_annotator.annotate(scene=image_bgr.copy(), detections=detections)
segmented_image = mask_annotator.annotate(scene=image_bgr.copy(), detections=detections)

sv.plot_images_grid(
    images=[source_image, segmented_image],
    grid_size=(1, 2),
    titles=['source image', 'segmented image']
)

Save the visualized image.

In [None]:
# Create the output directory
output_dir = os.path.join(HOME, "output")
os.makedirs(output_dir, exist_ok=True)

# Get the original image file name (without the directory)
image_name = os.path.basename(IMAGE_PATH)

# Build the output path
output_path = os.path.join(output_dir, image_name)

# Save the segmented image
cv2.imwrite(output_path, segmented_image)
print(f"Segmented image saved to: {output_path}")


## Prompting with points

**NOTE:** Execute the cell below and use your mouse to **draw points** on the image 👇

In [None]:
IS_COLAB = True

if IS_COLAB:
    from google.colab import output
    output.enable_custom_widget_manager()

from jupyter_bbox_widget import BBoxWidget

widget = BBoxWidget()
widget.image = encode_image(IMAGE_PATH)
widget

In [None]:
widget.bboxes

In [None]:
default_box = [
    {'x': 330, 'y': 450, 'width': 0, 'height': 0, 'label': ''},
    {'x': 191, 'y': 665, 'width': 0, 'height': 0, 'label': ''},
    {'x': 86, 'y': 879, 'width': 0, 'height': 0, 'label': ''},
    {'x': 425, 'y': 727, 'width': 0, 'height': 0, 'label': ''}
]

boxes = widget.bboxes if widget.bboxes else default_box
input_point = np.array([
    [
        box['x'],
        box['y']
    ] for box in boxes
])
input_label = np.ones(input_point.shape[0])

In [None]:
masks, scores, logits = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=True,
)
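
With `multimask_output=True`, the predictor returns several candidate masks together with one score per mask, which is why the plotting cells below title each mask with its score. A common next step is to keep only the highest-scoring candidate. The sketch below uses made-up arrays standing in for `masks` and `scores`; the values are illustrative, not model output.

```python
import numpy as np

# Stand-ins for the predictor outputs: three candidate masks and their scores.
masks = np.zeros((3, 4, 4), dtype=np.float32)
masks[1, 1:3, 1:3] = 1.0
scores = np.array([0.42, 0.91, 0.77])

# Pick the candidate with the highest score and convert it to a boolean mask.
best_index = int(np.argmax(scores))
best_mask = masks[best_index].astype(bool)

print(best_index, int(best_mask.sum()))  # 1 4
```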

In [None]:
import cv2
import numpy as np
import supervision as sv
import matplotlib.pyplot as plt

# Expects the outputs of the previous cell:
# masks: shape (num_masks, H, W)
# scores: shape (num_masks,)
# and the original image, image_bgr

alpha = 0.3  # opacity (0 = fully transparent, 1 = fully opaque)

# Overlay a single mask on the image using alpha blending
def overlay_mask_on_image(image, mask, color=(255, 0, 0), alpha=0.3):
    # Convert to float32 for weighted blending
    image = image.astype(np.float32)
    color_layer = np.full_like(image, color, dtype=np.float32)

    # Expand the mask to 3 channels
    mask_3d = np.stack([mask] * 3, axis=-1).astype(np.float32)

    # Blending formula
    blended = image * (1 - mask_3d * alpha) + color_layer * (mask_3d * alpha)

    return blended.astype(np.uint8)

# Overlay each mask on a copy of the image and return the resulting images
def visualize_masks_on_image(image_bgr, masks, scores):
    num_masks = masks.shape[0]
    images_with_masks = []

    for i in range(num_masks):
        mask = masks[i].astype(np.uint8)
        # The image is in BGR order, so (255, 0, 0) is blue
        blended = overlay_mask_on_image(image_bgr.copy(), mask, color=(255, 0, 0), alpha=alpha)
        images_with_masks.append(blended)

    return images_with_masks

# Get the images with the masks overlaid
images_with_masks = visualize_masks_on_image(image_bgr, masks, scores)

# Show the image grid
sv.plot_images_grid(
    images=images_with_masks,
    titles=[f"score: {score:.2f}" for score in scores],
    grid_size=(1, len(images_with_masks)),
    size=(12, 12)
)
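
To make the blending formula concrete: each masked pixel becomes `pixel * (1 - alpha) + color * alpha`, and unmasked pixels are left unchanged. The sketch below works this through on two gray pixels. It uses `alpha = 0.5` instead of the notebook's 0.3 so that the arithmetic is exact; the arrays are made up for illustration.

```python
import numpy as np

alpha = 0.5  # chosen so the arithmetic below is exact
color = np.array([255, 0, 0], dtype=np.float32)
image = np.full((1, 2, 3), 100, dtype=np.float32)  # two gray pixels
mask = np.array([[1, 0]], dtype=np.float32)        # only the first pixel is masked

# Shape (1, 2, 1): broadcasts across the three channels.
weight = mask[..., np.newaxis] * alpha
blended = (image * (1 - weight) + color * weight).astype(np.uint8)

print(blended[0, 0].tolist())  # [177, 50, 50]    (100*0.5 + 255*0.5 = 177.5, truncated)
print(blended[0, 1].tolist())  # [100, 100, 100]  (unmasked, unchanged)
```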


In [None]:
sv.plot_images_grid(
    images=masks,
    titles=[f"score: {score:.2f}" for score in scores],
    grid_size=(1, 3),
    size=(12, 12)
)