Cell 1: Install necessary libraries

In [1]:
# Install the required libraries if running the code on your local machine.
!pip install transformers      # Hugging Face's Transformers library
!pip install gradio            # Gradio for creating interactive interfaces
!pip install timm              # PyTorch image models
!pip install torchvision       # PyTorch's vision library for image processing
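
Before installing, it can help to see which of these packages are already available. The snippet below is a minimal sketch that uses only the standard library; `missing_packages` is a hypothetical helper name introduced here, not part of any of the libraries above.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

required = ["transformers", "gradio", "timm", "torchvision"]
print("Missing packages:", missing_packages(required))
```

An empty list means every package can already be imported and the install cell can be skipped.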




Cell 2: Suppress warning messages

In [None]:
# Suppress warning messages from Hugging Face Transformers to keep the output clean.
from transformers.utils import logging
logging.set_verbosity_error()


Cell 3: Mask generation with SAM

In [None]:
# Import the Hugging Face pipeline for mask generation using the SAM model.
from transformers import pipeline

# Load the pre-trained SAM model from the specified path. The model is used to generate masks for image segmentation.
sam_pipe = pipeline("mask-generation", model="./models/Zigeng/SlimSAM-uniform-77")


Cell 4: Load and preprocess the input image

In [None]:
# Load the input image using PIL and resize it for the segmentation pipeline.
from PIL import Image
raw_image = Image.open('meta_llamas.jpg')  # Open an image file
raw_image = raw_image.resize((720, 375))   # Resize the image to a fixed size for processing
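
Resizing to a fixed (720, 375) stretches the image whenever its aspect ratio differs from the target. As a sketch, a size that fits inside the same bounds while keeping the aspect ratio could be computed as below. `fit_within` is a hypothetical helper, and the 3200x2000 example size is made up for illustration; it is not the actual size of `meta_llamas.jpg`.

```python
def fit_within(size, max_size):
    """Scale (width, height) down to fit inside max_size, keeping the aspect ratio."""
    width, height = size
    max_width, max_height = max_size
    # Use the tighter of the two constraints, and never upscale.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within((3200, 2000), (720, 375)))
```

With PIL, the result could be passed straight to `resize`, for example `raw_image.resize(fit_within(raw_image.size, (720, 375)))`, since `Image.size` is also ordered as (width, height).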


Cell 5: Run the segmentation pipeline


In [None]:
# Run the SAM pipeline on the image. The points_per_batch argument sets how many prompt points are processed in one forward pass.
# A higher value speeds up inference but uses more memory.
output = sam_pipe(raw_image, points_per_batch=32)
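
The pipeline result is expected to be a dictionary whose "masks" entry holds one boolean height-by-width array per segment. That layout is an assumption based on how the Transformers mask-generation pipeline usually behaves, so check `output.keys()` in your version. Under that assumption, the sketch below ranks masks by area using small stand-in arrays; `mask_areas` is a hypothetical helper.

```python
import numpy as np

def mask_areas(masks):
    """Return the number of True pixels in each boolean mask."""
    return [int(np.asarray(mask).sum()) for mask in masks]

# Stand-in for output["masks"]: three tiny boolean masks (assumed layout: one HxW array per segment).
demo_masks = [
    np.array([[True, False], [False, False]]),
    np.array([[True, True], [True, False]]),
    np.array([[False, False], [False, False]]),
]
areas = mask_areas(demo_masks)
# Indices of the masks, largest area first.
largest_first = sorted(range(len(areas)), key=lambda i: areas[i], reverse=True)
print(areas, largest_first)
```

In the notebook, the same call would be `mask_areas(output["masks"])`, which can help filter out very small segments before visualizing.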


Cell 6: Visualize the generated masks

In [None]:
# Visualize the segmentation masks generated by the SAM pipeline. 
# The 'show_pipe_masks_on_image' function displays the masks overlaid on the input image.
from helper import show_pipe_masks_on_image
show_pipe_masks_on_image(raw_image, output)  # Display masks on the original image
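
The `helper` module is not shown in this notebook. As a rough idea of what a mask overlay involves, here is a minimal NumPy sketch that alpha-blends a color into the masked pixels. `overlay_mask` is a hypothetical function written for illustration and is not the helper's actual implementation.

```python
import numpy as np

def overlay_mask(image, mask, color=(200, 0, 0), alpha=0.5):
    """Blend `color` into `image` (H, W, 3, uint8) wherever `mask` (H, W) is True."""
    image = np.asarray(image, dtype=np.float32)
    mask = np.asarray(mask, dtype=bool)
    blended = image.copy()
    # Only the masked pixels are mixed with the overlay color.
    blended[mask] = (1 - alpha) * image[mask] + alpha * np.asarray(color, dtype=np.float32)
    return blended.round().astype(np.uint8)

# Toy 2x2 gray image with a single masked pixel.
demo_image = np.full((2, 2, 3), 100, dtype=np.uint8)
demo_mask = np.array([[True, False], [False, False]])
result = overlay_mask(demo_image, demo_mask)
print(result[0, 0], result[0, 1])
```

A real image would first be converted with `np.asarray(raw_image)`, and the blended array can be turned back into a PIL image with `Image.fromarray`.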


Cell 7: Faster inference for a single point

In [None]:
# Import the SAM model and processor for faster inference.
from transformers import SamModel, SamProcessor

# Load the SAM model and processor for single-point segmentation.
model = SamModel.from_pretrained("./models/Zigeng/SlimSAM-uniform-77")
processor = SamProcessor.from_pretrained("./models/Zigeng/SlimSAM-uniform-77")

# raw_image was already resized to 720x375 in Cell 4; repeating the resize here has no effect and only makes the input size explicit.
raw_image = raw_image.resize((720, 375))


Cell 8: Define a single point for segmentation


In [None]:
# Define a 2D point prompt, given as (x, y) pixel coordinates, that marks the object to segment (e.g., a blue shirt).
# The coordinates refer to the image passed to the processor, so for the 720x375 resized image they should satisfy x < 720 and y < 375.
input_points = [[[1600, 700]]]  # As written, (1600, 700) falls outside the 720x375 image; adjust it to the region of interest.
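
Point prompts are pixel coordinates in the image that is handed to the processor. If a point was picked on the full-resolution image, it has to be rescaled after resizing. The sketch below shows one way to do that and to catch out-of-bounds points; `scale_point` is a hypothetical helper, and the 2880x1500 original size is an assumption for illustration, since the real size of `meta_llamas.jpg` is not given here.

```python
def scale_point(point, original_size, resized_size):
    """Map an (x, y) pixel from the original image to the resized image."""
    x, y = point
    orig_w, orig_h = original_size
    new_w, new_h = resized_size
    scaled = (round(x * new_w / orig_w), round(y * new_h / orig_h))
    # Reject points that would land outside the resized image.
    if not (0 <= scaled[0] < new_w and 0 <= scaled[1] < new_h):
        raise ValueError(f"point {scaled} lies outside the {new_w}x{new_h} image")
    return scaled

# Hypothetical original size of 2880x1500 (an assumption, not taken from the notebook).
x, y = scale_point((1600, 700), (2880, 1500), (720, 375))
input_points_scaled = [[[x, y]]]  # assumed nesting: images -> points -> (x, y)
print(input_points_scaled)
```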


Cell 9: Prepare inputs for the model


In [None]:
# Prepare the inputs for the model using the image and the single point. The processor creates the necessary tensors.
# The return_tensors="pt" argument ensures that the inputs are returned as PyTorch tensors.
inputs = processor(raw_image, input_points=input_points, return_tensors="pt")


Cell 10: Run the model and get the predicted mask


In [None]:
# Use the model to generate segmentation outputs for the given inputs. The model is run in no_grad() mode to avoid gradient calculations.
import torch
with torch.no_grad():
    outputs = model(**inputs)


Cell 11: Post-process the predicted masks

In [None]:
# Post-process the predicted masks. post_process_masks upscales the model's low-resolution mask outputs to the original image size and binarizes them.
# inputs["original_sizes"] and inputs["reshaped_input_sizes"] tell it how to undo the resizing and padding applied by the processor.
predicted_masks = processor.image_processor.post_process_masks(
    outputs.pred_masks,
    inputs["original_sizes"],
    inputs["reshaped_input_sizes"]
)
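
To make the post-processing step more concrete, here is a simplified sketch of its two main ideas: upscaling a low-resolution grid of mask logits and thresholding it at 0. It uses nearest-neighbour indexing in NumPy for brevity. The library's own implementation interpolates more smoothly and also removes padding, so `upscale_and_binarize` is a hypothetical illustration rather than the library's code.

```python
import numpy as np

def upscale_and_binarize(logits, out_hw, threshold=0.0):
    """Nearest-neighbour upscale of a 2D logit grid, then threshold to a boolean mask."""
    in_h, in_w = logits.shape
    out_h, out_w = out_hw
    # For each output row/column, find the source row/column it maps back to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    upscaled = logits[rows[:, None], cols[None, :]]
    return upscaled > threshold

# Toy 2x2 logits: positive values mean "inside the mask".
demo_logits = np.array([[-1.0, 2.0], [3.0, -4.0]])
demo_mask = upscale_and_binarize(demo_logits, (4, 4))
print(demo_mask.astype(int))
```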


Cell 12: Inspect the number and size of predicted masks

In [None]:
# predicted_masks has one entry per input image, so its length is 1 here.
print(len(predicted_masks))  # Prints the number of input images

# Inspect the masks predicted for the first (and only) image.
predicted_mask = predicted_masks[0]
print(predicted_mask.shape)  # (point prompts, masks per prompt, height, width); SAM returns 3 candidate masks per prompt by default


Cell 13: Display the Intersection over Union (IoU) scores

In [None]:
# outputs.iou_scores holds the model's own estimate of each mask's quality (a predicted IoU, Intersection over Union).
# No ground-truth mask is involved, so read these as confidence scores rather than measured accuracy.
print(outputs.iou_scores)  # Display IoU scores for each predicted mask
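
For reference, the IoU of two binary masks is the area of their intersection divided by the area of their union. Measuring it requires a reference mask, which this notebook does not have, so the sketch below uses two toy arrays. `mask_iou` is a hypothetical helper, and returning 1.0 for two empty masks is a convention chosen here.

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """Intersection over Union of two binary masks with the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # convention chosen here: two empty masks count as identical
    return float(np.logical_and(a, b).sum() / union)

# Toy masks: 2 pixels overlap, 4 pixels are covered by at least one mask.
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
print(mask_iou(pred, truth))
```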


Cell 14: Visualize the predicted mask on the image

In [None]:
# Use a helper function to visualize the predicted mask overlaid on the raw image.
# In this case, we are visualizing the first three predicted masks.
from helper import show_mask_on_image

# Loop through the first three predicted masks and overlay them on the raw image.
for i in range(3):
    show_mask_on_image(raw_image, predicted_mask[:, i])  # Display each mask on the image
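
When only one mask is needed, a common choice is to keep the candidate with the highest predicted score. The sketch below shows that selection on stand-in NumPy arrays; `best_mask` is a hypothetical helper and the scores are made-up values, not output from the model.

```python
import numpy as np

def best_mask(masks, scores):
    """masks: (num_masks, H, W); scores: (num_masks,). Return (index, mask) with the highest score."""
    scores = np.asarray(scores).reshape(-1)
    index = int(scores.argmax())
    return index, np.asarray(masks)[index]

# Stand-ins: three 2x2 candidate masks and three made-up scores.
demo_masks = np.zeros((3, 2, 2), dtype=bool)
demo_masks[1, 0, 0] = True
demo_scores = np.array([0.81, 0.97, 0.90])
index, mask = best_mask(demo_masks, demo_scores)
print(index, int(mask.sum()))
```

In the notebook this might be called as `best_mask(predicted_mask[0].numpy(), outputs.iou_scores[0, 0].numpy())`, assuming the masks have shape (1, 3, height, width) and the scores have shape (1, 1, 3).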


Explanation:
Pipeline Initialization: We initialize a Hugging Face mask-generation pipeline with the pre-trained SlimSAM-uniform-77 checkpoint (Zigeng/SlimSAM-uniform-77), loaded from a local path.

Image Processing: The input image is loaded using PIL, resized to a fixed resolution, and passed to the SAM model for segmentation.

Mask Generation: The SAM pipeline generates segmentation masks, and these masks are displayed using a helper function.

Faster Inference: Instead of prompting the model across the whole image, we prompt it with a single 2D point. The processor converts the image and the point into model inputs, and the SAM model returns candidate masks for the object at that point.

Post-processing: The predicted masks are post-processed to adjust their size to match the input image's dimensions.

IoU Scores: The model's predicted Intersection over Union (IoU) scores are displayed as an estimate of each mask's quality. They are not computed against a ground-truth mask.

Mask Visualization: Finally, we visualize the predicted masks overlaid on the input image using a helper function.

