# Mask Generation

Mask generation is the task of generating semantically meaningful masks for an image. It is closely related to image segmentation, but the two tasks differ in an important way. Image segmentation models are trained on labeled datasets and are limited to the classes they saw during training: given an image, they return a set of masks and the corresponding classes. Mask generation models, by contrast, are trained on large amounts of data and can produce masks for segments of an image that they never saw during training.

Mask generation models operate in two modes:
- Prompting mode: the model takes an image and a prompt, where the prompt is either a 2D point location (XY coordinates) inside an object or a bounding box surrounding an object. The model returns only the mask of the object the prompt points to.
- Segment Everything mode: given an image, the model generates masks for everything in it. To do so, a grid of points is overlaid on the image and each point is used as a prompt during inference.
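The grid used in Segment Everything mode can be illustrated with a minimal pure-Python sketch. It assumes points are placed at the centers of an n-by-n grid of cells in normalized [0, 1] coordinates (the layout used by the original SAM automatic mask generator, whose default is 32 points per side) and are then sent to the model in fixed-size batches. The helpers `build_point_grid`, `to_pixels`, and `batched` are hypothetical names for illustration, not part of the `transformers` API.

```python
def build_point_grid(n_per_side):
    """Return n_per_side**2 (x, y) points in normalized [0, 1] image coordinates."""
    # Each point sits at the center of its grid cell, so none lie on the image border
    coords = [(i + 0.5) / n_per_side for i in range(n_per_side)]
    # Row-major order: x varies fastest, then y
    return [(x, y) for y in coords for x in coords]


def to_pixels(points, width, height):
    """Scale normalized points to pixel coordinates for an image of the given size."""
    return [(x * width, y * height) for x, y in points]


def batched(points, batch_size):
    """Yield consecutive chunks of at most batch_size points (one forward pass each)."""
    for start in range(0, len(points), batch_size):
        yield points[start:start + batch_size]


if __name__ == "__main__":
    grid = build_point_grid(32)  # 1024 point prompts
    batches = list(batched(to_pixels(grid, 640, 480), 64))
    print(len(grid), len(batches))
```

Batching trades memory for speed: larger batches mean fewer forward passes through the mask decoder, while the image embedding only needs to be computed once per image.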

This task is supported by the [Segment Anything Model (SAM)](https://github.com/facebookresearch/sam2); the linked repository and the checkpoint used below belong to its successor, SAM 2. SAM consists of a Vision Transformer-based image encoder, a prompt encoder, and a two-way transformer mask decoder: images and prompts are encoded, and the decoder takes these embeddings and generates valid masks. Because it is trained on SA-1B, a dataset of 11 million images and 1.1 billion masks, SAM covers a broad range of data and serves as a strong foundation model for segmentation.
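To make "prompts are encoded" more concrete, here is a simplified sketch of how the original SAM prompt encoder turns a point into a vector: the pixel coordinate is normalized, projected by a fixed random Gaussian matrix, and mapped through sine and cosine (random Fourier features). A box is represented by its two corner points, each encoded the same way. This is an illustration under stated assumptions, not SAM's code: the learned per-label embeddings that SAM adds to these features are omitted, and the feature count, scale, and seed are arbitrary choices. The function names are hypothetical.

```python
import math
import random


def make_gaussian_matrix(num_feats, scale=1.0, seed=0):
    """Fixed random projection of shape (2, num_feats); drawn once, never trained."""
    rng = random.Random(seed)
    return [[scale * rng.gauss(0.0, 1.0) for _ in range(num_feats)] for _ in range(2)]


def encode_point(x, y, width, height, gauss):
    """Random Fourier features for one pixel coordinate (x, y)."""
    # Shift to the pixel center, normalize to [0, 1], then map to [-1, 1]
    u = 2.0 * ((x + 0.5) / width) - 1.0
    v = 2.0 * ((y + 0.5) / height) - 1.0
    num_feats = len(gauss[0])
    proj = [2.0 * math.pi * (u * gauss[0][k] + v * gauss[1][k]) for k in range(num_feats)]
    # Concatenate sin and cos features into a vector of length 2 * num_feats
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]


def encode_box(x0, y0, x1, y1, width, height, gauss):
    """Encode a box prompt as its top-left and bottom-right corners."""
    return [
        encode_point(x0, y0, width, height, gauss),
        encode_point(x1, y1, width, height, gauss),
    ]


if __name__ == "__main__":
    g = make_gaussian_matrix(num_feats=8)
    print(len(encode_point(120, 80, 640, 480, g)))          # one point prompt
    print(len(encode_box(100, 60, 300, 240, 640, 480, g)))  # two corner encodings
```

Because the projection matrix is fixed, nearby points get similar encodings, which lets the decoder relate a prompt location to positions in the image embedding.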

This guide shows how to:
1. Run inference in Segment Everything mode with batching
2. Run inference in Point Prompting mode
3. Run inference in Box Prompting mode

# Libraries

```bash
pip install -q transformers torch pillow requests
```

```python
import requests
from PIL import Image
from transformers import pipeline
```


# Mask Generation using pipeline()

```python
# SAM 2 checkpoint with the tiny Hiera image encoder, loaded from the Hugging Face Hub
checkpoint = "facebook/sam2-hiera-tiny"
mask_generator = pipeline(model=checkpoint, task="mask-generation")
```

```python
# Download an example image and make sure it has three RGB channels
img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
```
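Once masks are generated, it helps to view them on top of the image. The sketch below blends each mask onto the image in a random color. It assumes each mask is a 2D boolean array with the same height and width as the image, which is how we understand the pipeline's `masks` output; `overlay_masks` is a hypothetical helper, not a `transformers` function, and the blending scheme is just one simple choice.

```python
import numpy as np
from PIL import Image


def overlay_masks(image, masks, alpha=0.5, seed=0):
    """Return a copy of `image` with each boolean mask blended in a random color."""
    base = np.array(image.convert("RGB")).astype(np.float32)
    rng = np.random.default_rng(seed)
    # Draw large masks first so smaller ones remain visible on top
    ordered = sorted((np.asarray(m, dtype=bool) for m in masks), key=lambda m: m.sum(), reverse=True)
    for mask in ordered:
        color = rng.integers(0, 256, size=3).astype(np.float32)
        # Only pixels inside the mask are blended with the color
        base[mask] = (1.0 - alpha) * base[mask] + alpha * color
    return Image.fromarray(base.clip(0, 255).astype(np.uint8))


# Hypothetical usage with the pipeline and image defined above (downloads the checkpoint):
# outputs = mask_generator(image, points_per_batch=64)
# overlay_masks(image, outputs["masks"]).save("bee_masks.png")
```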