2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -309,6 +309,8 @@
title: Image Feature Extraction
- local: tasks/mask_generation
title: Mask Generation
- local: tasks/promptable_visual_segmentation
title: Promptable Visual Segmentation
- local: tasks/keypoint_detection
title: Keypoint detection
- local: tasks/knowledge_distillation_for_image_classification
6 changes: 6 additions & 0 deletions docs/source/en/main_classes/pipelines.md
@@ -473,6 +473,12 @@ Pipelines available for multimodal tasks include the following.
- __call__
- all

### PromptableVisualSegmentationPipeline

[[autodoc]] PromptableVisualSegmentationPipeline
- __call__
- all

### VisualQuestionAnsweringPipeline

[[autodoc]] VisualQuestionAnsweringPipeline
4 changes: 4 additions & 0 deletions docs/source/en/model_doc/auto.md
@@ -113,6 +113,10 @@ The following auto classes are available for the following natural language processing tasks.

[[autodoc]] AutoModelForMaskGeneration

### AutoModelForPromptableVisualSegmentation

[[autodoc]] AutoModelForPromptableVisualSegmentation

### AutoModelForSeq2SeqLM

[[autodoc]] AutoModelForSeq2SeqLM
50 changes: 44 additions & 6 deletions docs/source/en/model_doc/edgetam.md
@@ -39,14 +39,52 @@ The original code can be found [here](https://github.com/facebookresearch/EdgeTA

## Usage example

### Automatic Mask Generation with Pipeline
### Promptable Visual Segmentation Pipeline

The easiest way to use EdgeTAM is through the `promptable-visual-segmentation` pipeline:

```python
>>> from transformers import pipeline

>>> segmenter = pipeline(model="yonigozlan/EdgeTAM-hf", task="promptable-visual-segmentation")
>>> # Single point prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000077595.jpg",
... input_points=[[[[450, 600]]]],
... input_labels=[[[1]]],
... )
[[{'score': 0.87, 'mask': tensor([...])}]]

>>> # Box prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_boxes=[[[59, 144, 76, 163]]],
... )
[[{'score': 0.92, 'mask': tensor([...])}]]

>>> # Multiple points for refinement
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_points=[[[[450, 600], [500, 620]]]],
... input_labels=[[[1, 0]]], # 1=positive, 0=negative
... )
[[{'score': 0.85, 'mask': tensor([...])}]]
```

<Tip>

**Note:** The pipeline's output format differs from that of the model and processor used directly. The pipeline returns a standardized format (a list of lists of dicts with `score` and `mask` keys) for consistency across all Transformers pipelines, while the processor's `post_process_masks()` returns raw tensors.

</Tip>
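Assuming the standardized output format above (a confidence `score` plus a boolean `mask` tensor per prediction), a minimal sketch of inspecting one prediction without re-running the model; the tensor values here are hypothetical stand-ins for real pipeline output:

```python
import torch

# Hypothetical pipeline-style prediction: a confidence score plus a boolean
# mask tensor at image resolution (the standardized format described above).
prediction = {"score": 0.87, "mask": torch.zeros(480, 640, dtype=torch.bool)}
prediction["mask"][100:300, 200:400] = True  # pretend this region was segmented

# Fraction of the image covered by the mask.
coverage = prediction["mask"].float().mean().item()
print(f"{coverage:.3f}")  # 200*200 / (480*640) -> 0.130
```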

### Automatic Mask Generation Pipeline

EdgeTAM can be used for automatic mask generation to segment all objects in an image using the `mask-generation` pipeline:

```python
>>> from transformers import pipeline

>>> generator = pipeline("mask-generation", model="yonigozlan/edgetam-1", device=0)
>>> generator = pipeline("mask-generation", model="yonigozlan/EdgeTAM-hf", device=0)
>>> image_url = "https://huggingface.co/datasets/hf-internal-testing/sam2-fixtures/resolve/main/truck.jpg"
>>> outputs = generator(image_url, points_per_batch=64)

@@ -69,8 +107,8 @@ from accelerate import Accelerator

>>> device = Accelerator().device

>>> model = EdgeTamModel.from_pretrained("yonigozlan/edgetam-1").to(device)
>>> processor = Sam2Processor.from_pretrained("yonigozlan/edgetam-1")
>>> model = EdgeTamModel.from_pretrained("yonigozlan/EdgeTAM-hf").to(device)
>>> processor = Sam2Processor.from_pretrained("yonigozlan/EdgeTAM-hf")

>>> image_url = "https://huggingface.co/datasets/hf-internal-testing/sam2-fixtures/resolve/main/truck.jpg"
>>> raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
@@ -166,8 +204,8 @@ from accelerate import Accelerator

>>> device = Accelerator().device

>>> model = EdgeTamModel.from_pretrained("yonigozlan/edgetam-1").to(device)
>>> processor = Sam2Processor.from_pretrained("yonigozlan/edgetam-1")
>>> model = EdgeTamModel.from_pretrained("yonigozlan/EdgeTAM-hf").to(device)
>>> processor = Sam2Processor.from_pretrained("yonigozlan/EdgeTAM-hf")

>>> # Load multiple images
>>> image_urls = [
42 changes: 42 additions & 0 deletions docs/source/en/model_doc/sam.md
@@ -44,6 +44,48 @@ Tips:
This model was contributed by [ybelkada](https://huggingface.co/ybelkada) and [ArthurZ](https://huggingface.co/ArthurZ).
The original code can be found [here](https://github.com/facebookresearch/segment-anything).

## Usage examples with 🤗 Transformers

### Promptable Visual Segmentation Pipeline

The easiest way to use SAM is through the `promptable-visual-segmentation` pipeline:

```python
>>> from transformers import pipeline

>>> segmenter = pipeline(model="facebook/sam-vit-base", task="promptable-visual-segmentation")
>>> # Single point prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000077595.jpg",
... input_points=[[[[450, 600]]]],
... input_labels=[[[1]]],
... )
[[{'score': 0.87, 'mask': tensor([...])}]]

>>> # Box prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_boxes=[[[59, 144, 76, 163]]],
... )
[[{'score': 0.92, 'mask': tensor([...])}]]

>>> # Multiple points for refinement
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_points=[[[[450, 600], [500, 620]]]],
... input_labels=[[[1, 0]]], # 1=positive, 0=negative
... )
[[{'score': 0.85, 'mask': tensor([...])}]]
```

<Tip>

**Note:** The pipeline's output format differs from that of the model and processor used directly. The pipeline returns a standardized format (a list of lists of dicts with `score` and `mask` keys) for consistency across all Transformers pipelines, while the processor's `post_process_masks()` returns raw tensors.

</Tip>
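When a prompt yields several candidate predictions, the `score` field can be used to pick the best one. A minimal sketch over a hand-built, hypothetical output in the standardized format (one inner list per input image):

```python
import torch

# Hypothetical pipeline-style output: one inner list per input image,
# each entry a dict with a confidence score and a boolean mask tensor.
outputs = [
    [
        {"score": 0.87, "mask": torch.zeros(4, 4, dtype=torch.bool)},
        {"score": 0.92, "mask": torch.ones(4, 4, dtype=torch.bool)},
    ]
]

# Keep the highest-scoring prediction for the first image.
best = max(outputs[0], key=lambda pred: pred["score"])
print(best["score"])  # 0.92
```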

### Basic Usage with Model and Processor

Below is an example on how to run mask generation given an image and a 2D point:

```python
40 changes: 39 additions & 1 deletion docs/source/en/model_doc/sam2.md
@@ -47,7 +47,45 @@ The original code can be found [here](https://github.com/facebookresearch/sam2/t

## Usage example

### Automatic Mask Generation with Pipeline
### Promptable Visual Segmentation Pipeline

The easiest way to use SAM2 is through the `promptable-visual-segmentation` pipeline:

```python
>>> from transformers import pipeline

>>> segmenter = pipeline(model="facebook/sam2.1-hiera-large", task="promptable-visual-segmentation")
>>> # Single point prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000077595.jpg",
... input_points=[[[[450, 600]]]],
... input_labels=[[[1]]],
... )
[[{'score': 0.87, 'mask': tensor([...])}]]

>>> # Box prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_boxes=[[[59, 144, 76, 163]]],
... )
[[{'score': 0.92, 'mask': tensor([...])}]]

>>> # Multiple points for refinement
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_points=[[[[450, 600], [500, 620]]]],
... input_labels=[[[1, 0]]], # 1=positive, 0=negative
... )
[[{'score': 0.85, 'mask': tensor([...])}]]
```

<Tip>

**Note:** The pipeline's output format differs from that of the model and processor used directly. The pipeline returns a standardized format (a list of lists of dicts with `score` and `mask` keys) for consistency across all Transformers pipelines, while the processor's `post_process_masks()` returns raw tensors.

</Tip>
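The boolean `mask` tensor in the output above can be reduced to simple geometric summaries with plain tensor ops. A minimal sketch (the mask here is a hypothetical stand-in, not real model output):

```python
import torch

# Hypothetical boolean mask in the pipeline's output format.
mask = torch.zeros(480, 640, dtype=torch.bool)
mask[120:240, 300:420] = True  # pretend this region was segmented

# Pixel area of the segmented region.
area = int(mask.sum())

# Tight bounding box (x_min, y_min, x_max, y_max) around the mask.
ys, xs = torch.where(mask)
box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
print(area, box)  # 14400 (300, 120, 419, 239)
```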

### Automatic Mask Generation Pipeline

SAM2 can be used for automatic mask generation to segment all objects in an image using the `mask-generation` pipeline:

40 changes: 39 additions & 1 deletion docs/source/en/model_doc/sam3_tracker.md
@@ -43,7 +43,45 @@ This model was contributed by [yonigozlan](https://huggingface.co/yonigozlan) an

## Usage example

### Automatic Mask Generation with Pipeline
### Promptable Visual Segmentation Pipeline

The easiest way to use Sam3Tracker is through the `promptable-visual-segmentation` pipeline:

```python
>>> from transformers import pipeline

>>> segmenter = pipeline(model="facebook/sam3", task="promptable-visual-segmentation")
>>> # Single point prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000077595.jpg",
... input_points=[[[[450, 600]]]],
... input_labels=[[[1]]],
... )
[[{'score': 0.87, 'mask': tensor([...])}]]

>>> # Box prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_boxes=[[[59, 144, 76, 163]]],
... )
[[{'score': 0.92, 'mask': tensor([...])}]]

>>> # Multiple points for refinement
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_points=[[[[450, 600], [500, 620]]]],
... input_labels=[[[1, 0]]], # 1=positive, 0=negative
... )
[[{'score': 0.85, 'mask': tensor([...])}]]
```

<Tip>

**Note:** The pipeline's output format differs from that of the model and processor used directly. The pipeline returns a standardized format (a list of lists of dicts with `score` and `mask` keys) for consistency across all Transformers pipelines, while the processor's `post_process_masks()` returns raw tensors.

</Tip>
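Two boolean masks in this format can be compared directly, for example with intersection-over-union when refining prompts. A minimal sketch over hypothetical hand-built masks:

```python
import torch

# Hypothetical boolean masks in the pipeline's output format.
mask_a = torch.zeros(10, 10, dtype=torch.bool)
mask_a[0:5, 0:10] = True  # top half: 50 pixels
mask_b = torch.zeros(10, 10, dtype=torch.bool)
mask_b[0:10, 0:5] = True  # left half: 50 pixels

# Intersection-over-union between the two predictions.
intersection = (mask_a & mask_b).sum().item()
union = (mask_a | mask_b).sum().item()
iou = intersection / union
print(f"{iou:.3f}")  # 25 / 75 -> 0.333
```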

### Automatic Mask Generation Pipeline

Sam3Tracker can be used for automatic mask generation to segment all objects in an image using the `mask-generation` pipeline:
