2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -309,6 +309,8 @@
title: Image Feature Extraction
- local: tasks/mask_generation
title: Mask Generation
- local: tasks/promptable_visual_segmentation
title: Promptable Visual Segmentation
- local: tasks/keypoint_detection
title: Keypoint detection
- local: tasks/knowledge_distillation_for_image_classification
6 changes: 6 additions & 0 deletions docs/source/en/main_classes/pipelines.md
@@ -473,6 +473,12 @@ Pipelines available for multimodal tasks include the following.
- __call__
- all

### PromptableVisualSegmentationPipeline

[[autodoc]] PromptableVisualSegmentationPipeline
- __call__
- all

### VisualQuestionAnsweringPipeline

[[autodoc]] VisualQuestionAnsweringPipeline
4 changes: 4 additions & 0 deletions docs/source/en/model_doc/auto.md
@@ -113,6 +113,10 @@ The following auto classes are available for the following natural language processing tasks.

[[autodoc]] AutoModelForMaskGeneration

### AutoModelForPromptableVisualSegmentation

[[autodoc]] AutoModelForPromptableVisualSegmentation

### AutoModelForSeq2SeqLM

[[autodoc]] AutoModelForSeq2SeqLM
50 changes: 44 additions & 6 deletions docs/source/en/model_doc/edgetam.md
@@ -39,14 +39,52 @@ The original code can be found [here](https://github.com/facebookresearch/EdgeTA

## Usage example

### Automatic Mask Generation with Pipeline
### Promptable Visual Segmentation Pipeline

The easiest way to use EdgeTAM is through the `promptable-visual-segmentation` pipeline:

```python
>>> from transformers import pipeline

>>> segmenter = pipeline(model="yonigozlan/EdgeTAM-hf", task="promptable-visual-segmentation")
>>> # Single point prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000077595.jpg",
... input_points=[[[[450, 600]]]],
... input_labels=[[[1]]],
... )
[[{'score': 0.87, 'mask': tensor([...])}]]

>>> # Box prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_boxes=[[[59, 144, 76, 163]]],
... )
[[{'score': 0.92, 'mask': tensor([...])}]]

>>> # Multiple points for refinement
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_points=[[[[450, 600], [500, 620]]]],
... input_labels=[[[1, 0]]], # 1=positive, 0=negative
... )
[[{'score': 0.85, 'mask': tensor([...])}]]
```

<Tip>

**Note:** The pipeline's output format differs from that of the model and processor used directly. The pipeline returns a standardized format (a list of lists of dicts with `score` and `mask` keys) for consistency across all Transformers pipelines, while the processor's `post_process_masks()` returns raw tensors.

</Tip>
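Assuming the standardized output format above (a confidence `score` plus a boolean `mask` tensor per prediction), a minimal sketch of inspecting one prediction without re-running the model; the tensor values here are hypothetical stand-ins for real pipeline output:

```python
import torch

# Hypothetical pipeline-style prediction: a confidence score plus a boolean
# mask tensor at image resolution (the standardized format described above).
prediction = {"score": 0.87, "mask": torch.zeros(480, 640, dtype=torch.bool)}
prediction["mask"][100:300, 200:400] = True  # pretend this region was segmented

# Fraction of the image covered by the mask.
coverage = prediction["mask"].float().mean().item()
print(f"{coverage:.3f}")  # 200*200 / (480*640) -> 0.130
```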

### Automatic Mask Generation Pipeline

EdgeTAM can be used for automatic mask generation to segment all objects in an image using the `mask-generation` pipeline:

```python
>>> from transformers import pipeline

>>> generator = pipeline("mask-generation", model="yonigozlan/edgetam-1", device=0)
>>> generator = pipeline("mask-generation", model="yonigozlan/EdgeTAM-hf", device=0)
>>> image_url = "https://huggingface.co/datasets/hf-internal-testing/sam2-fixtures/resolve/main/truck.jpg"
>>> outputs = generator(image_url, points_per_batch=64)

@@ -69,8 +107,8 @@ from accelerate import Accelerator

>>> device = Accelerator().device

>>> model = EdgeTamModel.from_pretrained("yonigozlan/edgetam-1").to(device)
>>> processor = Sam2Processor.from_pretrained("yonigozlan/edgetam-1")
>>> model = EdgeTamModel.from_pretrained("yonigozlan/EdgeTAM-hf").to(device)
>>> processor = Sam2Processor.from_pretrained("yonigozlan/EdgeTAM-hf")

>>> image_url = "https://huggingface.co/datasets/hf-internal-testing/sam2-fixtures/resolve/main/truck.jpg"
>>> raw_image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")
@@ -166,8 +204,8 @@ from accelerate import Accelerator

>>> device = Accelerator().device

>>> model = EdgeTamModel.from_pretrained("yonigozlan/edgetam-1").to(device)
>>> processor = Sam2Processor.from_pretrained("yonigozlan/edgetam-1")
>>> model = EdgeTamModel.from_pretrained("yonigozlan/EdgeTAM-hf").to(device)
>>> processor = Sam2Processor.from_pretrained("yonigozlan/EdgeTAM-hf")

>>> # Load multiple images
>>> image_urls = [
42 changes: 42 additions & 0 deletions docs/source/en/model_doc/sam.md
@@ -44,6 +44,48 @@ Tips:
This model was contributed by [ybelkada](https://huggingface.co/ybelkada) and [ArthurZ](https://huggingface.co/ArthurZ).
The original code can be found [here](https://github.com/facebookresearch/segment-anything).

## Usage examples with 🤗 Transformers

### Promptable Visual Segmentation Pipeline

The easiest way to use SAM is through the `promptable-visual-segmentation` pipeline:

```python
>>> from transformers import pipeline

>>> segmenter = pipeline(model="facebook/sam-vit-base", task="promptable-visual-segmentation")
>>> # Single point prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000077595.jpg",
... input_points=[[[[450, 600]]]],
... input_labels=[[[1]]],
... )
[[{'score': 0.87, 'mask': tensor([...])}]]

>>> # Box prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_boxes=[[[59, 144, 76, 163]]],
... )
[[{'score': 0.92, 'mask': tensor([...])}]]

>>> # Multiple points for refinement
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_points=[[[[450, 600], [500, 620]]]],
... input_labels=[[[1, 0]]], # 1=positive, 0=negative
... )
[[{'score': 0.85, 'mask': tensor([...])}]]
```

<Tip>

**Note:** The pipeline's output format differs from that of the model and processor used directly. The pipeline returns a standardized format (a list of lists of dicts with `score` and `mask` keys) for consistency across all Transformers pipelines, while the processor's `post_process_masks()` returns raw tensors.

</Tip>
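When a prompt yields several candidate predictions, the `score` field can be used to pick the best one. A minimal sketch over a hand-built, hypothetical output in the standardized format (one inner list per input image):

```python
import torch

# Hypothetical pipeline-style output: one inner list per input image,
# each entry a dict with a confidence score and a boolean mask tensor.
outputs = [
    [
        {"score": 0.87, "mask": torch.zeros(4, 4, dtype=torch.bool)},
        {"score": 0.92, "mask": torch.ones(4, 4, dtype=torch.bool)},
    ]
]

# Keep the highest-scoring prediction for the first image.
best = max(outputs[0], key=lambda pred: pred["score"])
print(best["score"])  # 0.92
```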

### Basic Usage with Model and Processor

Below is an example on how to run mask generation given an image and a 2D point:

```python
40 changes: 39 additions & 1 deletion docs/source/en/model_doc/sam2.md
@@ -47,7 +47,45 @@ The original code can be found [here](https://github.com/facebookresearch/sam2/t

## Usage example

### Automatic Mask Generation with Pipeline
### Promptable Visual Segmentation Pipeline

The easiest way to use SAM2 is through the `promptable-visual-segmentation` pipeline:

```python
>>> from transformers import pipeline

>>> segmenter = pipeline(model="facebook/sam2.1-hiera-large", task="promptable-visual-segmentation")
>>> # Single point prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000077595.jpg",
... input_points=[[[[450, 600]]]],
... input_labels=[[[1]]],
... )
[[{'score': 0.87, 'mask': tensor([...])}]]

>>> # Box prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_boxes=[[[59, 144, 76, 163]]],
... )
[[{'score': 0.92, 'mask': tensor([...])}]]

>>> # Multiple points for refinement
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_points=[[[[450, 600], [500, 620]]]],
... input_labels=[[[1, 0]]], # 1=positive, 0=negative
... )
[[{'score': 0.85, 'mask': tensor([...])}]]
```

<Tip>

**Note:** The pipeline's output format differs from that of the model and processor used directly. The pipeline returns a standardized format (a list of lists of dicts with `score` and `mask` keys) for consistency across all Transformers pipelines, while the processor's `post_process_masks()` returns raw tensors.

</Tip>
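The boolean `mask` tensor in the output above can be reduced to simple geometric summaries with plain tensor ops. A minimal sketch (the mask here is a hypothetical stand-in, not real model output):

```python
import torch

# Hypothetical boolean mask in the pipeline's output format.
mask = torch.zeros(480, 640, dtype=torch.bool)
mask[120:240, 300:420] = True  # pretend this region was segmented

# Pixel area of the segmented region.
area = int(mask.sum())

# Tight bounding box (x_min, y_min, x_max, y_max) around the mask.
ys, xs = torch.where(mask)
box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
print(area, box)  # 14400 (300, 120, 419, 239)
```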

### Automatic Mask Generation Pipeline

SAM2 can be used for automatic mask generation to segment all objects in an image using the `mask-generation` pipeline:

40 changes: 39 additions & 1 deletion docs/source/en/model_doc/sam3_tracker.md
@@ -43,7 +43,45 @@ This model was contributed by [yonigozlan](https://huggingface.co/yonigozlan) an

## Usage example

### Automatic Mask Generation with Pipeline
### Promptable Visual Segmentation Pipeline

The easiest way to use Sam3Tracker is through the `promptable-visual-segmentation` pipeline:

```python
>>> from transformers import pipeline

>>> segmenter = pipeline(model="facebook/sam3", task="promptable-visual-segmentation")
>>> # Single point prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000077595.jpg",
... input_points=[[[[450, 600]]]],
... input_labels=[[[1]]],
... )
[[{'score': 0.87, 'mask': tensor([...])}]]

>>> # Box prompt
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_boxes=[[[59, 144, 76, 163]]],
... )
[[{'score': 0.92, 'mask': tensor([...])}]]

>>> # Multiple points for refinement
>>> segmenter(
... "http://images.cocodataset.org/val2017/000000136466.jpg",
... input_points=[[[[450, 600], [500, 620]]]],
... input_labels=[[[1, 0]]], # 1=positive, 0=negative
... )
[[{'score': 0.85, 'mask': tensor([...])}]]
```

<Tip>

**Note:** The pipeline's output format differs from that of the model and processor used directly. The pipeline returns a standardized format (a list of lists of dicts with `score` and `mask` keys) for consistency across all Transformers pipelines, while the processor's `post_process_masks()` returns raw tensors.

</Tip>
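Two boolean masks in this format can be compared directly, for example with intersection-over-union when refining prompts. A minimal sketch over hypothetical hand-built masks:

```python
import torch

# Hypothetical boolean masks in the pipeline's output format.
mask_a = torch.zeros(10, 10, dtype=torch.bool)
mask_a[0:5, 0:10] = True  # top half: 50 pixels
mask_b = torch.zeros(10, 10, dtype=torch.bool)
mask_b[0:10, 0:5] = True  # left half: 50 pixels

# Intersection-over-union between the two predictions.
intersection = (mask_a & mask_b).sum().item()
union = (mask_a | mask_b).sum().item()
iou = intersection / union
print(f"{iou:.3f}")  # 25 / 75 -> 0.333
```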

### Automatic Mask Generation Pipeline

Sam3Tracker can be used for automatic mask generation to segment all objects in an image using the `mask-generation` pipeline:
