Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image Segmentation pipeline #13828

Merged
merged 6 commits into from
Oct 8, 2021

Conversation

mishig25
Copy link
Contributor

@mishig25 mishig25 commented Oct 1, 2021

What does this PR do?

TLDR

Implements image-segmentation pipeline (for DetrForSegmentation atm).

API specifications

Input: image source (identical to the input of image-classification & object-detection pipelines)
Output:

List(Dict(
    mask: str, # base64 str
    label: float,
    score: int,
))

Two design choices I've made and would like to discuss (& modify if needed):

  1. Output png_string has masks information using same mechanism as COCO panoptic segmentation annotations. See section 4. Panoptic Segmentation from https://cocodataset.org/#format-data. Paraphrasing a bit:

per-pixel segment ids are stored in the PNG string. Each segment is assigned a unique id. Unlabeled pixels (void) are assigned a value of 0. Note that when you load the PNG as an RGB image, you will need to compute the ids via ids=R+G256+B256^2.

  1. Image segmentation pipeline accepts subtask arg. There are different variations of segmentation task (semantic, instance, panoptic, etc. see image below). If a model doesn't implement requested subtask, it gets defaulted to what's available. See example below:
    logger.warning("No subtask was supplied, defaulted to panoptic")
    return self.post_process_panoptic(outputs, processed_sizes, threshold=threshold)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Update 1:

Updated output shape:

List(Dict(
    mask: str, # base64 str
    label: float,
    score: int,
))

The goal for any segmentation architecture is to implement the most detailed version of segmentation subtask they can so that other subtasks can be reconstructed (if needed). For example, if an architecture post_process_segmentation method implements part-aware panoptic, other subtasks (including semantic, instance, etc.) can be reconstructed from part-aware panoptic output since all the details needed are in there

Copy link
Contributor

@Narsil Narsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice PR!

I think we can add more representative power to the pipeline output removing the need for subtask or at least internalize it's complexity (so we don't force model implementors to add irrelevant stuff)

The tests can be simplified quite a bit IMO.
If we really want subtask to exist, we need test that display the differences they imply.

Happy to help on changing the representation if needed.

Comment on lines +49 to +69
@staticmethod
def load_image(image: Union[str, "Image.Image"]):
if isinstance(image, str):
if image.startswith("http://") or image.startswith("https://"):
# We need to actually check for a real protocol, otherwise it's impossible to use a local file
# like http_huggingface_co.png
image = Image.open(requests.get(image, stream=True).raw)
elif os.path.isfile(image):
image = Image.open(image)
else:
raise ValueError(
f"Incorrect path or url, URLs must start with `http://` or `https://`, and {image} is not a valid path"
)
elif isinstance(image, Image.Image):
pass
else:
raise ValueError(
"Incorrect format used for image. Should be a URL linking to an image, a local path, or a PIL image."
)
image = image.convert("RGB")
return image
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be time to create a utils file and put it in there maybe so we can just reuse this code all the time and test it separately. (it should be another PR, just mentioning it here)

@LysandreJik Would you agree ?

src/transformers/pipelines/image_segmentation.py Outdated Show resolved Hide resolved
tests/test_pipelines_image_segmentation.py Outdated Show resolved Hide resolved
tests/test_pipelines_image_segmentation.py Outdated Show resolved Hide resolved
tests/test_pipelines_image_segmentation.py Outdated Show resolved Hide resolved
src/transformers/pipelines/image_segmentation.py Outdated Show resolved Hide resolved
tests/test_pipelines_image_segmentation.py Outdated Show resolved Hide resolved
MODEL_FOR_IMAGE_SEGMENTATION_MAPPING_NAMES = OrderedDict(
[
# Model for Image Segmentation mapping
("detr", "DetrForSegmentation"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DetrForSegmentation was probably not the best name. Could we add an alias called DetrForImageSegmentation? Or should we distinguish between the different kinds of segmentation (this particular one is panoptic segmentation)

Copy link
Contributor Author

@mishig25 mishig25 Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, as noted in this discussion, we will not make any distinction between different segmentation subtasks and expect models/architectures to implement post_process_segmentation method that implements "the most detailed version of subtask" they can.

More details:
This output shape

List(Dict(
    mask: str, // base64 str
    label: float,
    score: int,
    parent: int // -1 for all for DetrForSegmentation
))

will allow us to implement part-aware panoptic segmentation (which is the most detailed segmentation subtask). And using this output, a user can re-construct semantic, instance or any othre segmentation subtask if they want

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the suggestion of DetrForImageSegmentation. Do you want me to add this alias in a different PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding the alias in a different PR sounds good to me.

mishig25 and others added 4 commits October 4, 2021 21:18
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Copy link
Contributor

@Narsil Narsil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@mishig25
Copy link
Contributor Author

mishig25 commented Oct 8, 2021

Please let me know if I should merge this PR @Narsil @NielsRogge @LysandreJik

@Narsil
Copy link
Contributor

Narsil commented Oct 8, 2021

It's gtg for me.

@mishig25 mishig25 merged commit 026866d into huggingface:master Oct 8, 2021
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 27, 2022
* Implement img seg pipeline

* Update src/transformers/pipelines/image_segmentation.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/image_segmentation.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update output shape with individual masks

* Rm dev change

* Remove loops in test

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants