-
Notifications
You must be signed in to change notification settings - Fork 25.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Image Segmentation pipeline #13828
Image Segmentation pipeline #13828
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice PR!
I think we can add more representative power to the pipeline output removing the need for subtask
or at least internalize it's complexity (so we don't force model implementors to add irrelevant stuff)
The tests can be simplified quite a bit IMO.
If we really want subtask
to exist, we need test that display the differences they imply.
Happy to help on changing the representation if needed.
@staticmethod | ||
def load_image(image: Union[str, "Image.Image"]): | ||
if isinstance(image, str): | ||
if image.startswith("http://") or image.startswith("https://"): | ||
# We need to actually check for a real protocol, otherwise it's impossible to use a local file | ||
# like http_huggingface_co.png | ||
image = Image.open(requests.get(image, stream=True).raw) | ||
elif os.path.isfile(image): | ||
image = Image.open(image) | ||
else: | ||
raise ValueError( | ||
f"Incorrect path or url, URLs must start with `http://` or `https://`, and {image} is not a valid path" | ||
) | ||
elif isinstance(image, Image.Image): | ||
pass | ||
else: | ||
raise ValueError( | ||
"Incorrect format used for image. Should be a URL linking to an image, a local path, or a PIL image." | ||
) | ||
image = image.convert("RGB") | ||
return image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be time to create a utils file and put it in there maybe so we can just reuse this code all the time and test it separately. (it should be another PR, just mentioning it here)
@LysandreJik Would you agree ?
MODEL_FOR_IMAGE_SEGMENTATION_MAPPING_NAMES = OrderedDict( | ||
[ | ||
# Model for Image Segmentation mapping | ||
("detr", "DetrForSegmentation"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DetrForSegmentation
was probably not the best name. Could we add an alias called DetrForImageSegmentation
? Or should we distinguish between the different kinds of segmentation (this particular one is panoptic segmentation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm not mistaken, as noted in this discussion, we will not make any distinction between different segmentation subtasks and expect models/architectures to implement post_process_segmentation method that implements "the most detailed version of subtask" they can.
More details:
This output shape
List(Dict(
mask: str, // base64 str
label: float,
score: int,
parent: int // -1 for all for DetrForSegmentation
))
will allow us to implement part-aware panoptic
segmentation (which is the most detailed segmentation subtask). And using this output, a user can re-construct semantic
, instance
or any othre segmentation subtask if they want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the suggestion of DetrForImageSegmentation
. Do you want me to add this alias in a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, adding the alias in a different PR sounds good to me.
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM !
Please let me know if I should merge this PR @Narsil @NielsRogge @LysandreJik |
It's gtg for me. |
* Implement img seg pipeline * Update src/transformers/pipelines/image_segmentation.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update src/transformers/pipelines/image_segmentation.py Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Update output shape with individual masks * Rm dev change * Remove loops in test Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
What does this PR do?
TLDR
Implements
image-segmentation
pipeline (for DetrForSegmentation atm).API specifications
Input: image source (identical to the input of
image-classification
&object-detection
pipelines)Output:
Two design choices I've made and would like to discuss (& modify if needed):
png_string
has masks information using same mechanism as COCO panoptic segmentation annotations. See section4. Panoptic Segmentation
from https://cocodataset.org/#format-data. Paraphrasing a bit:subtask
arg. There are different variations of segmentation task (semantic, instance, panoptic, etc. see image below). If a model doesn't implement requested subtask, it gets defaulted to what's available. See example below:transformers/src/transformers/models/detr/feature_extraction_detr.py
Lines 738 to 739 in dd5c269
Before submitting
Pull Request section?
to it if that's the case.
Discussed in Tracking integration for Image Segmentation hub-docs#43, New object detection and image segmentation widgets hub-docs#6, Widget Image Segmentation huggingface_hub#378
documentation guidelines,
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Update 1:
Updated output shape:
The goal for any segmentation architecture is to implement the most detailed version of segmentation subtask they can so that other subtasks can be reconstructed (if needed). For example, if an architecture post_process_segmentation method implements part-aware panoptic, other subtasks (including semantic, instance, etc.) can be reconstructed from part-aware panoptic output since all the details needed are in there