
Fix detections_to_coco_annotations function for empty polygons. #1086

Draft: Youho99 wants to merge 11 commits into develop

Conversation

Youho99 commented Apr 2, 2024

Description

Fix issue #1085

Prevents an error from being raised when a polygon is empty.

Modifies the fields of the JSON file:

  • the order of fields is affected: the bbox, area, or segmentation field now comes last in the annotation block
  • from now on, an annotation contains either the bbox and area fields (when working with bounding boxes) or the segmentation field (when working with segmentation)

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Follow the Minimal Reproducible Example from issue #1085.

Any specific deployment considerations

JSON fields modified (see the comment in the description).

Docs

  • Docs updated? What were the changes:

CLAassistant commented Apr 2, 2024

CLA assistant check
All committers have signed the CLA.

SkalskiP (Collaborator) commented Apr 2, 2024

Hi @Youho99! Thanks a lot for the time you spent creating this PR. Could you explain what the problem is with having bbox, area, and segmentation simultaneously?

Youho99 (Author) commented Apr 2, 2024

> Hi @Youho99! Thanks a lot for the time you spent creating this PR. Could you explain what the problem is with having bbox, area, and segmentation simultaneously?

I don't know if it's really a problem to have both bbox, area, and segmentation.

It's just that, in that case, there are a few things to modify:

  • Be careful not to use the bbox generated by GroundingDino (the one currently used), as it will not be the bbox of the segmentation. We would therefore need to create a polygon_to_xyxy function for coco.
  • The rest, I think, is handled automatically by my code.

For information, I separated bbox and area from segmentation for two reasons:

  • In the yolo and pascal_voc formats, there is no bbox information when using segmentation. Since I replicated how those formats work, I replicated this behavior as well.
  • During my tests, I exported the annotations to CVAT to visualize them. When a segmentation was empty, the annotation displayed was the bbox one (GroundingDino). But that scenario should no longer occur with my code.

Would you like me to also do the polygon_to_xyxy function for coco?

PS: the result of the polygon_to_xyxy function, for the pascal_voc and yolo formats, is never actually used in the polygon scenario. In fact, we should just make the xyxy argument optional in the object_to_pascal_voc and object_to_yolo functions. But that's not part of this problem; it's just something I noticed while studying the code ;).
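
For reference, here is a minimal sketch of what such a polygon_to_xyxy helper for the coco path could look like (an illustrative assumption on my part, not the library's actual implementation):

import numpy as np

def polygon_to_xyxy(polygon: np.ndarray) -> np.ndarray:
    # polygon: (N, 2) array of (x, y) vertices of a single polygon
    # returns the tight bounding box as [x_min, y_min, x_max, y_max]
    x_min, y_min = np.min(polygon, axis=0)
    x_max, y_max = np.max(polygon, axis=0)
    return np.array([x_min, y_min, x_max, y_max])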

Youho99 (Author) commented Apr 3, 2024

Small modification:

I left the "segmentation" field present but empty when the task is detection (bbox), because otherwise the file could not be imported into CVAT (so I assume the "segmentation" field is mandatory in the coco format).
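
To illustrate (the values below are hypothetical, and this is only a sketch of the intended shape of an annotation entry, written as a Python dict):

detection_annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,
    "bbox": [10.0, 20.0, 50.0, 40.0],  # [x, y, width, height]
    "area": 2000.0,
    "segmentation": [],  # kept present but empty so tools like CVAT accept the file
    "iscrowd": 0,
}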

SkalskiP (Collaborator) commented Apr 5, 2024

@LinasKo could you take a look at this next week?

LinasKo (Collaborator) commented Apr 5, 2024

Sure.

I've already verified that the error happens and that this PR fixes it, but I haven't looked at all into coco's format specification yet.

LinasKo (Collaborator) commented Apr 11, 2024

Hi @Youho99 👋

I'll be reviewing this tomorrow. So far, I have one question: should object_to_coco remove the bbox when returning a segmentation? By my intuition, if it's available, should we not include it in the result?

Can you also comment a bit on the bbox generated by GroundingDino versus the one we have? Does it mean something other than the detected object boxes?

Youho99 (Author) commented Apr 11, 2024

Hi @LinasKo !

Here's what I think:
In the case of segmentation, the bbox will necessarily be available. The code provided in my PR filters out empty or too-small segmentations (I followed the filtering rules of the base code, so no worries about the filtering logic).
So in theory there is no problem with keeping the additional bbox information in the coco JSON. That said, it's not impossible that some specific use cases need only segmentation data without bbox data (I can't think of any such case, I'm just hypothesizing).

I would therefore be in favor of keeping the bbox information (and the area, which goes with it) in addition to the segmentation information.

Note: in my current PR, I decided to follow the same logic as the yolo and pascal_voc formats, which do not save bbox information.

Regarding the difference in how the bbox is computed:
autodistill_grounded_sam works like this:
1/ it invokes autodistill_grounding_dino, which generates a bbox
2/ autodistill_grounded_sam takes that bbox and segments the object inside it (with the bbox prompt mode)
3/ the autodistill_grounding_dino bbox and the autodistill_grounded_sam segmentation are both saved in the JSON

So here's the problem:
The autodistill_grounding_dino bbox is not the bbox of the segmentation; it is the bbox used to generate the segmentation.
It is therefore necessary to recompute the bbox of the segmentation (taking the min and max x and y of the segmentation), as is already done for yolo and pascal_voc via the polygon_to_xyxy function (because those formats do not record the bbox of the segmentation).

I made a diagram to illustrate my points, admire my artistic talents!

[image: diagram illustrating the GroundingDino bbox versus the bbox of the segmentation]

LinasKo (Collaborator) commented Apr 12, 2024

That's a very thorough explanation :)
(A truly majestic depiction 🧑‍🎨)

Let's add the bbox back in coco segmentation, even if it's inconsistent with the other from_X methods. Who knows; maybe someone out there is relying on it.


I'd be keen to fix the Grounded SAM bbox issue too. Am I right that if we used Grounded SAM, e.g. via from_sam in supervision/detections/core.py, we're picking the wrong boxes for our xyxy?

I'll ask internally too - I might be confused about potential entry points of grounded SAM data into our ecosystem.

But this sounds like a separate PR.

Youho99 (Author) commented Apr 12, 2024

Yes, I agree with adding it back too.
If someone tells us that for their use case they really need the bbox removed, then perhaps we will have to add a function parameter to filter it out, depending on the user's wishes.
But the bbox must be the one of the segmentation (and not the grounding dino one ;D)

For the second part of the message:
I have the impression that you are indeed using the wrong bbox. However, I haven't checked it there. I know that the bbox contained in the Detections object is the grounding dino bbox, so it should not be used (and that seems to be the case here).

I wrote some code (for another purpose) that recomputes the SAM bboxes precisely, starting from a larger bbox (which simulates our use case). I'll see about making it open-source (I'll have the info in a few hours) if that helps.
But there is nothing very complex about it: you just read the mask and take the coordinates of its extrema (along width and height) to compute the bbox.
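
A minimal sketch of that idea, assuming the mask is a binary NumPy array (the helper name here is made up for illustration):

import numpy as np

def mask_to_xyxy(mask: np.ndarray) -> np.ndarray:
    # mask: (H, W) boolean array; returns [x_min, y_min, x_max, y_max]
    ys, xs = np.nonzero(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])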

SkalskiP (Collaborator) commented

> Be careful not to use the bbox generated by GroundingDino (the one currently used), as it will not be the bbox of the segmentation. We would therefore need to create a polygon_to_xyxy function for coco.

@Youho99 Nowhere in the Supervision repository do we directly depend on GroundingDINO results. Could you point out where you saw this?

> I have the impression that you are indeed using the wrong bbox. However, I haven't checked it there. I know that the bbox contained in the Detections object is the grounding dino bbox, so it should not be used (and that seems to be the case here).

Once again, I'm not sure why we are talking about GroundingDINO here. :)

Youho99 (Author) commented Apr 12, 2024

@SkalskiP

Here's why I'm talking about grounding dino and grounded sam:

I use Supervision, autodistill_grounding_dino and autodistill_grounded_sam, as in issue #1085

In fact, when we call as_coco(), the code takes the Detections object and reads its annotations.bbox and annotations.segmentation parts.
So here, indeed, there is no direct link to grounding dino or grounded sam.

However, I'm talking about how the Detections object is constructed and where the data inside it comes from.
When I use autodistill_grounded_sam to generate a Detections object ( GroundedSAM(...).label() ), it calls autodistill_grounding_dino to generate the bboxes, which are then saved in the annotations.bbox part of the Detections object. These bboxes (the output of autodistill_grounding_dino) are then used to segment one object per bbox (via the SAM bbox prompt), and the resulting masks are saved in the annotations.segmentation part of the same Detections object.

And so, when we convert the Detections object to coco, we write out the bboxes of Grounding Dino and the segmentations of Grounded SAM (and therefore never record the bboxes of the segmentation masks).

Therefore, it may also seem logical to fix how the segmentation bboxes are recorded directly in autodistill_grounded_sam (by which I mean it would certainly be more modular to do so, solving the problem at the root, if other modules do not exhibit the problem).

Youho99 (Author) commented Apr 12, 2024

There would therefore be two PRs:
A first one that prevents an error when a polygon is empty (in Supervision).

A second one that records the bboxes of the segmentation masks in the Detections object generated by autodistill_grounded_sam (in autodistill_grounded_sam).

SkalskiP (Collaborator) commented
@Youho99

> There would therefore be two PRs:
> A first one that prevents an error when a polygon is empty (in Supervision).
>
> A second one that records the bboxes of the segmentation masks in the Detections object generated by autodistill_grounded_sam (in autodistill_grounded_sam).

This is the solution I would lean towards. Supervision strives to be a general library. In this case, its task is to save the user's input as a COCO dataset.
I agree—we need to fix the issue causing the error. I already reviewed your PR and left 2 comments.

Regarding the specific problem with GroundedSAM, it should be resolved in autodistill.

Youho99 and others added 2 commits April 15, 2024 14:44
Always include box, area, and segmentation in the result COCO JSON.
Youho99 (Author) commented Apr 16, 2024

@SkalskiP @LinasKo
I have just made the necessary modifications to handle disjoint masks again.

Next, I will move on to autodistill_grounded_sam to compute the bboxes of the segmentations.


While doing my tests, I noticed that I got an error when I tried to regenerate the DetectionDataset object with from_coco with the force_masks parameter enabled.

I tried with supervision 0.19.0, and I get the error ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

I therefore conclude that I did not introduce this bug; it already exists...

Youho99 (Author) commented Apr 18, 2024

Hello!

I took the opportunity to fix the bug mentioned in my previous message as well.

Now the force_masks parameter of the from_coco() function no longer raises an error. I made sure to also support disjoint masks.
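
For context, a round trip along these lines is roughly what this exercises (the paths are placeholders):

import supervision as sv

# Hypothetical paths; force_masks=True asks from_coco to build masks from the
# segmentation polygons rather than keeping only the bounding boxes.
dataset = sv.DetectionDataset.from_coco(
    images_directory_path="dataset/images",
    annotations_path="dataset/annotations.json",
    force_masks=True,
)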

@Youho99 Youho99 requested a review from SkalskiP April 25, 2024 13:14
Youho99 (Author) commented Jun 3, 2024

Any news about this Pull Request? Could you review it?
@LinasKo @SkalskiP

@Youho99 Youho99 marked this pull request as draft June 20, 2024 09:25
jschultz299 commented
Hi @Youho99, not sure if you are still working on this, but I am also dealing with inconsistent and often disjoint masks. I was able to cherry-pick from your draft PR into the current develop branch, and converting detections with disjoint masks to COCO polygon annotations using as_coco worked well.

When converting from COCO annotations using from_coco() with force_masks=True, however, I ran into an issue. Right now, it looks like you take every segmentation in the image and merge them together. Instead, I think we should perform this operation on each detection separately, then concatenate the masks together. That way, we only merge masks belonging to the same detection and preserve the total number of masks. Happy to discuss further or submit a PR if others are still interested in this feature.

Proposed solution (line 85):

if with_masks:
    masks = []
    for image_annotation in image_annotations:
        polygons = [
            np.reshape(np.asarray(segmentation, dtype=np.int32), (-1, 2))
            for segmentation in image_annotation["segmentation"]
        ]
        separate_masks = _polygons_to_masks(
            polygons=polygons, resolution_wh=resolution_wh
        )
        mask = (
            np.sum(
                separate_masks, axis=0, keepdims=True
            )  # merge mask parts representing a disjoint mask
            > 0  # ensure that the result is a binary mask
        )
        masks.append(mask)

    mask = np.concatenate(masks, axis=0)

LinasKo (Collaborator) commented Oct 14, 2024

Hi @jschultz299 👋

It's on us that this PR has not been reviewed yet. I'll have a look at a few Hacktoberfest issues and come back to this. It's high time I gave it the attention it deserves.

Youho99 (Author) commented Oct 15, 2024

Since then, the library code has changed a lot (so I put the PR in Draft).

I haven't retested since, but I'm pretty sure that if the problem hasn't been solved via other PRs, the solution will have to be started from scratch and reimplemented against the current version of supervision.

I'm no longer working on the project where I needed this feature, so it's not fresh in my mind right now.

We should test the case I proposed that shows the bug, to see whether it has been fixed yet.

LinasKo (Collaborator) commented Oct 15, 2024

Thanks @Youho99. I'll take over and get it sorted.
