Skip to content

Conversation

@hardikdava
Copy link
Contributor

Description

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested?

  • Tested locally with dataset coco128 (downloaded from roboflow)
  • Import using supervision.DetectionDataset
  • Visualized using supervision.BoxAnnotator
  • Saved dataset using supervision.DetectionDataset

Docs

  • Docs updated?

@yclicc
Copy link

yclicc commented Jun 26, 2023

Is this supposed to support segmentation masks? It doesn't seem to be doing that for me, it only exports the bounding boxes.

@hardikdava
Copy link
Contributor Author

@yclicc Not yet. But it will be supported in near future. Currently, it is supporting only object detection format.

@SkalskiP
Copy link
Collaborator

Hi @hardikdava 👋🏻 ! I was AFK at the CVPR conference last week. I'll do a code review today.

@hardikdava
Copy link
Contributor Author

@SkalskiP I also added support for instance segmentation masks loading. I tested locally. Please also test with coco segmentation dataset.

@hardikdava
Copy link
Contributor Author

Is this supposed to support segmentation masks? It doesn't seem to be doing that for me, it only exports the bounding boxes.

@yclicc You can try now. It should be working.

@yclicc
Copy link

yclicc commented Jun 26, 2023

@hardikdava that throws an error with my data on line 84 of dataset/formats/coco.py because "_polygons" is a dict. COCO format allows for segmentations either as a list of polygons or as a run length encoding (which is what I've got).

image

@hardikdava
Copy link
Contributor Author

@yclicc , have you used latest code? please share your dataset if possible? I tried using segmentation dataset from roboflow.

@yclicc
Copy link

yclicc commented Jun 26, 2023

This is latest code, but your code is expecting a list of polygons (which is a possible segmentation format for COCO, just not the only option). See https://opencv.org/blog/2021/10/12/introduction-to-the-coco-dataset/ for details and examples of both possible types of mask format.

@yclicc
Copy link

yclicc commented Jun 26, 2023

I can't share my dataset, but I'm generating it with CVAT's mask annotation mode, as opposed to the polygon creator. So perhaps unsurprisingly, it has exported a COCO format with Run Length Encoding instead of a list of polygons.

@hardikdava
Copy link
Contributor Author

@yclicc this PR only supports for polygons. But I can take a look and check for possibility to extend the functionality.

@yclicc
Copy link

yclicc commented Jun 26, 2023

I'll see if I can come up with a fix and then put in a pull request on your repo.

@hardikdava
Copy link
Contributor Author

@yclicc I tried to export dataset using CVAT by annotating object using "Draw new Mask" and export dataset in coco format. It still exported as polygons which is supported by this PR.

@SkalskiP
Copy link
Collaborator

SkalskiP commented Jun 26, 2023

Hi @hardikdava 👋🏻 ! This is so awesome to see you contributing this functionality to supervision. The biggest success of an opensource maintainer is to see people contributing to the project they build. Thanks a lot for that. 🙏🏻

  • I'm working on the review. Left a few comments already.
  • Could you also run those two commands below in the root of your fork directory? This will ensure your new code is formatted constantly with the rest of supervision.
black supervision
isort --profile black supervision/
  • Would you be okey with writing some unit tests?

NOTE: I still need to test loading and saving on the example dataset.

@hardikdava
Copy link
Contributor Author

@SkalskiP I have committed the changes you requested. Also ran two commands with black and isort. Please review it.

@hardikdava
Copy link
Contributor Author

Hi @hardikdava 👋🏻 ! This is so awesome to see you contributing this functionality to supervision. The biggest success of an opensource maintainer is to see people contributing to the project they build. Thanks a lot for that. 🙏🏻

* I'm working on the review. Left a few comments already.

* Could you also run those two commands below in the root of your fork directory? This will ensure your new code is formatted constantly with the rest of `supervision`.
black supervision
isort --profile black supervision/
* Would you be okey with writing some unit tests?

NOTE: I still need to test loading and saving on the example dataset.

@SkalskiP I would like to work on new features. I know unittests are important but I am not fan of implementing them ;)

@SkalskiP
Copy link
Collaborator

@hardikdava I did some tests and noticed two critical bugs. I marked them in the code. You can also use this Google Colab as reference: https://colab.research.google.com/drive/1spwwYyYO-LN3RDzp5s8whG1_7JzROm-m?usp=sharing

@SkalskiP
Copy link
Collaborator

@hardikdava

I would like to work on new features. I know unittests are important but I am not fan of implementing them ;)

No worries I can take care of that. For now, let's make sure simple tests in Google Colab work. And when we will be done with that. I'll make unittests ;)

@hardikdava
Copy link
Contributor Author

@SkalskiP please mark the critical part in the code and i will take care of them tomorrow. It's already late.

@SkalskiP SkalskiP added enhancement New feature or request version: 0.11.0 Feature to be added in `0.11.0` release labels Jun 27, 2023
@hardikdava
Copy link
Contributor Author

Hi @SkalskiP 👋🏻, thanks for merging my contribution. I am glad to see my contribution in community. Always happy to help! how should we discuss about new feature in github or else?

@SkalskiP
Copy link
Collaborator

@hardikdava 🔥

@yclicc
Copy link

yclicc commented Jun 28, 2023

I've found a problem with this code, in that the image_annotation["category_id"] might be indexed from 1 instead of from zero, or may even skip numbers (not sure if that is valid COCO, but it should be coped with properly). However, coco_categories_to_classes returns the class names as an (obviously zero indexed) list with no skips, but then coco_annotations_to_detections sets the class_ids to simply the image_annotation["category_id"] even though this may not be zero indexed.

@SkalskiP
Copy link
Collaborator

Hi @yclicc, I noticed that problem while reviewing one of the current PRs. The fix should be in main now. Could you try using it once again?

@yclicc
Copy link

yclicc commented Jun 28, 2023

Yep, that's fixed it. Thanks!

@SkalskiP
Copy link
Collaborator

@yclicc awesome! We will probably release those changes today or tomorrow.

@yclicc
Copy link

yclicc commented Jun 28, 2023

Ah, no I'm afraid this is still present. Suggest modifying coco_categories_to_classes to make it also return a lookup table from category["id"] to index in the new category list.

@SkalskiP
Copy link
Collaborator

I just tested it. It works for me. Making it a lookup table would require changing every other format we support. Can you give me any example of a dataset it doesn't work for?

@yclicc
Copy link

yclicc commented Jun 28, 2023

Any COCO dataset where the numbering of the ids of the categories doesn't start at 0 and/or skips a number.

E.g. "categories": [{"id":1, "name":"Albatross", "supercategory": ""}, {"id": 3,"name":"Walrus", "supercategory":""}]
That will then return a dataset.yaml of the form

names:
- Albatross
- Walrus
nc: 2

when outputting to YOLO, but the indices of each row of the YOLO dataset will be 1 and 3 instead of 0 and 1. By a lookup table, you don't need to modify anything except the COCO code. Have coco_categories_to_classes return a lookup table, then have coco_annotations_to_detections accept that lookup table and then when the class_ids variable is set have it lookup in that lookup table for each image_annotation["category_id"]. The only downside is that converting from COCO to itself will lose information by changing the IDs in the outputted COCO. But I'd say that's a price worth paying to have the result work properly with YOLO.

@SkalskiP
Copy link
Collaborator

So the ids of your input dataset are not ordered from 0 to n-1 where n is the number of categories? I see how it breaks.

In that case, we have two options:

  • Remapping ids during COCO loading (I would prepare that)
  • Storing classes as dictionary not a list

@yclicc
Copy link

yclicc commented Jun 28, 2023

Mine are actually consecutive from 1 onwards, which is the format CVAT output for me, but the possibility of a number being missed in general (rather than just 0) ought to be considered I think.

@SkalskiP
Copy link
Collaborator

@yclicc Yeah this is something that we need to take into consideration :/

@SkalskiP
Copy link
Collaborator

@hardikdava could you take a look at the problem @yclicc described?

@hardikdava
Copy link
Contributor Author

@SkalskiP sure. I will take a look at it and try to fix it.

@SkalskiP
Copy link
Collaborator

@hardikdava awesome!

@hardikdava
Copy link
Contributor Author

hardikdava commented Jun 29, 2023

@SkalskiP and @yclicc some of the things I found about coco dataset.

Issue: Category map ids assignment

  • COCO dataset category map ids starting from 1 not 0. Source: src1 src2 src3
  • But coco dataset downloaded from Robfolow has category map ids starting from 0.
  • Even though starting ids from 0 does not create an issue. Tested with pycocotools(python API to work with coco dataset).
  • We can make it something like ids will start from 1 for only coco dataset i.e. +1 offset to class id.

Issue: Segmentation polygon type

“annotations”: [{
    ”segmentation”:
    {	
        “counts”: [34, 55, 10, 71]
        “size”: [240, 480]
    },
    “area”: 600.4,
    “iscrowd”: 1,
    “Image_id:” 122214,
    “bbox”: [473.05, 395.45, 38.65, 28.92],
    “category_id”: 15,
    “id”: 934
}
  • I personally download instance segmentation label from CVAT and it does not have above mentioned format.
  • I would suggest let the segmentation as polygons and later if we receive some more issue from other users then we will plan to improve it.

Let me know your views soon.

@yclicc
Copy link

yclicc commented Jun 29, 2023

Category map ids assignment
The thing is, once it is loaded as a detection we want it to be agnostic to the fact that came from COCO format. So you would want to make it zero indexed so that when it is then written to YOLO (the most common usecase for the Supervision library I would think?) it works properly and plays nicely with the YOLO models. If indeed it is the convention to start at 1 for COCO then the exporter that goes back to COCO could of course do that.

Segmentation polygon type
It remains the case that the RLE format is part of the COCO spec, though, so you may assume that people will try to load such data in. As it is now, it throws an unhandled exception when this happens. Might it be worth at least doing a try catch and raising a NotImplementedError if segmentation is a dict instead of a numpy array?

@hardikdava
Copy link
Contributor Author

@yclicc my thoughts,

Segmentation polygon type
I can add something like if the segmentation annotation datatype is dict then it will raise NotImplementedError. This can easily be done.

Category map ids assignment

  • supervision is not bounded by yolo. So it has to be work with yolo or any other format.
  • The idea of offset will be included while writing the coco dataset. But currently, supervision does not have any information whether the original dataset is loaded from coco or yolo so think of following situations:
  • situation-1: dataset is loaded from coco and saving in coco format then offset of 1in class_id is not needed. Because class_id is already starting from 1
  • situation-2: dataset is loaded from other than coco format and saving in coco format then offset of 1 in class_id is needed as coco class_ids are starting from 1.

@SkalskiP
Copy link
Collaborator

Hi @hardikdava and @yclicc 👋🏻 How about we just remap class ids when we load COCO annotations?

@hardikdava
Copy link
Contributor Author

@yclicc @SkalskiP Found the solution. We should offset the coco-category_id by -1 when we load the coco dataset. So every class ids are starting from 0 regardless of dataset. We can offset class id by +1 while we are storing dataset only for coco format. This will fix the issue of storing the dataset. I do not see any issue for this.

@hardikdava hardikdava mentioned this pull request Jun 30, 2023
@hardikdava
Copy link
Contributor Author

@yclicc can you try with #169 ? Let me know how it goes.

@SkalskiP
Copy link
Collaborator

SkalskiP commented Jul 4, 2023

@yclicc I created a fix for this issue #176. Could you take a look and try to see if it works?

@yclicc
Copy link

yclicc commented Jul 5, 2023

Seems good to me, though it still fails as my dataset contains RLEs rather than polygons.

@yclicc
Copy link

yclicc commented Jul 5, 2023

Yep, all seems to work fine when I add my RLE code back in (which sadly I can't share). Thanks!

@SkalskiP
Copy link
Collaborator

SkalskiP commented Jul 5, 2023

Yep, all seems to work fine when I add my RLE code back in (which sadly I can't share). Thanks!

@yclicc We will add RLE support in feature releases 💯 Thanks for testing.

@hardikdava looks like we will move forward with #176.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request version: 0.11.0 Feature to be added in `0.11.0` release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants