Add capability to convert keypoints from COCO to YOLOv5 format #117

chrisrapson · 2023-07-06T21:19:43Z

In case you're interested in converting keypoints as well as bounding boxes.

Following the format from here:
WongKinYiu/yolov7#1267

I did some manual tests on images with:

no annotations (outputs a text file with only spaces)
bounding box annotations but no keypoint annotations (outputs a text file with 5 fields per line, as usual)
combinations of visible and not-visible keypoints (outputs the usual 5 fields per line, followed by Nx3 values for keypoints)
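For readers unfamiliar with the layout, here is a minimal sketch of how one such label line could be assembled from a COCO annotation. It is not the code in this PR: the helper name `coco_annotation_to_yolo_line` is hypothetical, and normalising keypoint coordinates by image width and height is an assumption based on the linked yolov7 issue.

```python
def coco_annotation_to_yolo_line(cat_id, bbox, keypoints, img_w, img_h):
    """Hypothetical helper (not part of pylabel): build one YOLO label line.

    bbox is COCO style [x_min, y_min, width, height] in pixels.
    keypoints is the flat COCO list [x1, y1, v1, x2, y2, v2, ...].
    """
    x_min, y_min, w, h = bbox
    # The usual 5 fields: class id, then normalised box centre and size.
    fields = [
        str(cat_id),
        f"{(x_min + w / 2) / img_w:.6f}",
        f"{(y_min + h / 2) / img_h:.6f}",
        f"{w / img_w:.6f}",
        f"{h / img_h:.6f}",
    ]
    # Followed by N x 3 values: x, y (assumed normalised) and the visibility flag.
    for k in range(0, len(keypoints), 3):
        x, y, v = keypoints[k : k + 3]
        fields += [f"{x / img_w:.6f}", f"{y / img_h:.6f}", str(int(v))]
    return " ".join(fields)


# Two keypoints: one visible (v=2), one not labelled (v=0).
line = coco_annotation_to_yolo_line(
    0, [10, 20, 100, 200], [60, 120, 2, 0, 0, 0], img_w=200, img_h=400
)
print(line)
```

With two keypoints the line carries 5 + 2 x 3 = 11 fields; with no keypoints it falls back to the usual 5.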

alexheat · 2023-07-07T16:02:50Z

Thank you @chrisrapson I will check it out.

alexheat · 2023-07-07T16:09:39Z

@chrisrapson, how is this different from the segmentation support described in #65?

If this is different from the segmentation support, could you provide a sample dataset so I can test it out?

chrisrapson · 2023-07-09T22:52:36Z

It's definitely similar, but not quite the same. The COCO dataset is the canonical example. They explain the format for keypoints here:
https://cocodataset.org/#format-data

Here's another dataset that has keypoint labels in the VOC format
https://sites.google.com/view/animal-pose/

It looks like it would be challenging to include both segmentation and keypoint data in a YOLO-formatted file, because in a YOLO-formatted file data is interpreted based on its position within a list. Segmentations are an arbitrarily long list of pairs of floats. Keypoints are a list of triplets of floats. The number of keypoints should be the same for all images in a dataset, but won't be the same across datasets. It wouldn't be possible to know when the list of segmentations ended, and the list of keypoints began (or vice versa).

I think it would be simplest to let users convert only one of keypoints or segmentations to YOLO. I can't think of a use case where somebody would train a network that needs both segmentation and keypoint data. That is, add a keypoints flag with functionality equivalent to your segmentations option, then enforce that at most one of segmentations or keypoints can be True.
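As a rough illustration of both points (the ambiguity and the proposed restriction), here is a small sketch. The function names `possible_readings` and `check_export_flags`, and the flag names, are hypothetical and are not pylabel's actual API.

```python
def possible_readings(tail):
    """Return the ways a list of floats after the 5 box fields could be read."""
    readings = []
    if len(tail) % 2 == 0:
        readings.append("segmentation")  # a list of (x, y) pairs
    if len(tail) % 3 == 0:
        readings.append("keypoints")  # a list of (x, y, visibility) triplets
    return readings


def check_export_flags(segmentation=False, keypoints=False):
    """Hypothetical guard: allow at most one of the two options."""
    if segmentation and keypoints:
        raise ValueError("Set at most one of segmentation or keypoints to True.")


# Six values could be three polygon points or two keypoints.
tail = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(possible_readings(tail))

check_export_flags(keypoints=True)  # fine
try:
    check_export_flags(segmentation=True, keypoints=True)
except ValueError as err:
    print(err)
```

Because position is the only cue in a YOLO label line, rejecting the combination up front avoids writing files that a reader cannot parse unambiguously.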

alexheat · 2023-07-10T03:16:47Z

Thank you. I had never used keypoints before, but I am getting it. Could you help me with a few more things:

Can you recommend a COCO dataset that I can use to test it? Should I just get one of the segmentation ones here? https://cocodataset.org/#download
Could you add a doc string to the export function that explains this new functionality?

chrisrapson · 2023-07-10T21:01:58Z

The keypoints task and the segmentation challenges use the same images. The annotations are saved in person_keypoints_train2017.json and person_keypoints_val2017.json instead of instances_train2017.json.

One possible place to download them is from huggingface: https://huggingface.co/datasets/merve/coco/tree/main/annotations

Good idea about the doc string. I'll add that and a boolean keypoints flag, and then update the PR.

chrisrapson · 2023-07-13T01:55:18Z

See the two extra commits. The first adds a docstring and a boolean flag. For the second I added the capability to export keypoints to COCO.

alexheat · 2023-07-13T11:43:17Z

=================================== FAILURES ===================================
_______________________________ test_export_coco _______________________________

coco_dataset = <pylabel.dataset.Dataset object at 0x7f5c2191fc10>

def test_export_coco(coco_dataset):

  path_to_coco_export = coco_dataset.export.ExportToCoco()

tests/test_main.py:174:

self = <pylabel.exporter.Export object at 0x7f5c2191f310>, output_path = None
cat_id_index = None

def ExportToCoco(self, output_path=None, cat_id_index=None):
    """
    Writes COCO annotation files to disk (in JSON format) and returns the path to files.

    Args:
        output_path (str):
            This is where the annotation files will be written. If not-specified then the path will be derived from the path_to_annotations and
            name properties of the dataset object.
        cat_id_index (int):
            Reindex the cat_id values so that they start from an int (usually 0 or 1) and
            then increment the cat_ids to index + number of categories continuously.
            It's useful if the cat_ids are not continuous in the original dataset.
            Some models like Yolo require starting from 0 and others like Detectron require starting from 1.

    Returns:
        A list with 1 or more paths (strings) to annotations files.

    Example:
        >>> dataset.exporter.ExportToCoco()
        ['data/labels/dataset.json']

    """
    # Copy the dataframe in the dataset so the original dataset doesn't change when you apply the export tranformations
    df = self.dataset.df.copy(deep=True)
    # Replace empty string values with NaN
    df = df.replace(r"^\s*$", np.nan, regex=True)
    pd.to_numeric(df["cat_id"])

    df["ann_iscrowd"] = df["ann_iscrowd"].fillna(0)

    if cat_id_index != None:
        assert isinstance(cat_id_index, int), "cat_id_index must be an int."
        _ReindexCatIds(df, cat_id_index)

    df_outputI = []
    df_outputA = []
    df_outputC = []
    list_i = []
    list_c = []
    json_list = []

    pbar = tqdm(desc="Exporting to COCO file...", total=df.shape[0])
    for i in range(0, df.shape[0]):
        images = [
            {
                "id": df["img_id"][i],
                "folder": df["img_folder"][i],
                "file_name": df["img_filename"][i],
                "path": df["img_path"][i],
                "width": df["img_width"][i],
                "height": df["img_height"][i],
                "depth": df["img_depth"][i],
            }
        ]

        # Skip this if cat_id is na
        if not pd.isna(df["cat_id"][i]):
            annotations = [
                {
                    "image_id": df["img_id"][i],
                    "id": df.index[i],
                    "segmented": df["ann_segmented"][i],
                    "bbox": [
                        df["ann_bbox_xmin"][i],
                        df["ann_bbox_ymin"][i],
                        df["ann_bbox_width"][i],
                        df["ann_bbox_height"][i],
                    ],
                    "area": df["ann_area"][i],
                    "segmentation": df["ann_segmentation"][i],
                    "iscrowd": df["ann_iscrowd"][i],
                    "pose": df["ann_pose"][i],
                    "truncated": df["ann_truncated"][i],
                    "category_id": int(df["cat_id"][i]),
                    "difficult": df["ann_difficult"][i],
                }
            ]

            # include keypoints, if available
            if "ann_keypoints" in df.keys():

              n_keypoints = int(len(df["ann_keypoints"][i]) / 3)  # 3 numbers per keypoint: x,y,visibility

E TypeError: object of type 'numpy.float64' has no len()

pylabel/exporter.py:821: TypeError
----------------------------- Captured stderr call -----------------------------

Exporting to COCO file...: 0%| | 0/4888 [00:00<?, ?it/s]
=============================== warnings summary ===============================
../../../../../opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/jupyter_bbox_widget/bbox.py:48
/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/jupyter_bbox_widget/bbox.py:48: DeprecationWarning: Traits should be given as instances, not types (for example, Int(), not Int). Passing types is deprecated in traitlets 4.1.
classes = List(Unicode).tag(sync=True)

../../../../../opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/jupyter_bbox_widget/bbox.py:50
/opt/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/jupyter_bbox_widget/bbox.py:50: DeprecationWarning: Traits should be given as instances, not types (for example, Int(), not Int). Passing types is deprecated in traitlets 4.1.
colors = List(Unicode, [

alexheat · 2023-07-13T11:43:48Z

I think this is the relevant error message

DeprecationWarning: Traits should be given as instances, not types (for example, Int(), not Int). Passing types is deprecated in traitlets 4.1. colors = List(Unicode, [

alexheat · 2023-07-13T11:45:04Z

You can find the instructions to run the tests manually here
https://github.com/pylabel-project/pylabel/tree/dev/tests

alexheat · 2023-07-16T11:08:33Z

@chrisrapson I have cherry-picked your commits for the YOLO output and released them in the latest package, v52. Thank you!

For the COCO export, the issue is in this part of the code:

~/Code/scratch/pylabel/pylabel/exporter.py in ExportToCoco(self, output_path, cat_id_index)
    827                 if "ann_keypoints" in df.keys():
    828                     n_keypoints = int(
--> 829                         len(df["ann_keypoints"][i]) / 3
    830                     )  # 3 numbers per keypoint: x,y,visibility
    831                     annotations[0]["num_keypoints"] = n_keypoints
TypeError: object of type 'float' has no len()

Is the issue the [i] in len(df["ann_keypoints"][i])?

chrisrapson · 2023-07-17T06:24:35Z

I think I've fixed it now. I had only tested it on my dataset, which has keypoint labels for all images. I wrongly assumed that a dataset (or image) with no keypoint labels wouldn't have "ann_keypoints" in its list of keys, but of course it has that column filled with "", which is converted to np.nan.

Once I found the test and ran it, it wasn't too hard to implement the if statement properly. I've updated the PR.

There's still no automatic test that really verifies the new feature, but that would require adding a new dataset which had keypoint labels.
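The kind of guard described above might look roughly like the following. This is a sketch of the idea, not the code that was merged; the helper name `keypoint_fields` is hypothetical. It relies on the fact that `numpy.float64` is a subclass of Python's `float`, so the same check covers both of the TypeErrors shown earlier.

```python
import math


def keypoint_fields(ann_keypoints):
    """Hypothetical helper: extra COCO annotation fields for one row.

    Returns an empty dict when the row has no keypoint labels, i.e. when the
    cell is None, NaN, or an empty string.
    """
    if ann_keypoints is None:
        return {}
    if isinstance(ann_keypoints, float) and math.isnan(ann_keypoints):
        return {}
    if isinstance(ann_keypoints, str) and not ann_keypoints.strip():
        return {}
    keypoints = list(ann_keypoints)
    # 3 numbers per keypoint: x, y, visibility
    return {"keypoints": keypoints, "num_keypoints": len(keypoints) // 3}


annotation = {"image_id": 1, "category_id": 0}
annotation.update(keypoint_fields(float("nan")))  # no keypoint labels: unchanged
annotation.update(keypoint_fields([60, 120, 2, 0, 0, 0]))  # two keypoints
print(annotation)
```

Checking only whether the "ann_keypoints" column exists is not enough, because the column is present (and NaN) for rows without keypoint labels.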

alexheat · 2023-07-18T00:22:38Z

Thank you @chrisrapson. I merged it and published it in the latest release, v53.

It would be awesome to have a sample notebook that demos the functionality for others, to add to the library at https://github.com/pylabel-project/samples.

Is that something you would be able to do (someday)?

Add capability to convert keypoints from COCO to YOLOv5 format

2339d22

Chris Rapson and others added 2 commits July 12, 2023 14:51

Guard keypoints export to YOLO with a flag, and add documentation

4a12a6c

Export keypoints to COCO format, if available

22c7d14

Bump v# to 52

86af20f

Make COCO keypoints exporter work with NaNs (i.e. datasets and images without keypoint labels)

badfabf

alexheat merged commit 893253a into pylabel-project:dev Jul 17, 2023
1 check passed