Questions about the polygon annotation in the dataset #1823

Closed
SylvainArd opened this issue Nov 20, 2022 · 36 comments

@SylvainArd

Hello,
I would like to know whether the points in the COCO-formatted JSON training file must lie inside their polygons, or whether it does not matter. I would also like to know whether the detection of a point takes into account the neighborhood of the point on the one hand, and the neighborhood of its polygon on the other. Finally, I would like to know how to customize my point and polygon classes in your software.
Thanks
Cordially

@ly015
Member

ly015 commented Nov 21, 2022

Hi, thanks for your interest in MMPose.

  1. It should not matter whether the keypoints are inside the polygons. The polygon annotation is not used by most algorithms in MMPose, except for Associative Embedding, where the polygon is used to generate masks of invalid instances (instances with no labeled keypoints). A minimal annotation sketch follows after this list.
  2. The keypoint detection model usually takes the bounding-box area (top-down) or the whole image (bottom-up) as the input. As mentioned above, the polygon information is not used in inference.
  3. Could you please further clarify your needs? Do you need to use MMPose with customized data formats other than COCO?
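For reference, a minimal COCO-style keypoint annotation looks roughly like this (a sketch with made-up values; the polygon lives under 'segmentation'):

```python
# Sketch of one COCO-style annotation entry (values are illustrative).
# Keypoints are flat (x, y, v) triplets: v=0 not labeled, v=1 labeled
# but not visible, v=2 labeled and visible.
annotation = {
    'id': 1,
    'image_id': 1,
    'category_id': 1,
    'keypoints': [100.0, 80.0, 2, 120.0, 90.0, 2, 0.0, 0.0, 0],
    'num_keypoints': 2,                          # number of labeled keypoints
    'bbox': [90.0, 70.0, 60.0, 40.0],            # [x, y, width, height]
    'segmentation': [[90.0, 70.0, 150.0, 70.0, 150.0, 110.0, 90.0, 110.0]],
    'iscrowd': 0,
    'area': 2400.0,
}
```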

@SylvainArd
Author

SylvainArd commented Nov 21, 2022 via email

@ly015
Member

ly015 commented Nov 21, 2022

In this case, I think you can directly use TopDownCocoDataset (for mmpose 0.x) or CocoDataset (for mmpose 1.0). You will probably need a metainfo config to describe the keypoint definition of your data (e.g. the metainfo of coco).
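For illustration, a metainfo file for a hypothetical 3-keypoint dataset might look like this (a sketch modeled on the structure of configs/_base_/datasets/coco.py; all names and values here are made up):

```python
# configs/_base_/datasets/my_dataset.py -- hypothetical metainfo file,
# following the structure of MMPose's coco.py metainfo.
dataset_info = dict(
    dataset_name='my_dataset',
    paper_info=dict(),
    keypoint_info={
        0: dict(name='head', id=0, color=[51, 153, 255], type='upper', swap=''),
        1: dict(name='left_tip', id=1, color=[0, 255, 0], type='upper', swap='right_tip'),
        2: dict(name='right_tip', id=2, color=[255, 128, 0], type='upper', swap='left_tip'),
    },
    skeleton_info={
        0: dict(link=('head', 'left_tip'), id=0, color=[0, 255, 0]),
        1: dict(link=('head', 'right_tip'), id=1, color=[255, 128, 0]),
    },
    # per-keypoint loss weights (see the joint_weights discussion below)
    joint_weights=[1.0, 1.0, 1.0],
    # per-keypoint sigmas used by COCO AP/AR; placeholder values
    sigmas=[0.025, 0.025, 0.025])
```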

@SylvainArd
Author

SylvainArd commented Nov 21, 2022 via email

@SylvainArd
Author

SylvainArd commented Nov 21, 2022 via email

@ly015
Member

ly015 commented Nov 21, 2022

You needn't change the dataset class (if your annotation file is in standard COCO format), only the metainfo file. And you will need to modify the config file to correctly set the path of the metainfo file and the annotation file, the model output channel number (usually equal to the number of keypoints), and (only in mmpose 0.x) other data-related parameters like here.
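As a rough sketch of the 0.x-style changes (all paths and numbers are placeholders for your own data, not MMPose defaults):

```python
# Hypothetical fragment of an mmpose 0.x config adapted to a custom
# 3-keypoint COCO-style dataset.
channel_cfg = dict(
    num_output_channels=3,        # one heatmap channel per keypoint
    dataset_joints=3,
    dataset_channel=[[0, 1, 2]],
    inference_channel=[0, 1, 2])

data_root = 'data/my_dataset'
data = dict(
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/train.json',
        img_prefix=f'{data_root}/images/'))
```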

@SylvainArd
Author

SylvainArd commented Nov 21, 2022 via email

@SylvainArd
Author

SylvainArd commented Nov 21, 2022 via email

@ly015
Member

ly015 commented Nov 21, 2022

Actually, we have documentation for these questions (assuming you are using mmpose 0.29; documents for other versions can also be found on readthedocs):

As for the input size, 224x224 (for general objects) or 192x256 (for persons) are typical settings for top-down models, where the input is the bounding-box area of a single object. If you would like to use a bottom-up method, where you can input the entire image, please refer to the configs of Associative Embedding, CID, or DEKR (which can be found here).
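In an mmpose 0.x top-down config, these sizes live in data_cfg; a representative fragment (a sketch following the common COCO person setting):

```python
# Top-down input and heatmap sizes in an mmpose 0.x config
# ([width, height]; the heatmap is 1/4 of the input).
data_cfg = dict(
    image_size=[192, 256],
    heatmap_size=[48, 64])
```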

@SylvainArd
Author

SylvainArd commented Nov 21, 2022 via email

@ly015
Member

ly015 commented Nov 21, 2022

In a bottom-up config, the image_size is the length of the shorter edge of the resized image to feed into the model. We recommend setting image_size as a multiple of 32, which is the common down-sampling factor of backbones.

base_size shouldn't matter if you do not turn on scale_aware_sigma (see here for details)

And you may need to adjust heatmap_size here according to the image_size, num_scales, and the model. For example, in this config, HigherHRNet (backbone/head) is used to output 2-level heatmaps, which have 1/4 and 1/2 the size of the input, respectively. So heatmap_size is set to [128, 256] (where image_size is 512).
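Put together, the relevant fields of such a bottom-up config would look roughly like this (a sketch following the HigherHRNet 512x512 setting described above; field values are illustrative):

```python
# Bottom-up size settings, sketched after the HigherHRNet COCO 512x512
# config: two heatmap levels at 1/4 and 1/2 of the input size.
data_cfg = dict(
    image_size=512,            # shorter edge of the resized input; multiple of 32
    base_size=256,             # only relevant if scale_aware_sigma is enabled
    heatmap_size=[128, 256],   # 512/4 and 512/2
    num_scales=2,
    scale_aware_sigma=False)
```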

Please also note that the polygon information is used in bottom-up methods to generate ignore masks, as mentioned above. So it's better to reformat it to COCO style in the annotation files.

@SylvainArd
Author

SylvainArd commented Nov 21, 2022 via email

@ly015
Member

ly015 commented Nov 21, 2022

> with your program, are the points associated with their bbox or object, or not?

Yes.

> how to specify doing prediction with the CPU

Please check the argument --device=cpu of the demo scripts, as described in the documentation: https://mmpose.readthedocs.io/en/latest/demo.html#d-human-pose-demo

> what version of CUDA is needed

It depends on the PyTorch version. MMPose 0.29.0 works with PyTorch>=1.5.0

> the coordinates in COCO files are for the original size or the "image_size"?

The original size. The image will be resized to image_size and the keypoint coordinates will be transformed accordingly.

As for the polygon: in the bottom-up dataset, it is used to generate a mask of the invalid area (see https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/datasets/base/kpt_2d_sview_rgb_img_bottom_up_dataset.py#L136-L157), which is ignored in loss computation. For example, COCO has "crowd people" instances where the polygon is available but the keypoints are not. In this case, we usually do not compute the loss from this area, by using a mask generated from the polygon. If there is no such invalid object instance in your data, it should be fine to leave the 'polygon' annotation as it is, because it will not be used. Otherwise, it's better to convert the 'polygon' annotation into COCO format (the key should be "segmentation" and the content should be RLE or polygon, as in https://cocodataset.org/#format-data).
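To make this concrete, a COCO-style annotation for such an invalid instance could look like the sketch below (all values are illustrative; note that official COCO stores crowd segmentations as RLE, though polygons are also accepted):

```python
# Illustrative COCO-style annotation for an instance that has a polygon
# but no keypoint labels; the mask decoded from 'segmentation' is
# excluded from the loss.
crowd_ann = {
    'id': 1,
    'image_id': 42,
    'category_id': 1,
    'segmentation': [[10.0, 10.0, 120.0, 10.0, 120.0, 80.0, 10.0, 80.0]],
    'iscrowd': 1,            # marks the region as a crowd / ignore area
    'num_keypoints': 0,      # no labeled keypoints for this instance
    'keypoints': [0] * 51,   # 17 keypoints x (x, y, v), all unlabeled
    'bbox': [10.0, 10.0, 110.0, 70.0],
    'area': 7700.0,
}
```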

@SylvainArd
Author

SylvainArd commented Nov 21, 2022 via email

@SylvainArd
Author

SylvainArd commented Nov 23, 2022 via email

@SylvainArd
Author

SylvainArd commented Nov 23, 2022 via email

@SylvainArd
Author

SylvainArd commented Nov 23, 2022

The bug is here:

```python
def _get_mask(self, anno, idx):
    """Get ignore masks to mask out losses."""
    coco = self.coco
    img_info = coco.loadImgs(self.img_ids[idx])[0]

    m = np.zeros((img_info['height'], img_info['width']), dtype=np.float32)

    for obj in anno:
        print(obj)  # debug print I added
        if 'segmentation' in obj:
            if obj['iscrowd']:
                # crowd region: decode the RLE and add it to the mask
                rle = xtcocotools.mask.frPyObjects(obj['segmentation'],
                                                   img_info['height'],
                                                   img_info['width'])
                m += xtcocotools.mask.decode(rle)
            elif obj['num_keypoints'] == 0:
                # instance with segmentation but no labeled keypoints
                rles = xtcocotools.mask.frPyObjects(
                    obj['segmentation'], img_info['height'],
                    img_info['width'])
                for rle in rles:
                    m += xtcocotools.mask.decode(rle)

    return m < 0.5
```

in the file mmpose/datasets/datasets/base/kpt_2d_sview_rgb_img_bottom_up_dataset.py.

My images have several objects, all with iscrowd=0 and no num_keypoints field. Do I have to set iscrowd=1? What exactly are iscrowd and num_keypoints, please?

@SylvainArd
Author

I verified that I must set iscrowd to 0. I think the line `if obj['iscrowd']:` means "if obj['iscrowd'] is defined", but it is evaluated as "if obj['iscrowd'] == 1". What do you think?

@ly015
Member

ly015 commented Nov 24, 2022

The definition of COCO format, including the description of 'iscrowd', can be found at: https://cocodataset.org/#format-data

Crowd annotations (iscrowd=1) are used to label large groups of objects (e.g. a crowd of people).

To prepare data for MMPose, if there are objects that have segmentation (or polygon) annotations but no keypoint annotations, it's better to set iscrowd=1 for these objects; they will then be ignored in loss computation to avoid misleading supervision signals.
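A minimal sketch of such a conversion, assuming the annotation file has already been loaded into a dict named coco_json (the variable name is hypothetical):

```python
# Mark segmentation-only instances as crowd regions so that MMPose
# masks them out of the loss. `coco_json` is the parsed COCO JSON file.
for ann in coco_json['annotations']:
    if ann.get('segmentation') and ann.get('num_keypoints', 0) == 0:
        ann['iscrowd'] = 1
```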

@SylvainArd
Author

SylvainArd commented Nov 24, 2022 via email

@ly015
Member

ly015 commented Nov 25, 2022

  1. joint_weights is used to control the loss weights of keypoints. You can set them according to your task.
  2. sigmas is obtained from the annotation and used for computing COCO AP/AR. More information can be found at COCO official website: https://cocodataset.org/#keypoints-eval. If sigmas are not available in your case, you can set them as arbitrary values and use metrics other than COCO AP/AR, e.g. NME
  3. Model selection should depend on your requirements and constraints (model size, speed, accuracy, ...). We provide model performance benchmarks in our documentation for your reference: https://mmpose.readthedocs.io/en/dev-1.x/model_zoo_papers/algorithms.html
  4. In the dataset, an instance should be either valid (has at least one labeled keypoint) or invalid (no keypoint label, and marked as iscrowd==1)

@SylvainArd
Author

SylvainArd commented Nov 25, 2022 via email

@ly015
Member

ly015 commented Nov 25, 2022

If iscrowd==0 and num_keypoints==0 the mask will be generated by: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/datasets/base/kpt_2d_sview_rgb_img_bottom_up_dataset.py#L150-L155

If it crashes at this part, it's likely that your data format differs from the standard COCO format. We suggest organizing your data in COCO format to use MMPose's COCO dataset; otherwise you may have to implement a custom dataset class.

@SylvainArd
Author

SylvainArd commented Nov 25, 2022 via email

@SylvainArd
Author

SylvainArd commented Nov 25, 2022 via email

@ly015
Member

ly015 commented Nov 25, 2022

In this case, no mask should be generated. You may need to either add num_keypoints to your annotation file (the number of visible keypoints of each object), or modify this line to calculate the visible keypoint number from obj['keypoints'] manually.
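A sketch of the second option (a hypothetical helper; v is the COCO visibility flag, where v > 0 means the keypoint is labeled):

```python
def count_labeled_keypoints(obj):
    """Count keypoints with visibility flag v > 0 in a COCO annotation.

    COCO stores keypoints as flat (x, y, v) triplets, so every third
    element starting at index 2 is a visibility flag.
    """
    return sum(1 for v in obj['keypoints'][2::3] if v > 0)
```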

@SylvainArd
Author

SylvainArd commented Nov 25, 2022 via email

@ly015
Member

ly015 commented Nov 25, 2022

As mentioned above, the mask is used in loss computation to ignore invalid regions in the image, where there are objects but no ground-truth keypoint labels. (Because of the absence of ground-truth keypoint labels, the loss will be incorrect and may cause performance degradation.)

If you have an object and also its keypoint label, the loss will be computed normally and no mask is needed.

@SylvainArd
Author

SylvainArd commented Nov 25, 2022 via email

@ly015
Member

ly015 commented Nov 25, 2022

The batch size can be set in the config file: https://mmpose.readthedocs.io/en/latest/tutorials/0_config.html
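For example, in an mmpose 0.x config the batch size is set per GPU (a sketch; the values are placeholders):

```python
# Batch size settings in an mmpose 0.x config; the effective batch size
# is samples_per_gpu multiplied by the number of GPUs.
data = dict(
    samples_per_gpu=64,   # batch size on each GPU
    workers_per_gpu=2)    # dataloader worker processes per GPU
```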

@SylvainArd
Author

SylvainArd commented Nov 25, 2022 via email

@SylvainArd
Author

SylvainArd commented Nov 25, 2022 via email

@ly015
Member

ly015 commented Nov 25, 2022

@SylvainArd
Author

SylvainArd commented Nov 25, 2022 via email

@ly015
Member

ly015 commented Nov 25, 2022

The discussion has gone beyond the original scope of this issue. I would suggest opening separate issues, each for a specific topic, so other users can search for useful information when they encounter similar problems. This issue will be closed for now.

@ly015 closed this as completed Nov 25, 2022
@ly015 changed the title from "three questions" to "Questions about the polygon annotation in the dataset" Nov 25, 2022
@SylvainArd
Author

SylvainArd commented Nov 25, 2022 via email
