
When analyzing coco2017, this code's AP is a little higher than the original AP? #17

Closed
baolinhu opened this issue Mar 13, 2019 · 12 comments

@baolinhu

When I analyze coco2017 with this code, the AP is a little higher than the original AP. Why is that?

@matteorr
Owner

Hi @baolinhu, could you please be more specific?
It would be useful if you could post the values that you're referring to so I can double check. Also, the datasets are slightly different so I wouldn't be too surprised if that were the case.

@baolinhu
Author

@matteorr Thanks for the reply. I used this code to analyze coco2017 val, with the same prediction results JSON file:

  • Output of this code: [image]
  • Output from other local evaluation code or the online server: [image]

@Yishun99

Same question here: about a 0.7 point difference on mAP.

@matteorr
Owner

matteorr commented Mar 15, 2019

@DouYishun, thanks for adding info. I assume you mean 0.007?

Right now, the only results that seem affected are AP@IoU=0.5:0.95 | area=all (.718 instead of .711), and AP@IoU=0.75 | area=all (.796 instead of .789). Recall is not affected.

What is surprising is that the overall AP on medium and large instances is the same, but with area=all there is a difference. This makes me think the problem might have to do with how small objects are handled (they are not considered in COCO Keypoints, but my eval code might be counting how many small objects there are when computing precision).

I'm currently looking into it and will post updates as soon as possible.

@Yishun99

@matteorr Yes, I mean 0.007.
Here are my results: [image]

It should be: [image]

AP@OKS=0.5:0.95, AP@OKS=0.5 and AP@OKS=0.75 are affected.

@matteorr
Owner

matteorr commented Mar 23, 2019

As I suspected in the previous post, the culprit is the different definition of area ranges during evaluation.

When wrapping the COCOeval.evaluate() function I pass the parameters from the COCOanalyze class:

self.cocoEval.params.areaRng    = self.params.areaRng
self.cocoEval.params.areaRngLbl = self.params.areaRngLbl
self.cocoEval.params.maxDets    = self.params.maxDets
self.cocoEval.params.iouThrs    = sorted(self.params.oksThrs)

These values are initialized in the Params class and my default values for the area ranges are different from the values that are defined in the original cocoeval repo.

Specifically, since COCO keypoints don't have small instances, I believe the all area range should not include annotations with an area of less than 32**2 pixels. That's why I defined the all area range as [32 ** 2, 1e5 ** 2]. Conversely, in the coco repo the all area range for keypoints is defined exactly like the one used for bbox and segm, i.e. [0 ** 2, 1e5 ** 2].
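For reference, here are the two conventions side by side (a rough sketch with both written in ['all', 'medium', 'large'] order; the actual ordering inside each Params class may differ):

# COCOanalyze defaults in this repo: 'all' excludes instances smaller than 32**2
areaRng    = [[32 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
areaRngLbl = ['all', 'medium', 'large']

# pycocotools cocoeval defaults for keypoints: 'all' starts at 0
areaRng    = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
areaRngLbl = ['all', 'medium', 'large']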

Because of this discrepancy, the number of ground truth instances counted is different for the two evaluations, resulting in different AP (higher for cocoanalyze, since it considers fewer instances present), while recall is obviously not affected.

I think my solution makes more sense, and I reached out about it in the past, but they didn't change their code. You can easily choose whichever you prefer, either by changing the default param values in the Params class, or by overwriting them after instantiating a COCOanalyze object and accessing its params, i.e.:

coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')
coco_analyze.params.areaRng = [[0 ** 2, 1e5 ** 2], [96 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2]]

After this change the results will match. I'm closing this issue, please put a thumbs up if you think it's solved, or feel free to reopen if you have further comments.

@baolinhu
Author

baolinhu commented Mar 26, 2019

@matteorr Thanks, I see what you mean, but I still don't fully understand. Since COCO keypoints don't have small instances, the number of ground truth instances should be the same: the number of ground truth instances with an area of less than 32**2 pixels should be 0, so [0 ** 2, 1e5 ** 2] is equivalent to [32 ** 2, 1e5 ** 2]. I think it instead affects the number of FP (false positive) samples.

@matteorr
Owner

@baolinhu - By "the number of ground truth instances counted is different for the two evaluations" I really meant the number of ground truth instances that are matched to a detection. Setting the area range to a different value also determines which detections get ignored.

So in this particular case, detections smaller than 32**2 will be ignored by my evaluation code. To convince yourself, try removing all the small detections before loading them into the COCOanalyze class, i.e.:

# drop all detections with an area smaller than 32**2 before loading them
new_team_split_dts = [d for d in team_split_dts if d['area'] > 32**2]
coco_gt = COCO(annFile)
coco_dt = coco_gt.loadRes(new_team_split_dts)
coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')
coco_analyze.evaluate(verbose=True, makeplots=False, savedir=saveDir, team_name=teamName)

You'll see that in this case the results are exactly the same regardless of whether coco_analyze.params.areaRng[0] is [0 ** 2, 1e5 ** 2] or [32 ** 2, 1e5 ** 2].
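In case it helps, here is a simplified sketch of what pycocotools' COCOeval.evaluateImg does with the area range (a hypothetical paraphrase, not the verbatim source, which is vectorized over the OKS thresholds):

def mark_area_range_ignores(gts, dts, matched_dt_ids, aRng):
    """Mimic cocoeval's ignore flags for a single image and area range."""
    for g in gts:
        # ground truths outside the area range are flagged as ignore
        g['_ignore'] = bool(g.get('ignore', 0)) or not (aRng[0] <= g['area'] <= aRng[1])
    for d in dts:
        # unmatched detections outside the area range are also flagged as ignore,
        # so they are later excluded from both the TP and the FP counts
        outside = not (aRng[0] <= d['area'] <= aRng[1])
        d['_ignore'] = (d['id'] not in matched_dt_ids) and outside
    return gts, dts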

This makes sense to me. But if you still don't agree please post back, maybe I am missing your point.

@baolinhu
Author

baolinhu commented Mar 27, 2019

  • Firstly, you solved my problem. Thanks.
  • Now let me state my point of view. The line new_team_split_dts = [d for d in team_split_dts if d['area']>32**2] decreases the number of false positive samples: a detection that would count as a false positive but has an area of less than 32**2 is simply ignored. That seems a little unreasonable to me (when evaluating the all range you should act as if you don't know the dataset, rather than adding the prior information that instances have an area larger than 32**2).
    Precision = TP / (TP + FP) will therefore be higher, while recall = TP / (TP + FN) is not affected, because TP + FN is the total number of ground truth instances, which does not change (see the tiny made-up example below).
  • So I think the issue is that the number of positive detections counted is different for the two evaluations. Should they really be ignored when evaluating the all range?
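A tiny made-up example of what I mean: suppose there are 100 ground truth instances, the algorithm matches all of them correctly, and it also outputs 10 wrong detections with an area of less than 32**2. If the small detections count as FP, precision = 100 / 110 ≈ 0.91; if they are ignored, precision = 100 / 100 = 1.00. Recall is 100 / 100 = 1.00 in both cases, because TP + FN is just the number of ground truth instances.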

@matteorr
Owner

@baolinhu - Glad the issue is resolved.

To follow up one last time on what might be the "best" evaluation strategy: my interpretation is that, since we know COCO has no ground truth keypoints for instances with an area smaller than 32**2, it is better to ignore detections whose area is too small, as they will most likely not be good because of the lack of training data. I agree this strategy might penalize algorithms that make good keypoint predictions for small instances.

An interesting approach could be to ignore all detections with an area smaller than the minimum area with which an IoU of 0.5 is possible if the detection is perfectly overlapped with a ground truth of size 32**2.
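(Back-of-the-envelope: if a detection of area A is fully contained in a ground truth of area 32**2, the IoU is roughly A / 32**2, so the minimum detection area giving IoU = 0.5 would be around 0.5 * 32**2 = 512 pixels.)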

In conclusion, I think there is no definitive "right" or "wrong" way of doing it. As long as you are aware of the consequences of either approach, and compare all algorithms using the same technique, it shouldn't matter too much.

@baolinhu
Author

Yeah, I agree with your conclusion. Thanks for your patience.

@DanBmh

DanBmh commented Oct 4, 2021

Quoting the fix from above:

coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')
coco_analyze.params.areaRng = [[0 ** 2, 1e5 ** 2], [96 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2]]

This worked for me, but the medium and large results are switched. Compared with the linked code above, it should be:
coco_analyze.params.areaRng = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
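For clarity, the ranges just need to line up with the labels (a minimal sketch, assuming the default labels are ['all', 'medium', 'large']):

coco_analyze.params.areaRngLbl = ['all', 'medium', 'large']
coco_analyze.params.areaRng    = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]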
