
When analyzing coco2017, this code's AP is a little higher than the original AP? #17

Closed
baolinhu opened this issue Mar 13, 2019 · 12 comments

@baolinhu

When I analyze coco2017 with this code, the AP is a little higher than the original AP. Why is that?

@matteorr
Owner

Hi @baolinhu, could you please be more specific?
It would be useful if you could post the values that you're referring to so I can double check. Also, the datasets are slightly different so I wouldn't be too surprised if that were the case.

@baolinhu
Author

@matteorr Thanks for the reply. I used this code to analyze coco2017 val, with the same prediction results JSON file:

  • Output of this code: [image]
  • Output from other local evaluation code or the online server: [image]

@Yishun99

Same question here: about a 0.7 point difference on mAP.

@matteorr
Owner

matteorr commented Mar 15, 2019

@DouYishun, thanks for adding info. I assume you mean 0.007?

Right now, the only results that seem affected are AP@IoU=0.5:0.95 | area=all (.718 instead of .711), and AP@IoU=0.75 | area=all (.796 instead of .789). Recall is not affected.

What is surprising is that the overall AP on medium and large instances is the same, but with area=all there is a difference. This makes me think the problem might have to do with how small objects are handled (they are not considered in COCO Keypoints, but my eval code might be counting how many small objects there are when computing precision).

I'm currently looking into it and will post updates as soon as possible.

@Yishun99

@matteorr Yes, I mean 0.007.
Here are my results: [image]

It should be: [image]

AP@OKS=0.5:0.95, AP@OKS=0.5 and AP@OKS=0.75 are affected.

@matteorr
Owner

matteorr commented Mar 23, 2019

As I suspected in the previous post, the culprit is the different definition of area ranges during evaluation.

When wrapping the COCOeval.evaluate() function I pass the parameters from the COCOanalyze class:

self.cocoEval.params.areaRng    = self.params.areaRng
self.cocoEval.params.areaRngLbl = self.params.areaRngLbl
self.cocoEval.params.maxDets    = self.params.maxDets
self.cocoEval.params.iouThrs    = sorted(self.params.oksThrs)

These values are initialized in the Params class and my default values for the area ranges are different from the values that are defined in the original cocoeval repo.

Specifically, since COCO keypoints don't have small instances, I believe the all area range should not include annotations with an area of less than 32**2 pixels. That's why I defined the all area range as [32 ** 2, 1e5 ** 2]. Conversely, in the coco repo the all area range for keypoints is defined exactly like the one used for bbox and segm, i.e. [0 ** 2, 1e5 ** 2].
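For reference, here are the two conventions side by side (a rough sketch with both written in ['all', 'medium', 'large'] order; the actual ordering inside each Params class may differ):

# COCOanalyze defaults in this repo: 'all' excludes instances smaller than 32**2
areaRng    = [[32 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
areaRngLbl = ['all', 'medium', 'large']

# pycocotools cocoeval defaults for keypoints: 'all' starts at 0
areaRng    = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
areaRngLbl = ['all', 'medium', 'large']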

Because of this discrepancy, the number of ground truth instances counted is different for the two evaluations, resulting in different AP (higher for cocoanalyze, since it considers fewer instances present), while recall is obviously not affected.

I think my solution makes more sense, and I reached out about it in the past, but they didn't change their code. You can easily choose whichever you prefer, either by changing the default param values in the Params class, or by overwriting them after instantiating a COCOanalyze object and accessing its params, i.e.:

coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')
coco_analyze.params.areaRng = [[0 ** 2, 1e5 ** 2], [96 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2]]

After this change the results will match. I'm closing this issue, please put a thumbs up if you think it's solved, or feel free to reopen if you have further comments.

@baolinhu
Author

baolinhu commented Mar 26, 2019

@matteorr Thanks, I see what you mean, but I still don't fully understand. Since COCO keypoints don't have small instances, the number of ground truth instances should be the same: the number of ground truth instances with an area of less than 32**2 pixels should be 0, so [0 ** 2, 1e5 ** 2] is equivalent to [32 ** 2, 1e5 ** 2]. I think it instead affects the number of FP (false positive) samples.

@matteorr
Owner

@baolinhu - By "the number of ground truth instances counted is different for the two evaluations" I really meant the number of ground truth instances that are matched to a detection. Setting the area range to a different value also determines which detections get ignored.

So in this particular case, detections smaller than 32**2 will be ignored by my evaluation code. To convince yourself, try removing all the small detections before loading them into the COCOanalyze class, i.e.:

# drop all detections with an area smaller than 32**2 before loading them
new_team_split_dts = [d for d in team_split_dts if d['area'] > 32**2]
coco_gt = COCO(annFile)
coco_dt = coco_gt.loadRes(new_team_split_dts)
coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')
coco_analyze.evaluate(verbose=True, makeplots=False, savedir=saveDir, team_name=teamName)

You'll see that in this case the results are exactly the same regardless of whether coco_analyze.params.areaRng[0] is [0 ** 2, 1e5 ** 2] or [32 ** 2, 1e5 ** 2].
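In case it helps, here is a simplified sketch of what pycocotools' COCOeval.evaluateImg does with the area range (a hypothetical paraphrase, not the verbatim source, which is vectorized over the OKS thresholds):

def mark_area_range_ignores(gts, dts, matched_dt_ids, aRng):
    """Mimic cocoeval's ignore flags for a single image and area range."""
    for g in gts:
        # ground truths outside the area range are flagged as ignore
        g['_ignore'] = bool(g.get('ignore', 0)) or not (aRng[0] <= g['area'] <= aRng[1])
    for d in dts:
        # unmatched detections outside the area range are also flagged as ignore,
        # so they are later excluded from both the TP and the FP counts
        outside = not (aRng[0] <= d['area'] <= aRng[1])
        d['_ignore'] = (d['id'] not in matched_dt_ids) and outside
    return gts, dts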

This makes sense to me. But if you still don't agree please post back, maybe I am missing your point.

@baolinhu
Author

baolinhu commented Mar 27, 2019

  • Firstly, you solved my problem. Thanks.
  • Now let me state my point of view. The line new_team_split_dts = [d for d in team_split_dts if d['area']>32**2] decreases the number of false positive samples: a detection that would count as a false positive but has an area of less than 32**2 is simply ignored. That seems a little unreasonable to me (when evaluating the all range you should act as if you don't know the dataset, rather than adding the prior information that instances have an area larger than 32**2).
    Precision = TP / (TP + FP) will therefore be higher, while recall = TP / (TP + FN) is not affected, because TP + FN is the total number of ground truth instances, which does not change (see the tiny made-up example below).
  • So I think the issue is that the number of positive detections counted is different for the two evaluations. Should they really be ignored when evaluating the all range?
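A tiny made-up example of what I mean: suppose there are 100 ground truth instances, the algorithm matches all of them correctly, and it also outputs 10 wrong detections with an area of less than 32**2. If the small detections count as FP, precision = 100 / 110 ≈ 0.91; if they are ignored, precision = 100 / 100 = 1.00. Recall is 100 / 100 = 1.00 in both cases, because TP + FN is just the number of ground truth instances.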

@matteorr
Owner

@baolinhu - Glad the issue is resolved.

To follow up one last time on what might be the "best" evaluation strategy: my interpretation is that, since we know COCO has no ground truth keypoints for instances with an area smaller than 32**2, it is better to ignore detections whose area is too small, as they will most likely not be good because of the lack of training data. I agree this strategy might penalize algorithms that make good keypoint predictions for small instances.

An interesting approach could be to ignore all detections with an area smaller than the minimum area with which an IoU of 0.5 is possible if the detection is perfectly overlapped with a ground truth of size 32**2.
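(Back-of-the-envelope: if a detection of area A is fully contained in a ground truth of area 32**2, the IoU is roughly A / 32**2, so the minimum detection area giving IoU = 0.5 would be around 0.5 * 32**2 = 512 pixels.)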

In conclusion, I think there is no definitive "right" or "wrong" way of doing it. As long as you are aware of the consequences of either approach, and compare all algorithms using the same technique, it shouldn't matter too much.

@baolinhu
Author

Yeah, I agree with your conclusion. Thanks for your patience.

@DanBmh

DanBmh commented Oct 4, 2021

Quoting the fix from above:

coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')
coco_analyze.params.areaRng = [[0 ** 2, 1e5 ** 2], [96 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2]]

This worked for me, but the medium and large results are switched. Compared with the linked code above, it should be:
coco_analyze.params.areaRng = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]
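For clarity, the ranges just need to line up with the labels (a minimal sketch, assuming the default labels are ['all', 'medium', 'large']):

coco_analyze.params.areaRngLbl = ['all', 'medium', 'large']
coco_analyze.params.areaRng    = [[0 ** 2, 1e5 ** 2], [32 ** 2, 96 ** 2], [96 ** 2, 1e5 ** 2]]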
