This page describes the DensePose evaluation metrics used by COCO. The evaluation code provided here can be used to obtain results on the publicly available COCO DensePose validation set. It computes multiple metrics described below. To obtain results on the COCO DensePose test set, for which ground-truth annotations are hidden, generated results must be uploaded to the evaluation server. The exact same evaluation code, described below, is used to evaluate results on the test set.
Please note the changes in the evaluation metric of the 2019 challenge compared to 2018 (see description below).
The multi-person DensePose task involves simultaneous person detection and estimation of correspondences between image pixels that belong to a human body and a template 3D model. DensePose evaluation mimics the evaluation metrics used for object detection and keypoint estimation in the COCO challenge, namely average precision (AP) and average recall (AR) and their variants.
At the heart of these metrics is a similarity measure between ground-truth objects and predicted objects. In the case of object detection, Intersection over Union (IoU) serves as this similarity measure (for both boxes and segments). Thresholding the IoU defines matches between the ground-truth and predicted objects and allows computing precision-recall curves. In the case of keypoint detection, Object Keypoint Similarity (OKS) is used.
To adapt AP/AR to dense correspondence, an analogous similarity measure called Geodesic Point Similarity (GPS) has been introduced. It plays the same role as IoU for object detection and OKS for keypoint estimation.
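The role of the similarity measure can be illustrated with a minimal sketch of threshold-based matching. It assumes a precomputed 2D similarity matrix (IoU, OKS, or GPS values) whose rows are detections sorted by descending confidence. The function names `match_detections` and `precision_recall` are hypothetical, and the greedy matching shown is a simplification, not the exact logic of the official evaluation code.

```python
import numpy as np

def match_detections(similarity, threshold):
    """Greedily match detections to ground truths at one similarity threshold.

    similarity[d, g] is the similarity between detection d and ground truth g;
    rows are assumed to be sorted by descending detection confidence.
    Returns a boolean array marking which detections are true positives.
    """
    sim = np.asarray(similarity, dtype=float)
    num_det, num_gt = sim.shape
    gt_taken = np.zeros(num_gt, dtype=bool)
    is_tp = np.zeros(num_det, dtype=bool)
    for d in range(num_det):
        # Unmatched ground truths whose similarity passes the threshold.
        candidates = np.where(~gt_taken & (sim[d] >= threshold))[0]
        if candidates.size:
            best = candidates[np.argmax(sim[d, candidates])]
            gt_taken[best] = True
            is_tp[d] = True
    return is_tp

def precision_recall(is_tp, num_gt):
    """Cumulative precision and recall along the ranked detection list."""
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(~is_tp)
    return tp / (tp + fp), tp / num_gt

# Toy example: three detections, two ground-truth persons.
sim = [[0.9, 0.1],
       [0.8, 0.2],
       [0.3, 0.6]]
is_tp = match_detections(sim, threshold=0.5)
precision, recall = precision_recall(is_tp, num_gt=2)
```

In this toy example the second detection is a false positive because its best ground truth is already taken by the higher-ranked first detection.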
Geodesic Point Similarity
The geodesic point similarity (GPS) is based on geodesic distances on the template mesh between the collected ground-truth points and the estimated surface coordinates for the same image points:

$$\mathrm{GPS}_j = \frac{1}{|P_j|} \sum_{p \in P_j} \exp\left( \frac{-g(i_p, \hat{i}_p)^2}{2 \kappa(i_p)^2} \right),$$

where $P_j$ is the set of annotated ground-truth points for person instance $j$, $g(i_p, \hat{i}_p)$ is the geodesic distance between the estimated ($\hat{i}_p$) and ground-truth ($i_p$) human body surface points, and $\kappa(i_p)$ is a per-part normalization factor, defined as the mean geodesic distance between points on the part. Please note that, because of the new per-part normalization, the AP numbers do not match those reported in the paper, which were obtained with a fixed $\kappa = 0.255$.
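As a rough illustration, GPS for one person instance can be sketched as below. The sketch assumes the Gaussian form exp(-g² / (2κ²)) averaged over the annotated points, with the geodesic distances and per-part normalization factors already available as arrays. The function name `geodesic_point_similarity` and the choice to return 0 for an instance without annotated points are assumptions made for this example.

```python
import numpy as np

def geodesic_point_similarity(geodesic_dists, kappas):
    """GPS for a single person instance.

    geodesic_dists: geodesic distance between the estimated and ground-truth
        surface point, one value per annotated point.
    kappas: normalization factor of the body part each annotated point
        belongs to (same length as geodesic_dists).
    """
    g = np.asarray(geodesic_dists, dtype=float)
    k = np.asarray(kappas, dtype=float)
    if g.size == 0:
        # Assumption for this sketch: no annotated points means no similarity.
        return 0.0
    return float(np.mean(np.exp(-(g ** 2) / (2.0 * k ** 2))))

# Perfect predictions give GPS = 1; an error equal to kappa gives exp(-0.5).
perfect = geodesic_point_similarity([0.0, 0.0], [0.255, 0.255])
one_kappa_off = geodesic_point_similarity([0.255], [0.255])
```

Because each term lies in (0, 1], GPS is bounded the same way as IoU and OKS, which is what allows the same thresholds to be reused.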
This formulation has a limitation: it is evaluated only on a set of predefined annotated points and therefore does not penalize spurious detections (false positives). As a result, the metric erroneously favors predictions that classify all pixels as foreground. To account for this, we introduce an additional multiplicative term, the intersection over union (IoU) between the ground-truth and predicted foreground masks, which yields an improved masked GPS.
Masked Geodesic Point Similarity
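The masked geodesic point similarity (GPSm) is calculated as

$$\mathrm{GPS}^m = \sqrt{\mathrm{GPS} \cdot \mathcal{I}}, \qquad \mathcal{I} = \frac{|\mathcal{M} \cap \hat{\mathcal{M}}|}{|\mathcal{M} \cup \hat{\mathcal{M}}|},$$

where GPS is the geodesic point similarity and $\mathcal{I}$ is the intersection over union between the ground-truth ($\mathcal{M}$) and predicted ($\hat{\mathcal{M}}$) foreground masks.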
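A minimal sketch of the masked variant follows, assuming GPSm is the geometric mean of the GPS value and the IoU of the binary foreground masks. The helper names `mask_iou` and `masked_gps` are hypothetical, and treating an empty union as IoU = 0 is a choice made for this example.

```python
import numpy as np

def mask_iou(gt_mask, pred_mask):
    """Intersection over union of two binary foreground masks."""
    gt = np.asarray(gt_mask, dtype=bool)
    pred = np.asarray(pred_mask, dtype=bool)
    union = np.logical_or(gt, pred).sum()
    if union == 0:
        # Assumption for this sketch: two empty masks have zero overlap.
        return 0.0
    return float(np.logical_and(gt, pred).sum() / union)

def masked_gps(gps, gt_mask, pred_mask):
    """Combine GPS with the mask IoU (assumed form: sqrt(GPS * IoU))."""
    return float(np.sqrt(gps * mask_iou(gt_mask, pred_mask)))

gt = [[1, 1],
      [0, 0]]
all_foreground = [[1, 1],
                  [1, 1]]
# Predicting every pixel as foreground halves the IoU, so even a perfect
# GPS of 1.0 is pulled down to sqrt(0.5).
score = masked_gps(1.0, gt, all_foreground)
```

The example shows the intended effect: a prediction that labels the whole image as foreground no longer receives a perfect score.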
The following metrics are used to characterize the performance of a dense pose estimation algorithm on COCO:
AP      % AP averaged over GPSm values 0.5 : 0.05 : 0.95 (primary challenge metric)
AP-50   % AP at GPSm = 0.5 (loose metric)
AP-75   % AP at GPSm = 0.75 (strict metric)
AP-m    % AP for medium detections: 32² < area < 96²
AP-l    % AP for large detections: area > 96²
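The threshold grid and the area ranges listed above can be written down as in the short sketch below. The names `GPSM_THRESHOLDS`, `size_category`, and `primary_ap` are hypothetical, and the strict inequalities follow the listing as written here; the official code may treat the boundaries differently.

```python
import numpy as np

# GPSm thresholds 0.5 : 0.05 : 0.95, i.e. ten evenly spaced values.
GPSM_THRESHOLDS = np.linspace(0.5, 0.95, 10)

def size_category(area):
    """Map an object area in pixels to the AP-m / AP-l size range."""
    if 32 ** 2 < area < 96 ** 2:
        return "medium"
    if area > 96 ** 2:
        return "large"
    # Areas outside both ranges are not covered by AP-m or AP-l.
    return None

def primary_ap(ap_per_threshold):
    """Primary metric: mean of the per-threshold AP values."""
    ap = np.asarray(ap_per_threshold, dtype=float)
    if ap.shape != GPSM_THRESHOLDS.shape:
        raise ValueError("expected one AP value per GPSm threshold")
    return float(ap.mean())
```

For instance, `size_category(50 * 50)` falls into the medium range, while a 20 × 20 pixel instance is counted in neither AP-m nor AP-l.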
Evaluation code is available in the DensePose GitHub repository; see densepose_cocoeval.py. Before running the evaluation code, please prepare your results in the format described on the results format page. To allow faster evaluation, the geodesic distances are pre-computed on a subsampled version of the SMPL model. At evaluation time, the closest vertices to the estimated UV values are found in the subsampled mesh, and the geodesic distances are then read from the pre-computed values.
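The lookup described above can be sketched as follows. This is an illustration under simplifying assumptions, not the logic of densepose_cocoeval.py: it handles a single body part, takes the ground-truth vertex index as given, and uses hypothetical inputs `vertex_uv` (UV coordinates of the subsampled mesh vertices) and `geodesic_matrix` (pre-computed pairwise geodesic distances).

```python
import numpy as np

def closest_vertex(uv, vertex_uv):
    """Index of the mesh vertex whose UV coordinates are nearest to `uv`."""
    diffs = np.asarray(vertex_uv, dtype=float) - np.asarray(uv, dtype=float)
    return int(np.argmin(np.sum(diffs ** 2, axis=1)))

def lookup_geodesic(pred_uv, gt_vertex, vertex_uv, geodesic_matrix):
    """Geodesic distance between a ground-truth vertex and a predicted UV.

    The predicted UV is snapped to its closest vertex, then the distance is
    read from the pre-computed matrix instead of being computed on the fly.
    """
    pred_vertex = closest_vertex(pred_uv, vertex_uv)
    return float(np.asarray(geodesic_matrix, dtype=float)[gt_vertex, pred_vertex])

# Toy "mesh" with three vertices and made-up pairwise distances.
vertex_uv = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
geodesic_matrix = [[0.0, 1.0, 1.0],
                   [1.0, 0.0, 1.4],
                   [1.0, 1.4, 0.0]]
dist = lookup_geodesic([0.9, 0.1], gt_vertex=0,
                       vertex_uv=vertex_uv, geodesic_matrix=geodesic_matrix)
```

Snapping to the nearest vertex trades a small quantization error for a constant-time table lookup per annotated point.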