Mapillary Vistas Dataset (MVD) Evaluation Scripts
This repository contains scripts to run evaluations on MVD predictions (e.g. on training and on validation data), and additionally form the basis of our evaluation server. Consequently, successfully running these scripts on machine-generated results helps to validate the predictor's output format required by the evaluation server.
The main entry point is
The following command line options are mandatory:
--configprovide location of the config file distributed with your downloaded copy of MVD.
--ground-truth-labelsprovide path to the instance-specific ground truth label files distributed with MVD.
--prediction-labelsprovide path to your predictions. The expected files are 8bit grayscale encoding in PNG format and pixel values are treated as indices for the label list in the config file.
-p assume that all generated files are located within the same, specified folder location.
The most important optional command line parameters are:
--instancesprovide path to the instance-specific prediction folder. For every ground truth file, the script looks for a file with the same name in the
--instancesfolder, but with
.txtextension. Every prediction for an image is stored in a separate line with format
<instance mask> <label id> <confidence>. The path to
<instance mask>is relative to the text file,
<label_id>is an integer and
<confidence>is a floating point number.
--jobsdefines the number of processes to run script in parallel. It has the same syntax as
make. Without this flag, only one process is used. If no number is provided, the number of available CPU cores is used.
--plotIf activated, the confusion matrix and (in case of instance-specific evaluation), the precision-recall plots for every class and different thresholds are plotted and saved to files.
--plot-extensionspecifies the file extension of the plots. Use
svgfor scalable versions and
pngfor rasterized plots.
--plot-dirpath to the plotted files. If not specified, the current directory is used.
For full command line usage, please run
python mapillary_vistas/tools/evaluation.py --help
Semantic image segmentation
The main metric used for semantic image segmentation is the mean intersection over union (IoU) score , also known as averaged Jaccard index. IoU is defined as TP / (TP + FP + FN), where TP, FP and FN are true positives, false positives and false negatives, respectively.
For categories where we have instance-specific annotations available, we also provide instance-specific intersection over union (iIoU)  scores, applying re-scaling of TP and FN during IoU calculation. The scaling factor is the ratio of the current object size to the average object size of the respective class. Consequently, this metric reduces the bias towards object classes with dominant segmentation sizes. This metric is not to be confused with the metric used below for instance-specific segmenation assessment, as it does not require information about individual object instances.
Instance-specific semantic image segmentation
For instance-specific semantic image segmentation we calculate average precision (AP) scores, following . We analyze each overlapping prediction per available ground truth instance. The prediction with the highest confidence for a given, minimal overlap is assigned to the ground truth instance. Unassigned ground truth instances count as false negatives (FN), unassigned predictions as false positives (FP), respectively. Based on these assignments we generate a precision-recall curve for each label. The average precision corresponds to the area under curve of the plot. Similar to [2,4], we calculate the main metric, AP, by using thresholds from 50% to 95% in 5% steps and finally average the individual APs.
 M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn and A. Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. In International Journal of Computer Vision (IJCV), 2010.
 M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
 B. Hariharan, P. Arbeláez, R. B. Girshick, and J. Malik. Simultaneous detection and segmentation. In European Conference on Computer Vision (ECCV), 2014.
 T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision (ECCV), 2014.