# Evaluation Prep

In an AI competition, participants submit predictions on the test data, and those predictions are evaluated automatically. This notebook walks through the steps for creating all the files used in that evaluation process.

In [10]:
import os
import numpy as np
import random
import pandas as pd
import cv2

We make 3 different files:
* **answer.csv** - A csv file containing the building location mask for each image in the test dataset. The csv file has 4 columns:
    * 'img_id' - Name of the test image.
    * 'class' - Type of feature for which we find masks. In this case, the only class is 'building'.
    * 'prediction' - The location of the pixels corresponding to the given feature. The mask is in RLE (run-length encoding) format.
    * 'public' - Boolean value indicating whether the image is part of the public test set or private (hidden) test set.
* **sample_submission.csv** - An example csv file that shows how prediction files should be formatted.
    * It has all the same columns as the *answer.csv* file besides the 'public' column.
    * The rows are in the same order as in the *answer.csv* file.
* **evaluate.py** - Python script that takes the *answer.csv* file and the prediction files from participants and calculates the final score.

## answer.csv

#### RLE
We begin by listing the mask files of the test set.

In [5]:
testmask_path = '/workspace/Competition/map_segmentation/data/final/test/masks'
test_masks = os.listdir(testmask_path)

In [7]:
test_masks[0:10]

['ODJ0HKc0Vq.png',
 '6q78aBibSV.png',
 'iTrNBWkgm5.png',
 'eDizgkIT6N.png',
 'OKql19s74U.png',
 'slinDR5sma.png',
 '6n4FRv5pBF.png',
 'gdAZaWX3Mx.png',
 'RRo2o91Cyt.png',
 'aZVQc39P2l.png']

In [9]:
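# Convert a mask to a run-length-encoded (RLE) string of "start length" pairs.
# Starts are 0-based indices into the flattened (row-major) mask.
def mask_to_coordinates(mask):
    flatten_mask = mask.flatten()
    # All-zero mask: encoded here as "0 <number of pixels>"
    if flatten_mask.max() == 0:
        return f'0 {len(flatten_mask)}'
    # Indices of all non-zero (building) pixels
    idx = np.where(flatten_mask != 0)[0]
    # A step other than 1 between consecutive indices marks the end of a run
    steps = idx[1:] - idx[:-1]
    step_idx = np.where(steps != 1)[0]
    start = np.append(idx[0], idx[step_idx + 1])
    end = np.append(idx[step_idx], idx[-1])
    length = end - start + 1
    # Interleave starts and lengths: "s0 l0 s1 l1 ..."
    new_coord = []
    for i in range(len(start)):
        new_coord.append(start[i])
        new_coord.append(length[i])
    new_coord_str = ' '.join(map(str, new_coord))
    return new_coord_str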
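
To sanity-check the encoding, it can help to decode an RLE string back into a mask. The sketch below is not part of the original notebook: `rle_to_mask` is a hypothetical helper, and it assumes the same convention as `mask_to_coordinates` (0-based starts into the row-major flattened mask).

```python
import numpy as np

def rle_to_mask(rle, shape):
    """Decode a 'start length start length ...' string into a 0/1 mask.

    Hypothetical helper: assumes 0-based starts into the row-major
    flattened mask, as produced by mask_to_coordinates.
    """
    values = np.array(rle.split(), dtype=np.int64)
    starts, lengths = values[0::2], values[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(shape)

# Pixels 1, 2 and 5 of a 2x4 mask are set
decoded = rle_to_mask('1 2 5 1', shape=(2, 4))
print(decoded)
```

Note that this sketch decodes strings literally. Since `mask_to_coordinates` writes an all-zero mask as `'0 <number of pixels>'`, which is also the string a completely filled mask would produce, that case would need separate handling.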

We use the function above to convert each PNG mask image to a string of RLE coordinates.

In [11]:
recid = []
rles = []

for imask in test_masks:
    maskpath = os.path.join(testmask_path, imask)
    mask = cv2.imread(maskpath, cv2.IMREAD_GRAYSCALE)
    rle = mask_to_coordinates(mask)
    recid.append(imask)
    rles.append(rle)

#### Public vs. Private

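We now randomly assign each file to either the public test set (with probability 0.3) or the private test set (with probability 0.7). Because each file is drawn independently, the split is approximately 30/70 rather than exact.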

In [12]:
isPublic = []
for i in range(len(recid)):
    samp = np.random.uniform(0,1)
    if samp < 0.3:
        isPublic.append(True)
    else:
        isPublic.append(False)
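
The loop above is not seeded, so rerunning it gives a different split each time. A minimal sketch of a reproducible alternative follows, assuming NumPy's `default_rng`; the seed value and `n_files` (a stand-in for `len(recid)`) are arbitrary choices for illustration, not values from the original notebook.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, assumed for illustration
n_files = 10                         # stand-in for len(recid)

# Same idea as the loop: one independent draw per file, roughly 30% public
is_public_approx = (rng.uniform(0, 1, size=n_files) < 0.3).tolist()

# Alternative: an exact 30/70 split, choosing which files are public at random
n_public = round(0.3 * n_files)
exact = np.zeros(n_files, dtype=bool)
exact[rng.permutation(n_files)[:n_public]] = True
is_public_exact = exact.tolist()

print(sum(is_public_exact), 'of', n_files, 'files are public')
```

The independent-draw version matches the notebook's behavior; the exact version may be preferable when the test set is small and the public share should not drift.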

#### Classes

In [13]:
classes = list(np.repeat('building',len(recid)))

#### Combine lists to dataframe

In [14]:
answerdf = pd.DataFrame({'img_id':recid, 'class': classes, 'prediction':rles, 'public':isPublic})

In [15]:
answerdf.to_csv('answer.csv',index=False)

## sample_submission.csv

Making the sample_submission file is much simpler now that we already have the answer CSV. We use the same fixed placeholder RLE string as the prediction for every image.

In [16]:
samples = []
for i in range(len(recid)):
    samples.append('1 1 3 1 6 3 13 7 23 1 27 2')

In [17]:
ssdf = pd.DataFrame({'img_id':recid, 'class':classes, 'prediction':samples})

In [18]:
ssdf.to_csv('sample_submission.csv', index=False)
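
Since submissions are expected to have the same columns as *answer.csv* (minus 'public') and the same row order, a quick format check can catch mistakes early. This is a sketch only: `check_submission` is a hypothetical helper, and the tiny data frames below are made-up stand-ins for the real files.

```python
import pandas as pd

def check_submission(submission, answer):
    """Return True if the submission has the expected columns and row order.

    Hypothetical helper, not part of the original notebook.
    """
    expected_cols = ['img_id', 'class', 'prediction']
    if list(submission.columns) != expected_cols:
        return False
    return submission['img_id'].tolist() == answer['img_id'].tolist()

# Made-up example data standing in for answer.csv and a submission file
answer = pd.DataFrame({
    'img_id': ['a.png', 'b.png'],
    'class': ['building', 'building'],
    'prediction': ['0 4', '2 2'],
    'public': [True, False],
})
good = answer.drop(columns='public')
bad = good.iloc[::-1].reset_index(drop=True)  # rows in the wrong order

print(check_submission(good, answer), check_submission(bad, answer))
```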

## evaluate.py
This file is used to calculate the score of the participants' predictions. For this task, the performance metric is the mean Intersection over Union (mIoU).
* Calculates mIoU based on predictions.
* Both the public and private scores are calculated.
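
The scoring described above could be sketched roughly as follows. This is not the actual evaluate.py: it assumes mIoU means the average of per-image IoU, that RLE starts are 0-based into the flattened mask, that strings are decoded literally, and that two empty masks score an IoU of 1. The helper names, the `n_pixels` parameter, and the example data are all hypothetical.

```python
import numpy as np
import pandas as pd

def rle_to_flat(rle, n_pixels):
    """Decode 'start length ...' pairs into a flat boolean mask (0-based starts)."""
    values = np.array(rle.split(), dtype=np.int64)
    flat = np.zeros(n_pixels, dtype=bool)
    for start, length in zip(values[0::2], values[1::2]):
        flat[start:start + length] = True
    return flat

def iou(true_rle, pred_rle, n_pixels):
    """Intersection over Union of two RLE-encoded masks."""
    t = rle_to_flat(true_rle, n_pixels)
    p = rle_to_flat(pred_rle, n_pixels)
    union = np.logical_or(t, p).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(t, p).sum() / union)

def score(answer, submission, n_pixels):
    """Return (public mIoU, private mIoU), averaging per-image IoU."""
    merged = answer.merge(submission, on=['img_id', 'class'],
                          suffixes=('_true', '_pred'))
    merged['iou'] = [iou(t, p, n_pixels)
                     for t, p in zip(merged['prediction_true'],
                                     merged['prediction_pred'])]
    public = merged.loc[merged['public'], 'iou'].mean()
    private = merged.loc[~merged['public'], 'iou'].mean()
    return float(public), float(private)

# Made-up example: two 8-pixel images, one public and one private
answer = pd.DataFrame({'img_id': ['a.png', 'b.png'],
                       'class': ['building', 'building'],
                       'prediction': ['0 4', '2 2'],
                       'public': [True, False]})
submission = pd.DataFrame({'img_id': ['a.png', 'b.png'],
                           'class': ['building', 'building'],
                           'prediction': ['0 2', '2 2']})

public_score, private_score = score(answer, submission, n_pixels=8)
print(public_score, private_score)
```

In the example, the public image overlaps on 2 of 4 pixels in the union, and the private image matches exactly.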