# Evaluation Prep

In an AI competition, participants submit predictions on the test data, and the submissions are evaluated automatically. This notebook walks through creating the files used in that evaluation process.

In [2]:
import os
import numpy as np
import random
import pandas as pd

We make 3 different files:

* answer.csv - A csv file containing the sleep stage for each recording in the test dataset. The csv file has 3 columns:
    * 'rec_id' - Name of the test .npy file.
    * 'stage' - The sleep stage (W, N1, N2, N3, R).
    * 'public' - Boolean value indicating whether the recording is part of the public test set or the private (hidden) test set.
* sample_submission.csv - An example csv file that shows how prediction files should be formatted.
    * It has all the same columns as the answer.csv file besides the 'public' column.
    * The rows are in the same order as in the answer.csv file.
* evaluate.py - Python script that takes the answer.csv file and the prediction files from participants and calculates the final score.

## answer.csv

We begin by loading the 'keydf.csv' file we made before.

In [3]:
keydf = pd.read_csv('/workspace/Competition/PSG/data/final/keydf.csv')
keydf.head()

Unnamed: 0,file_id,encoded_id,train
0,10_22339_W_0.npy,IsCItLdHYS.npy,True
1,10_22339_W_1.npy,AROuZa34WB.npy,True
2,10_22339_W_2.npy,ODJ0HKc0Vq.npy,False
3,10_22339_W_3.npy,X44xsz9e7A.npy,True
4,10_22339_W_4.npy,kadoREAeRh.npy,True


The answer file has 3 columns:
* 'rec_id' - The encoded id of the test data.
* 'stage' - The sleep stage of the corresponding .npy file, which can be extracted from the 'file_id'.
* 'public' - A boolean indicating whether or not the particular recording is part of the public test set. 30% of the test set should be public.

In [5]:
# Keep only the test rows
testdf = keydf[keydf.train == False].reset_index(drop=True)
# The stage is the third '_'-separated field of file_id,
# e.g. '10_22339_W_2.npy' -> 'W'
testdf['stage'] = testdf['file_id'].str.split('_').str[2]
# Mark each row as public with probability 0.3. The draw is independent
# per row and unseeded, so the public share is only approximately 30%
# and changes between runs.
testdf['public'] = np.random.uniform(0, 1, len(testdf)) < 0.3
# Drop unused columns and rename 'encoded_id' to 'rec_id'
answerdf = testdf.drop(['file_id', 'train'], axis=1)
answerdf = answerdf.rename(columns={'encoded_id': 'rec_id'})
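
The cell above draws one random number per row, so the public share lands near 30% but is not exact, and it changes on every run. If an exact and reproducible split is preferred, one possible alternative is sketched below. The helper name `assign_public_flags`, the `seed` value, and the toy DataFrame are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def assign_public_flags(n_rows, public_fraction=0.3, seed=0):
    """Return a boolean array with exactly round(n_rows * public_fraction) True values."""
    rng = np.random.default_rng(seed)          # seeded generator -> same split on every run
    n_public = int(round(n_rows * public_fraction))
    flags = np.zeros(n_rows, dtype=bool)
    # Pick n_public distinct row positions and mark them as public
    public_idx = rng.choice(n_rows, size=n_public, replace=False)
    flags[public_idx] = True
    return flags

# Toy stand-in for testdf (hypothetical ids)
toy = pd.DataFrame({'encoded_id': [f'rec{i}.npy' for i in range(10)]})
toy['public'] = assign_public_flags(len(toy))
```

With this approach, `testdf['public'] = assign_public_flags(len(testdf))` would replace the per-row draw.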

In [6]:
answerdf.head()

Unnamed: 0,rec_id,stage,public
0,ODJ0HKc0Vq.npy,W,True
1,6q78aBibSV.npy,W,False
2,iTrNBWkgm5.npy,W,False
3,eDizgkIT6N.npy,W,False
4,OKql19s74U.npy,W,False


Finally, we save the file as 'answer.csv'.

In [7]:
eval_path = '/workspace/Competition/PSG/evaluate'
ans_path = os.path.join(eval_path, 'answer.csv')
answerdf.to_csv(ans_path, index=False)
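
Before relying on the saved file, it may be worth checking that it matches the description above. The following is a minimal sketch, assuming the stage tokens in 'file_id' are exactly W, N1, N2, N3 and R. The function `check_answer_df` and the toy frames are hypothetical.

```python
import pandas as pd

# Stage labels listed in the file description (assumed to match the file_id tokens)
VALID_STAGES = {'W', 'N1', 'N2', 'N3', 'R'}

def check_answer_df(df):
    """Return a list of problems found in an answer DataFrame (empty list = looks fine)."""
    expected = ['rec_id', 'stage', 'public']
    if list(df.columns) != expected:
        return [f'unexpected columns: {list(df.columns)}']
    problems = []
    if df['rec_id'].duplicated().any():
        problems.append('duplicate rec_id values')
    unknown = set(df['stage']) - VALID_STAGES
    if unknown:
        problems.append(f'unknown stage labels: {sorted(map(str, unknown))}')
    if not pd.api.types.is_bool_dtype(df['public']):
        problems.append("'public' column is not boolean")
    return problems

# Toy examples (hypothetical ids)
ok_df = pd.DataFrame({'rec_id': ['a.npy', 'b.npy'],
                      'stage': ['W', 'N2'],
                      'public': [True, False]})
bad_df = ok_df.assign(stage=['W', 'X'])
```

Calling `check_answer_df(pd.read_csv(ans_path))` would apply the same checks to the file written above.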

## sample_submission.csv

The sample_submission.csv, as the name suggests, is a sample of what participant prediction submissions should look like. It must be in the same order as the 'answer.csv' file. It has 2 columns:
* 'rec_id' - The encoded id of the test data.
* 'stage' - The predicted stage. For the sample submission, we set all the stages to 'W'.

In [8]:
# Start from the answer file without the 'public' column
ss = answerdf.drop(['public'], axis=1)
# Fill every row with the placeholder stage 'W'
ss['stage'] = 'W'

ss.head()

Unnamed: 0,rec_id,stage
0,ODJ0HKc0Vq.npy,W
1,6q78aBibSV.npy,W
2,iTrNBWkgm5.npy,W
3,eDizgkIT6N.npy,W
4,OKql19s74U.npy,W


In [9]:
sspath = os.path.join(eval_path, 'sample_submission.csv')
ss.to_csv(sspath, index=False)
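
Because submissions must follow the same row order as 'answer.csv', a format check against the sample submission can catch misaligned files early. The sketch below is one way to do this. The function `submission_problems` and the toy data are assumptions for illustration only.

```python
import pandas as pd

def submission_problems(sub, sample):
    """Return a list of format problems; an empty list means the rows line up with the sample."""
    if list(sub.columns) != list(sample.columns):
        return [f'expected columns {list(sample.columns)}, got {list(sub.columns)}']
    if len(sub) != len(sample):
        return [f'expected {len(sample)} rows, got {len(sub)}']
    problems = []
    # Compare rec_id position by position; any mismatch means the order differs
    mismatched = sub['rec_id'].to_numpy() != sample['rec_id'].to_numpy()
    if mismatched.any():
        problems.append(f'{int(mismatched.sum())} rows have a rec_id that differs from the sample order')
    return problems

# Toy data (hypothetical ids)
sample = pd.DataFrame({'rec_id': ['a.npy', 'b.npy', 'c.npy'], 'stage': ['W', 'W', 'W']})
good = sample.assign(stage=['N1', 'W', 'R'])
shuffled = good.iloc[[1, 0, 2]].reset_index(drop=True)
```

Here `good` passes, while `shuffled` is reported because its first two rows are swapped.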

## evaluate.py

* This script takes the 'answer.csv' file and the participant predictions (usually titled prediction.csv) and calculates the score.
* For this competition task, the metric is the macro F1 score. 
* It calculates 3 scores: public, private, and final.
    * The public score, calculated using the public test data (30% of all test data), is shown on the leaderboard before the competition closes.
    * The private score is calculated using the private (hidden) test data, i.e. the rows not marked as public.
    * The final score, calculated using all of the test data, is shown on the leaderboard once the competition is over.
* The script can be found [here]().
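
The scoring described above can be sketched as follows. This is not the actual evaluate.py, only an illustration under stated assumptions: the function names `macro_f1` and `score_submission` are hypothetical, the prediction file is assumed to have the same 'rec_id' and 'stage' columns as the sample submission, and macro F1 is averaged over the classes that appear in either the answers or the predictions.

```python
import pandas as pd

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes present in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is > 0 for every label in the union
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)

def score_submission(answer, pred):
    """Return the public, private, and final macro F1 scores as a dict."""
    if list(answer['rec_id']) != list(pred['rec_id']):
        raise ValueError('prediction rows are not in the same order as answer.csv')
    is_public = answer['public'].to_numpy(dtype=bool)
    y_true = answer['stage'].to_numpy()
    y_pred = pred['stage'].to_numpy()
    return {
        'public': macro_f1(y_true[is_public], y_pred[is_public]),
        'private': macro_f1(y_true[~is_public], y_pred[~is_public]),
        'final': macro_f1(y_true, y_pred),
    }

# Toy data (hypothetical ids and predictions)
answer = pd.DataFrame({'rec_id': list('abcdef'),
                       'stage': ['W', 'W', 'N1', 'N1', 'R', 'R'],
                       'public': [True, False, True, False, True, False]})
pred = pd.DataFrame({'rec_id': list('abcdef'),
                     'stage': ['W', 'W', 'N1', 'W', 'R', 'N1']})
scores = score_submission(answer, pred)
```

In the toy data, all public rows are predicted correctly, so the public score is 1.0, while the private and final scores are lower because two of the three private rows are wrong.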