## Jaccard Similarity

- Competition submissions are scored by the mean Intersection over Union (IoU, also known as the Jaccard similarity) between the ground-truth matches and the predicted matches for each entry. The mean is taken sample-wise: an IoU score is calculated for each row of the submission file, and the final score is the average of those row scores.

- The demo below shows how this score is calculated.
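
Before the demo, it may help to write the score in symbols. The notation here ($P_i$, $G_i$, $N$) is introduced for illustration only and does not come from the competition page:

$$
\mathrm{IoU}_i = \frac{|P_i \cap G_i|}{|P_i \cup G_i|}, \qquad \text{score} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{IoU}_i
$$

where $P_i$ is the set of predicted matches for row $i$, $G_i$ is the set of ground-truth matches for that row, and $N$ is the number of rows. For example, a row that predicts one correct ID when the ground truth holds two IDs scores $1/2 = 0.5$.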

In [5]:
import numpy as np
import pandas as pd
import string
import random


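def id_generator(size=16, chars=string.ascii_lowercase + string.digits):
    """Return a random ID of `size` characters drawn from `chars`."""
    return ''.join(random.choice(chars) for _ in range(size))


def jaccard_similarity(list1, list2):
    """Return |A ∩ B| / |A ∪ B| for the two lists treated as sets."""
    s1 = set(list1)
    s2 = set(list2)
    union = s1 | s2
    # Assumed convention: two empty sets score 1.0 (avoids ZeroDivisionError).
    return len(s1 & s2) / len(union) if union else 1.0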


# import sample submission file
sample_sub = pd.read_csv(r'C:\Users\caseyrya\Dropbox\foursquare_location_matching_data\sample_submission.csv')

# convert [matches] to list
sample_sub['matches_predicted'] = sample_sub['matches'].str.split(' ')

# generate sample "correct" list: the predicted IDs plus one random extra ID
sample_sub['matches_actual'] = [m + [id_generator()] for m in sample_sub['matches_predicted']]

# generate sample scores, one per row, rounded to 3 decimals
sample_sub['jaccard_score'] = [
    round(jaccard_similarity(pred, actual), 3)
    for pred, actual in zip(sample_sub['matches_predicted'], sample_sub['matches_actual'])
]

# final score
final_score = sample_sub.jaccard_score.mean()

print(f'Final submission score: {final_score}')
sample_sub

Final submission score: 0.5668


| | id | matches | matches_predicted | matches_actual | jaccard_score |
|---|---|---|---|---|---|
| 0 | E_00001118ad0191 | E_00001118ad0191 | [E_00001118ad0191] | [E_00001118ad0191, ojjfsmg8uwzg466j] | 0.5 |
| 1 | E_000020eb6fed40 | E_000020eb6fed40 | [E_000020eb6fed40] | [E_000020eb6fed40, wure0g6tvqy8otoy] | 0.5 |
| 2 | E_00002f98667edf | E_00002f98667edf | [E_00002f98667edf] | [E_00002f98667edf, tpudn6o91bogjhkj] | 0.5 |
| 3 | E_001b6bad66eb98 | E_001b6bad66eb98 E_0283d9f61e569d | [E_001b6bad66eb98, E_0283d9f61e569d] | [E_001b6bad66eb98, E_0283d9f61e569d, 3goh6yoou... | 0.667 |
| 4 | E_0283d9f61e569d | E_0283d9f61e569d E_001b6bad66eb98 | [E_0283d9f61e569d, E_001b6bad66eb98] | [E_0283d9f61e569d, E_001b6bad66eb98, k87krybee... | 0.667 |
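
As a cross-check on the arithmetic in the output above, here is a minimal, pandas-free sketch of the same sample-wise mean. The function names `row_iou` and `mean_iou`, the hard-coded rows, and the placeholder `extra_*` IDs (standing in for the random IDs so the result is reproducible) are illustrative assumptions, not part of the competition's scoring code.

```python
def row_iou(predicted, actual):
    """IoU (Jaccard similarity) of two space-separated ID strings."""
    p, a = set(predicted.split()), set(actual.split())
    union = p | a
    # Assumed convention: two empty match lists count as a perfect match.
    return len(p & a) / len(union) if union else 1.0


def mean_iou(predicted_rows, actual_rows):
    """Average the per-row IoU scores (the sample-wise mean)."""
    if len(predicted_rows) != len(actual_rows):
        raise ValueError("predicted and actual must have the same number of rows")
    if not predicted_rows:
        raise ValueError("at least one row is required")
    scores = [row_iou(p, a) for p, a in zip(predicted_rows, actual_rows)]
    return sum(scores) / len(scores)


# Hypothetical rows mirroring the demo: each "actual" list has one extra ID.
predicted = [
    "E_00001118ad0191",
    "E_001b6bad66eb98 E_0283d9f61e569d",
]
actual = [
    "E_00001118ad0191 extra_1",
    "E_001b6bad66eb98 E_0283d9f61e569d extra_2",
]

score = mean_iou(predicted, actual)
print(f"Mean IoU: {score:.4f}")  # prints "Mean IoU: 0.5833"
```

The first row shares 1 of 2 distinct IDs (0.5) and the second shares 2 of 3 (about 0.667), which matches the per-row values in the table; the mean differs from 0.5668 only because this sketch uses two rows rather than five.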
