# Evaluation

After cleaning our test set with object detection, we extracted features from all the images in the data set using a pre-trained ResNet18 model. We used these feature vectors to perform classification with the K-Nearest Neighbors (K-NN) algorithm.
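This notebook only loads the K-NN results: the predicted classes, the indices of the nearest training images and their distances. The sketch below shows roughly how such results can be produced from feature vectors. It is a minimal brute-force version that assumes Euclidean distance, k = 5 by default and a plain majority vote. The function name `knn_predict` and the tie-breaking rule (smallest class id wins) are our own illustrative choices, not necessarily what the original pipeline used.

```python
import numpy as np


def knn_predict(train_features, train_labels, test_features, k=5):
    """Hypothetical helper: majority-vote K-NN over Euclidean distances.

    Returns (predictions, neighbor_idx, neighbor_dist), where neighbor_idx and
    neighbor_dist play the role of the nearest-neighbor and distance CSVs.
    """
    # pairwise Euclidean distances, shape (n_test, n_train)
    diffs = test_features[:, None, :] - train_features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # indices of the k closest training vectors, ordered from nearest to farthest
    neighbor_idx = np.argsort(dists, axis=1)[:, :k]
    neighbor_dist = np.take_along_axis(dists, neighbor_idx, axis=1)
    neighbor_labels = train_labels[neighbor_idx]
    predictions = []
    for labels in neighbor_labels:
        values, counts = np.unique(labels, return_counts=True)
        # majority vote; np.unique sorts the values, so a tie goes to the smallest class id
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions), neighbor_idx, neighbor_dist


# toy data: two well-separated clusters with classes 1 and 2
train_features = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
train_labels = np.array([1, 1, 2, 2])
test_features = np.array([[0.05, 0.0], [5.05, 5.0]])
pred, idx, dist = knn_predict(train_features, train_labels, test_features, k=3)
```

The brute-force distance matrix needs memory proportional to `n_test * n_train * dim`, so for a test set of this size a batched or index-based search would be needed in practice.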

In this notebook we analyze the results of the classification.

In [86]:
# imports for code 
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
from tqdm import tqdm 

In [195]:
# load the following csv files as dataframe 
url_test ='https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/data/test/test.csv'
url_test_more_classes1 = 'https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/data/test/more_classes/test_more_classes1.csv'
url_test_more_classes2 = 'https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/data/test/more_classes/test_more_classes2.csv'
url_test_more_classes3 = 'https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/data/test/more_classes/test_more_classes3.csv'
url_train = 'https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/data/train/train.csv' 
url_clean_v3 = 'https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/data/test/clean_test_v3.csv'
url_clean_v3_v4 = 'https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/data/test/clean_test_v3_v4.csv'
url_pred = 'https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/feature_extraction/results_csv/predicted_class_embedded_test.csv'
url_dist = 'https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/feature_extraction/results_csv/dist_embedded_test.csv' 
url_nn = 'https://raw.githubusercontent.com/matankleiner/Identify-Known-Sites-in-Photo-Album/master/feature_extraction/results_csv/nearest_neighbor_embedded_test.csv'

test_df = pd.read_csv(url_test) 
test_more_classes1_df = pd.read_csv(url_test_more_classes1)
test_more_classes2_df = pd.read_csv(url_test_more_classes2)
test_more_classes3_df = pd.read_csv(url_test_more_classes3)
train_df = pd.read_csv(url_train)
clean_v3_df = pd.read_csv(url_clean_v3)
clean_v3_v4_df = pd.read_csv(url_clean_v3_v4)
pred_df = pd.read_csv(url_pred)
dist_df = pd.read_csv(url_dist)
nn_df = pd.read_csv(url_nn)

In [196]:
def change_df(df): 
    """
    Drop the leftover CSV index column and insert the test image ids as the first column.
    Param: 
        df (pd.DataFrame): The dataframe to change 
    Return: 
        df (pd.DataFrame): The changed dataframe 
    """
    df = df.drop("Unnamed: 0", axis=1)
    df.insert(0, "id", test_df["id"], True) 
    return df 

pred_df = change_df(pred_df)
pred_df = pred_df.rename(columns={"0": "prediction"})
dist_df = change_df(dist_df)
nn_df = change_df(nn_df)

### Prediction 

In [201]:
# due to the way the test set is organized, it is split into 4 different test sets
# convert the "landmarks" column of each test set to np.int64 so it can be compared with the predictions
for df in (test_df, test_more_classes1_df, test_more_classes2_df, test_more_classes3_df):
    df["landmarks"] = df["landmarks"].astype(np.int64)
    
# check which predictions are correct (against each of the 4 test sets)
pred_series1 = test_df["landmarks"] == pred_df["prediction"]
pred_series2 = test_more_classes1_df["landmarks"] == pred_df["prediction"]
pred_series3 = test_more_classes2_df["landmarks"] == pred_df["prediction"]
pred_series4 = test_more_classes3_df["landmarks"] == pred_df["prediction"]

correct_pred = len(pred_series1[pred_series1]) + len(pred_series2[pred_series2]) + \
               len(pred_series3[pred_series3]) + len(pred_series4[pred_series4])

print ("There are {} correct prediction which is {:.2f}% accuracy out of all the landmarks in the test set."\
       .format(correct_pred, correct_pred / test_df[test_df.landmarks != 0].shape[0] * 100))
print("\nThe accuracy out of all the images in the test set is {:.2f}%".format(correct_pred / test_df.shape[0] * 100))

There are 275 correct prediction which is 16.95% accuracy out of all the landmarks in the test set.

The accuracy out of all the images in the test set is 0.23%


Using feature vectors and K-NN classification, we correctly predicted **275 landmarks**, which is **16.95% accuracy** out of all the landmarks in the test set.

However, the test set consists mainly of out-of-domain images, so if we calculate our accuracy out of all the images in the test set (landmark and out-of-domain images alike), it is only **0.23%**.
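The two accuracies differ only in their denominator. A minimal sketch, assuming class 0 marks an out-of-domain image as in the cells above; the helper name `accuracies` and the toy labels are hypothetical:

```python
import pandas as pd


def accuracies(true_labels: pd.Series, predictions: pd.Series):
    """Hypothetical helper returning (landmark accuracy, overall accuracy) in percent.

    Class 0 marks an out-of-domain image (no landmark).
    """
    correct = true_labels == predictions
    is_landmark = true_labels != 0
    # landmark accuracy: correct predictions on landmark images, over landmark images only
    landmark_acc = float(correct[is_landmark].sum() / is_landmark.sum() * 100)
    # overall accuracy: every correct prediction, over every image in the set
    overall_acc = float(correct.mean() * 100)
    return landmark_acc, overall_acc


# toy example: 5 images, only 2 of them contain a landmark
true_labels = pd.Series([0, 0, 0, 7, 9])
predictions = pd.Series([3, 5, 4, 7, 2])  # plain K-NN never predicts class 0
landmark_acc, overall_acc = accuracies(true_labels, predictions)
```

The single correct prediction is 50% of the landmark images but only 20% of all images. The same effect, on a much larger scale, produces 16.95% versus 0.23% above.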

Next, we calculate the accuracy on the cleaned versions of the test set. One was cleaned using the YOLOv3 object detector and the other using both YOLOv3 and YOLOv4.
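Evaluating on a cleaned test set means keeping only the rows whose `id` survived the object-detection step, as the cells below do with `isin`. A minimal sketch of that filtering, assuming a single table with hypothetical columns `id`, `landmarks` (true class, 0 = out of domain) and `prediction`; the helper `subset_accuracy` is also hypothetical:

```python
import pandas as pd


def subset_accuracy(results, kept_ids):
    """Hypothetical helper: overall accuracy (%) over the images whose id is in kept_ids."""
    subset = results[results["id"].isin(kept_ids)]
    return float((subset["landmarks"] == subset["prediction"]).mean() * 100)


# toy example: the cleaning step dropped two out-of-domain images ("a" and "d")
results = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "landmarks": [0, 0, 7, 0],
    "prediction": [3, 5, 7, 4],
})
full_acc = subset_accuracy(results, results["id"])           # 1 correct out of 4 images
clean_acc = subset_accuracy(results, pd.Series(["b", "c"]))  # 1 correct out of 2 images
```

Removing images that cannot be predicted correctly raises the accuracy without adding a single correct prediction.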

In [210]:
# check for the correct prediction in the clean_v3 test set 
pred_clean_v3 = pred_df[pred_df["id"].isin(clean_v3_df["id"])]
pred_clean_v3_1 = pred_series1[pred_series1.index.isin(pred_clean_v3.index)]
pred_clean_v3_2 = pred_series2[pred_series2.index.isin(pred_clean_v3.index)]
pred_clean_v3_3 = pred_series3[pred_series3.index.isin(pred_clean_v3.index)]
pred_clean_v3_4 = pred_series4[pred_series4.index.isin(pred_clean_v3.index)]

correct_pred_v3 = len(pred_clean_v3_1[pred_clean_v3_1]) + len(pred_clean_v3_2[pred_clean_v3_2]) + \
                  len(pred_clean_v3_3[pred_clean_v3_3]) + len(pred_clean_v3_4[pred_clean_v3_4])   
                                            
print ("There are {} correct prediction which is {:.2f}% accuracy out of all the landmarks in the clean test set "
        "using YOLO v3.".format(correct_pred_v3, correct_pred_v3/clean_v3_df[clean_v3_df.landmarks != "0"].shape[0]*100))
print("\nThe accuracy out of all the images in this clean test set is {:.2f}%"\
      .format(correct_pred_v3 / clean_v3_df.shape[0] * 100))

There are 275 correct prediction which is 17.08% accuracy out of all the landmarks in the clean test set using YOLO v3.

The accuracy out of all the images in this clean test set is 0.30%


In [209]:
# check for the correct prediction in the clean_v3_v4 test set 
pred_clean_v3_v4 = pred_df[pred_df["id"].isin(clean_v3_v4_df["id"])]
pred_clean_v3_v4_1 = pred_series1[pred_series1.index.isin(pred_clean_v3_v4.index)]
pred_clean_v3_v4_2 = pred_series2[pred_series2.index.isin(pred_clean_v3_v4.index)]
pred_clean_v3_v4_3 = pred_series3[pred_series3.index.isin(pred_clean_v3_v4.index)]
pred_clean_v3_v4_4 = pred_series4[pred_series4.index.isin(pred_clean_v3_v4.index)]

correct_pred_v3_v4 = len(pred_clean_v3_v4_1[pred_clean_v3_v4_1]) + len(pred_clean_v3_v4_2[pred_clean_v3_v4_2]) +\
                     len(pred_clean_v3_v4_3[pred_clean_v3_v4_3]) + len(pred_clean_v3_v4_4[pred_clean_v3_v4_4])   

print ("There are {} correct prediction which is {:.2f}% accuracy out of all the landmarks in the clean test set "
        "using YOLO v3 and\nYOLO v4.".format(correct_pred_v3_v4,\
        correct_pred_v3_v4 / clean_v3_v4_df[clean_v3_v4_df.landmarks != "0"].shape[0] * 100))
print("\nThe accuracy out of all the images in this clean test set is {:.2f}%"\
      .format(correct_pred_v3_v4 / clean_v3_v4_df.shape[0] * 100))

There are 273 correct prediction which is 17.13% accuracy out of all the landmarks in the clean test set using YOLO v3 and
YOLO v4.

The accuracy out of all the images in this clean test set is 0.32%


As we can see, using the cleaned test sets slightly improves the accuracy, both out of all the landmarks and out of all the images. However, the improvement is not significant: the number of correct predictions does not grow (275, 275 and 273), and the small gain comes from the smaller number of images the accuracy is computed over.

The best results we received are on the clean test set using YOLO v3 and YOLO v4. On this test set we correctly predicted **273 landmarks**, which is **17.13% accuracy** out of all the landmarks in this test set and **0.32% accuracy** out of all the images in this test set.

### Re-ranking

We'll re-rank the given predictions: based on the nearest-neighbor information, we try to determine which images are out of domain and predict them as class 0.

#### Nearest Neighbor 

We check the class of each neighbor and predict an image as out of domain if all of its neighbors belong to different classes.

Since out-of-domain images are not supposed to look like any of the landmarks, we assume that their nearest neighbors will come from different classes.
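The rule boils down to "no two of the k neighbors share a class", which is the same as "the neighbors contain k distinct classes". A minimal sketch on the neighbor classes of the first four test images, taken from the `nn_df` output shown further down. The helper names and the K-NN predictions are hypothetical and for illustration only:

```python
def out_of_domain_by_classes(neighbor_classes):
    """True if all k nearest neighbors belong to different classes (no two agree)."""
    return len(set(neighbor_classes)) == len(neighbor_classes)


def rerank_by_classes(predictions, neighbor_classes_per_image):
    """Replace a prediction with class 0 (out of domain) when its neighbors all disagree."""
    return [0 if out_of_domain_by_classes(classes) else pred
            for pred, classes in zip(predictions, neighbor_classes_per_image)]


# neighbor classes of the first four test images (rows 0-3 of nn_df)
rows = [
    [42422, 79959, 138982, 93154, 147263],
    [14968, 41941, 95885, 117418, 38746],
    [5156, 164193, 164193, 67109, 84309],
    [48328, 69301, 136675, 158991, 136675],
]
flags = [out_of_domain_by_classes(r) for r in rows]
# hypothetical K-NN predictions for these images
knn_predictions = [42422, 14968, 164193, 136675]
reranked = rerank_by_classes(knn_predictions, rows)
```

Rows 0 and 1 have five different classes and are re-ranked to 0. Rows 2 and 3 each contain a repeated class and keep their prediction.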

In [197]:
# nn_df holds the index of each neighbor in the train set; replace it with that neighbor's class
for col in ["0", "1", "2", "3", "4"]:
    nn_df[col] = train_df.loc[nn_df[col], "landmark_id"].values

# now, each column k [k is in (0,1,2,3,4)] contains the class of the (k+1)-th nearest neighbor
nn_df

|  | id | 0 | 1 | 2 | 3 | 4 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | e324e0f3e6d9e504 | 42422 | 79959 | 138982 | 93154 | 147263 |
| 1 | d9e17c5f3e0c47b3 | 14968 | 41941 | 95885 | 117418 | 38746 |
| 2 | 1a748a755ed67512 | 5156 | 164193 | 164193 | 67109 | 84309 |
| 3 | 537bf9bdfccdafea | 48328 | 69301 | 136675 | 158991 | 136675 |
| 4 | 13f4c974274ee08b | 136675 | 202793 | 25369 | 187755 | 188686 |
| ... | ... | ... | ... | ... | ... | ... |
| 117222 | e351c3e672c25fbd | 47663 | 190441 | 23777 | 23777 | 56062 |
| 117223 | 5426472625271a4d | 54785 | 54785 | 54785 | 54785 | 113750 |
| 117224 | 7b6a585405978398 | 171111 | 112512 | 200128 | 21500 | 142109 |
| 117225 | d885235ba249cf5d | 162403 | 162403 | 162403 | 115930 | 136675 |


In [240]:
# create a copy of pred_df for re-ranking based on the classes of the nearest neighbors
pred_df_re_ranking = pred_df.copy(deep=True) 

# an image is predicted as an out of domain image with class 0 if the classes of all its 5 nearest
# neighbors are different from each other, i.e. its row in nn_df contains 5 unique classes
nn_classes = nn_df[["0", "1", "2", "3", "4"]]
all_different = nn_classes.nunique(axis=1) == nn_classes.shape[1]
# assign through .loc (not chained indexing) so the copy is actually modified
pred_df_re_ranking.loc[all_different, "prediction"] = 0

In [241]:
# the re-ranking above may have turned some correct landmark predictions into 0; check how many:
pred_series_re_reanking = test_df["landmarks"] == pred_df_re_ranking["prediction"]
# a prediction survives if it was correct both before and after the re-ranking
cntr = int((pred_series1 & pred_series_re_reanking).sum())
print("In the re-ranking process we lost {} correct prediction.".format(len(pred_series1[pred_series1]) - cntr))

# count the total number of correct predictions (over all 4 test sets)
correct_pred_rr = len(pred_series_re_reanking[pred_series_re_reanking]) + len(pred_series2[pred_series2]) + \
                  len(pred_series3[pred_series3]) + len(pred_series4[pred_series4])
landmark_pred = cntr + len(pred_series2[pred_series2]) + len(pred_series3[pred_series3]) + \
                len(pred_series4[pred_series4]) 

print ("\nThere are {} correct landmark prediction which is {:.2f}% accuracy out of all the landmarks in the test set."\
       .format(landmark_pred, landmark_pred / test_df[test_df.landmarks != 0].shape[0] * 100))
print("\nHowever, now we predicted correctly {} images the accuracy out of all the images (landmarks and out of domain)"
      " in the    test set is {:.2f}%".format(correct_pred_rr, correct_pred_rr / test_df.shape[0] * 100))

In the re-ranking process we lost 39 correct prediction.

There are 236 correct landmark prediction which is 14.55% accuracy out of all the landmarks in the test set.

However, now we predicted correctly 63769 images the accuracy out of all the images (landmarks and out of domain) in the    test set is 54.40%


As we can see, after the re-ranking we correctly predict only **236 landmarks**, which is **14.55% accuracy** out of all the landmarks. However, we now correctly predict **63769 images**, which is **54.4% accuracy** out of the whole test set.

Now we check our accuracy on the clean_v3_v4 test set:

In [269]:
# check for the correct prediction in the clean_v3_v4 test set 
pred_clean_v3_v4_re_ranking = pred_series_re_reanking[pred_series_re_reanking.index.isin(pred_clean_v3_v4.index)]

correct_pred_v3_v4_re_ranking = len(pred_clean_v3_v4_re_ranking[pred_clean_v3_v4_re_ranking]) + \
                                len(pred_clean_v3_v4_2[pred_clean_v3_v4_2]) + \
                                len(pred_clean_v3_v4_3[pred_clean_v3_v4_3]) + \
                                len(pred_clean_v3_v4_4[pred_clean_v3_v4_4])   

print ("There are {} correct prediction which is {:.2f}% accuracy out of all the images in the clean test set "
        "using YOLO v3 and\nYOLO v4.".format(correct_pred_v3_v4_re_ranking,\
        correct_pred_v3_v4_re_ranking / clean_v3_v4_df.shape[0] * 100))

There are 51996 correct prediction which is 60.54% accuracy out of all the images in the clean test set using YOLO v3 and
YOLO v4.


On the clean test set we correctly predicted **51996 images**, which is **60.54%** accuracy (some of the out-of-domain images had already been removed from this test set by object detection).

#### Distance

We compute the average distance from each test image to its 5 nearest neighbors. If this average is greater than a threshold, we predict the image as out of domain. We try the thresholds 17, 18, 19 and 20.

Since out-of-domain images are not supposed to look like any of the landmarks, we assume that their average distance to their nearest neighbors will be large.
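A minimal sketch of the distance rule and the threshold sweep, using the neighbor distances of the first five test images from the `dist_df` output that follows. In the notebook the rule is applied on top of the nearest-neighbor re-ranking; here it is isolated. The helper name and the class ids `1..5` are hypothetical placeholders:

```python
import numpy as np


def rerank_by_distance(predictions, neighbor_dists, threshold):
    """Set a prediction to 0 (out of domain) when the mean distance to its
    k nearest neighbors is greater than `threshold`."""
    predictions = np.asarray(predictions).copy()
    mean_dist = np.asarray(neighbor_dists).mean(axis=1)
    predictions[mean_dist > threshold] = 0
    return predictions, mean_dist


# distances to the 5 nearest neighbors for rows 0-4 of dist_df
dists = [
    [17.308729, 18.465883, 18.467186, 18.548554, 18.653781],
    [15.190139, 15.275702, 15.514922, 15.866928, 15.901018],
    [21.093424, 21.181752, 21.360185, 21.501729, 21.678210],
    [17.647642, 17.882982, 17.921077, 17.940326, 18.001732],
    [15.005162, 15.474619, 15.555574, 15.756343, 15.842871],
]
knn_predictions = [1, 2, 3, 4, 5]  # hypothetical class ids, for illustration only

# sweep the thresholds and count how many images each one marks as out of domain
flagged_counts = {}
for t in (17, 18, 19, 20):
    reranked, _ = rerank_by_distance(knn_predictions, dists, t)
    flagged_counts[t] = int((reranked == 0).sum())

pred_18, mean_18 = rerank_by_distance(knn_predictions, dists, 18)
```

A lower threshold marks more images as out of domain. This raises the overall accuracy when most of the test set is out of domain, at the cost of some correct landmark predictions. The results below show the same trade-off.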

In [292]:
dist_df

|  | id | 0 | 1 | 2 | 3 | 4 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | e324e0f3e6d9e504 | 17.308729 | 18.465883 | 18.467186 | 18.548554 | 18.653781 |
| 1 | d9e17c5f3e0c47b3 | 15.190139 | 15.275702 | 15.514922 | 15.866928 | 15.901018 |
| 2 | 1a748a755ed67512 | 21.093424 | 21.181752 | 21.360185 | 21.501729 | 21.678210 |
| 3 | 537bf9bdfccdafea | 17.647642 | 17.882982 | 17.921077 | 17.940326 | 18.001732 |
| 4 | 13f4c974274ee08b | 15.005162 | 15.474619 | 15.555574 | 15.756343 | 15.842871 |
| ... | ... | ... | ... | ... | ... | ... |
| 117222 | e351c3e672c25fbd | 15.190551 | 15.214958 | 15.485120 | 15.553071 | 15.700142 |
| 117223 | 5426472625271a4d | 15.169560 | 16.422494 | 17.382131 | 17.822677 | 17.833657 |
| 117224 | 7b6a585405978398 | 11.895901 | 11.969579 | 12.129470 | 12.282135 | 12.533825 |
| 117225 | d885235ba249cf5d | 16.592096 | 16.824498 | 17.247619 | 17.345342 | 17.474634 |


In [280]:
# average distance from each test image to its 5 nearest neighbors
avg_dist = dist_df[["0", "1", "2", "3", "4"]].mean(axis=1)
avg_dist_gt_17 = avg_dist > 17
avg_dist_gt_18 = avg_dist > 18
avg_dist_gt_19 = avg_dist > 19
avg_dist_gt_20 = avg_dist > 20

# create copies of pred_df_re_ranking for further re-ranking based on the average distance from the neighbors
pred_df_rr_dist_17 = pred_df_re_ranking.copy(deep=True) 
pred_df_rr_dist_18 = pred_df_re_ranking.copy(deep=True) 
pred_df_rr_dist_19 = pred_df_re_ranking.copy(deep=True) 
pred_df_rr_dist_20 = pred_df_re_ranking.copy(deep=True) 

# if the average distance of an image from its neighbors is greater than the threshold, predict it as an
# out of domain image with class 0 (assigning through .loc modifies each copy directly, without chained assignment)
pred_df_rr_dist_17.loc[avg_dist_gt_17, "prediction"] = 0
pred_df_rr_dist_18.loc[avg_dist_gt_18, "prediction"] = 0
pred_df_rr_dist_19.loc[avg_dist_gt_19, "prediction"] = 0
pred_df_rr_dist_20.loc[avg_dist_gt_20, "prediction"] = 0

In [285]:
# the re-ranking above may have turned some correct landmark predictions into 0; check how many:
pred_series_rr_dist_17 = test_df["landmarks"] == pred_df_rr_dist_17["prediction"]
pred_series_rr_dist_18 = test_df["landmarks"] == pred_df_rr_dist_18["prediction"]
pred_series_rr_dist_19 = test_df["landmarks"] == pred_df_rr_dist_19["prediction"]
pred_series_rr_dist_20 = test_df["landmarks"] == pred_df_rr_dist_20["prediction"]

# a prediction survives if it was correct both before and after the re-ranking
cntr_dist_17 = int((pred_series1 & pred_series_rr_dist_17).sum())
cntr_dist_18 = int((pred_series1 & pred_series_rr_dist_18).sum())
cntr_dist_19 = int((pred_series1 & pred_series_rr_dist_19).sum())
cntr_dist_20 = int((pred_series1 & pred_series_rr_dist_20).sum())
print("In the re-ranking process we lost {}, {}, {}, {} correct prediction for average neighbor distance"
      " that is greater than 17, 18, 19, 20 accordingly."\
      .format(len(pred_series1[pred_series1]) - cntr_dist_17, len(pred_series1[pred_series1]) - cntr_dist_18,\
              len(pred_series1[pred_series1]) - cntr_dist_19, len(pred_series1[pred_series1]) - cntr_dist_20))

# count the total number of correct predictions (over all 4 test sets)
correct_pred_dist_17 = len(pred_series_rr_dist_17[pred_series_rr_dist_17]) + len(pred_series2[pred_series2]) + \
                    len(pred_series3[pred_series3].index) + len(pred_series4[pred_series4])
landmark_pred_dist_17 = cntr_dist_17 + len(pred_series2[pred_series2]) + len(pred_series3[pred_series3]) + \
                     len(pred_series4[pred_series4]) 
correct_pred_dist_18 = len(pred_series_rr_dist_18[pred_series_rr_dist_18]) + len(pred_series2[pred_series2]) + \
                    len(pred_series3[pred_series3].index) + len(pred_series4[pred_series4])
landmark_pred_dist_18 = cntr_dist_18 + len(pred_series2[pred_series2]) + len(pred_series3[pred_series3]) + \
                     len(pred_series4[pred_series4])
correct_pred_dist_19 = len(pred_series_rr_dist_19[pred_series_rr_dist_19]) + len(pred_series2[pred_series2]) + \
                    len(pred_series3[pred_series3].index) + len(pred_series4[pred_series4])
landmark_pred_dist_19 = cntr_dist_19 + len(pred_series2[pred_series2]) + len(pred_series3[pred_series3]) + \
                     len(pred_series4[pred_series4])
correct_pred_dist_20 = len(pred_series_rr_dist_20[pred_series_rr_dist_20]) + len(pred_series2[pred_series2]) + \
                    len(pred_series3[pred_series3].index) + len(pred_series4[pred_series4])
landmark_pred_dist_20 = cntr_dist_20 + len(pred_series2[pred_series2]) + len(pred_series3[pred_series3]) + \
                     len(pred_series4[pred_series4])

print("\nThere are {}, {}, {}, {} correct landmark which is {:.2f}%, {:.2f}%, {:.2f}%, {:.2f}%"
      " accuracy out of all the landmarks in  the test set for average neighbor distance that is greater than"
      " 17, 18, 19, 20 accordingly."\
     .format(landmark_pred_dist_17, landmark_pred_dist_18, landmark_pred_dist_19, landmark_pred_dist_20,
             landmark_pred_dist_17 / test_df[test_df.landmarks != 0].shape[0] * 100,\
             landmark_pred_dist_18 / test_df[test_df.landmarks != 0].shape[0] * 100,\
             landmark_pred_dist_19 / test_df[test_df.landmarks != 0].shape[0] * 100,\
             landmark_pred_dist_20 / test_df[test_df.landmarks != 0].shape[0] * 100))

print("\nHowever, now we predicted correctly {}, {}, {}, {} images and the accuracy out of all the images (landmarks and out of"
      " domain) in the test is {:.2f}%, {:.2f}%, {:.2f}%, {:.2f}% for average neighbor distance that is greater than"
      " 17, 18, 19, 20 accordingly."\
      .format(correct_pred_dist_17, correct_pred_dist_18, correct_pred_dist_19, correct_pred_dist_20, 
              correct_pred_dist_17 / test_df.shape[0] * 100, correct_pred_dist_18 / test_df.shape[0] * 100, 
             correct_pred_dist_19 / test_df.shape[0] * 100, correct_pred_dist_20 / test_df.shape[0] * 100))

In the re-ranking process we lost 50, 46, 41, 39 correct prediction for average neighbor distance that is greater than 17, 18, 19, 20 accordingly.

There are 225, 229, 234, 236 correct landmark which is 13.87%, 14.12%, 14.43%, 14.55% accuracy out of all the landmarks in  the test set for average neighbor distance that is greater than 17, 18, 19, 20 accordingly.

However, now we predicted correctly 83914, 76636, 71037, 67408 images and the accuracy out of all the images (landmarks and out of domain) in the test is 71.58%, 65.37%, 60.60%, 57.50% for average neighbor distance that is greater than 17, 18, 19, 20 accordingly.


In [291]:
# check for the correct prediction in the clean_v3_v4 test set 
pred_clean_v3_v4_rr_dist_17 = pred_series_rr_dist_17[pred_series_rr_dist_17.index.isin(pred_clean_v3_v4.index)]
pred_clean_v3_v4_rr_dist_18 = pred_series_rr_dist_18[pred_series_rr_dist_18.index.isin(pred_clean_v3_v4.index)]
pred_clean_v3_v4_rr_dist_19 = pred_series_rr_dist_19[pred_series_rr_dist_19.index.isin(pred_clean_v3_v4.index)]
pred_clean_v3_v4_rr_dist_20 = pred_series_rr_dist_20[pred_series_rr_dist_20.index.isin(pred_clean_v3_v4.index)]


correct_pred_v3_v4_rr_dist_17 = len(pred_clean_v3_v4_rr_dist_17[pred_clean_v3_v4_rr_dist_17]) + \
                                len(pred_clean_v3_v4_2[pred_clean_v3_v4_2]) + \
                                len(pred_clean_v3_v4_3[pred_clean_v3_v4_3]) + \
                                len(pred_clean_v3_v4_4[pred_clean_v3_v4_4])
correct_pred_v3_v4_rr_dist_18 = len(pred_clean_v3_v4_rr_dist_18[pred_clean_v3_v4_rr_dist_18]) + \
                                len(pred_clean_v3_v4_2[pred_clean_v3_v4_2]) + \
                                len(pred_clean_v3_v4_3[pred_clean_v3_v4_3]) + \
                                len(pred_clean_v3_v4_4[pred_clean_v3_v4_4])   
correct_pred_v3_v4_rr_dist_19 = len(pred_clean_v3_v4_rr_dist_19[pred_clean_v3_v4_rr_dist_19]) + \
                                len(pred_clean_v3_v4_2[pred_clean_v3_v4_2]) + \
                                len(pred_clean_v3_v4_3[pred_clean_v3_v4_3]) + \
                                len(pred_clean_v3_v4_4[pred_clean_v3_v4_4])   
correct_pred_v3_v4_rr_dist_20 = len(pred_clean_v3_v4_rr_dist_20[pred_clean_v3_v4_rr_dist_20]) + \
                                len(pred_clean_v3_v4_2[pred_clean_v3_v4_2]) + \
                                len(pred_clean_v3_v4_3[pred_clean_v3_v4_3]) + \
                                len(pred_clean_v3_v4_4[pred_clean_v3_v4_4])   

print ("There are {}, {}, {}, {} correct prediction which is {:.2f}%, {:.2f}%, {:.2f}%, {:.2f}% accuracy out of"
       " all the     images in the clean test set using YOLO v3 and YOLO v4 for average neighbor"
       " distance that is greater than 17, 18, 19, 20    accordingly."\
       .format(correct_pred_v3_v4_rr_dist_17, correct_pred_v3_v4_rr_dist_18, correct_pred_v3_v4_rr_dist_19,\
               correct_pred_v3_v4_rr_dist_20, correct_pred_v3_v4_rr_dist_17 / clean_v3_v4_df.shape[0] * 100, \
               correct_pred_v3_v4_rr_dist_18 / clean_v3_v4_df.shape[0] * 100,\
               correct_pred_v3_v4_rr_dist_19 / clean_v3_v4_df.shape[0] * 100,\
               correct_pred_v3_v4_rr_dist_20 / clean_v3_v4_df.shape[0] * 100,))

There are 64696, 60120, 56512, 54222 correct prediction which is 75.33%, 70.00%, 65.80%, 63.13% accuracy out of all the     images in the clean test set using YOLO v3 and YOLO v4 for average neighbor distance that is greater than 17, 18, 19, 20    accordingly.


### Summary

| Test set / method | Correct landmarks | Landmarks acc. (%) | Correct images | Total images acc. (%) |
| :---: | :---: | :---: | :---: | :---: |  
| Test Set | 275 | 16.95  | 275 | 0.23 |
| Clean Test Set YOLO v3 | 275 | 17.08  | 275 | 0.3 |
| Clean Test Set YOLO v3 and v4 | 273 | 17.13  | 273 | 0.32 |
| Test Set Reranking with NN | 236 | 14.55 | 63769 | 54.4 |
| Clean Test Set YOLO v3 and v4 Reranking with NN | - | - | 51996 | 60.54 |
| Test Set Reranking with NN and Dist (17) | 225 | 13.87 | 83914 | 71.58 |
| Clean Test Set YOLO v3 and v4 Reranking with NN and Dist (17) | - | - | 64696 | 75.33 |
| Test Set Reranking with NN and Dist (18) | 229 | 14.12 | 76636 | 65.37 |
| Clean Test Set YOLO v3 and v4 Reranking with NN and Dist (18) | - | -  | 60120 | 70 |
| Test Set Reranking with NN and Dist (19) | 234 | 14.43 | 71037 | 60.6 |
| Clean Test Set YOLO v3 and v4 Reranking with NN and Dist (19) | - | - | 56512 | 65.8 |
| Test Set Reranking with NN and Dist (20) | 236 | 14.55 | 67408 | 57.5 |
| Clean Test Set YOLO v3 and v4 Reranking with NN and Dist (20) | - | - | 54222 | 63.13 |
