# Using a pre-trained YOLO model as a landmark classifier

As mentioned in the test.ipynb notebook (in the data directory), the test set we use is full of out-of-domain images, and we would like to discard as many of them as possible. To do that, we use an object detector followed by further processing to build our own landmark classifier.

We use a pre-trained YOLO (You Only Look Once) model as the object detector. YOLO is a state-of-the-art object detector that has achieved strong results on various datasets. We used the [darknet](https://github.com/AlexeyAB/darknet) implementation, which makes it easy to use pre-trained models.

After cloning the darknet repo, we changed the Makefile so that it uses the GPU and runs faster (as described in the [readme](https://github.com/AlexeyAB/darknet/blob/master/README.md) file).

We chose YOLOv3 pre-trained on the [Open Images dataset](https://storage.googleapis.com/openimages/web/index.html). Open Images is a large, diverse dataset with ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives. We used its detection part, which contains 15,851,536 boxes across 600 categories. The bounding boxes were annotated by people rather than by computers and are therefore highly accurate.

Out of the 600 categories, we chose 5 categories that indicate that the image contains a landmark: Tower, Fountain, Skyscraper, Building and Castle.

Another 8 categories indicate that the image may contain a landmark: Bronze sculpture, Sculpture, Lighthouse, House, Tree, Palm tree, Watercraft and Hiking equipment.

A detection from any of the other classes suggests that the image is not a landmark.

Open Images dataset examples:

Landmarks: 

<img src="example_images/openimages_landmark_1.png" alt="Drawing" style="width: 400px;"/>
<img src="example_images/openimages_landmark_2.png" alt="Drawing" style="width: 400px;"/>

Maybe landmarks: 

<img src="example_images/openimages_maybe_landmark_1.png" alt="Drawing" style="width: 400px;"/>
<img src="example_images/openimages_maybe_landmark_2.png" alt="Drawing" style="width: 400px;"/>

Non-landmarks: 

<img src="example_images/openimages_non_landmark_1.png" alt="Drawing" style="width: 400px;"/>
<img src="example_images/openimages_non_landmark_2.png" alt="Drawing" style="width: 400px;"/>

We passed all the test set images through the darknet YOLO implementation. The network produced a JSON file as a result. In this file, each image is associated with its filename and the objects detected in it. For each detected object, the file records the corresponding class_id, name, relative coordinates and confidence.
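
To make the layout of the results file concrete, here is a minimal sketch of one record and how it can be read with the standard library. The field names follow darknet's JSON output as used later in this notebook; the path, the class_id and the numeric values are made up for illustration.

```python
import json

# Hypothetical one-image excerpt in the layout of darknet's JSON output.
# The path, class_id and numeric values are invented for illustration.
sample = """
[
  {
    "frame_id": 1,
    "filename": "data/test/e324e0f3e6d9e504.jpg",
    "objects": [
      {
        "class_id": 0,
        "name": "Tower",
        "relative_coordinates": {
          "center_x": 0.5, "center_y": 0.5, "width": 0.4, "height": 0.8
        },
        "confidence": 0.91
      }
    ]
  }
]
"""

records = json.loads(sample)
first_object = records[0]["objects"][0]
box = first_object["relative_coordinates"]

# Width and height are fractions of the image size, so their product is the
# fraction of the image area covered by the bounding box.
relative_area = box["width"] * box["height"]
print(first_object["name"], first_object["confidence"], round(relative_area, 2))
```

The product of the relative width and height is the quantity the processing below uses to judge whether an object dominates the image.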

Some examples of the network's results on test set images, drawn as bounding boxes:

<img src="example_images/predictions1.jpg" alt="Drawing" style="width: 400px;"/>
<img src="example_images/predictions2.jpg" alt="Drawing" style="width: 400px;"/>
<img src="example_images/predictions3.jpg" alt="Drawing" style="width: 400px;"/>
<img src="example_images/predictions9.jpg" alt="Drawing" style="width: 400px;"/>

The results are not perfect, but they are good, and they are the best we could achieve with this method.

We will further process the results file in order to clean the test set as much as possible.

In [2]:
# imports for code 
import json
import pandas as pd 
import re 

In [8]:
# load the results file from the YOLOv3 object detector
with open('result_yolov3_openimages.json') as f:
    data = json.load(f)

# parse the results file so that every image is associated with its filename (the 16-character image id)
# and the objects detected in it.
# for each detected object we keep the class name, the confidence value
# and the size (relative to the image)
for image in data:
    # the character class no longer accepts spaces or '&', which the original pattern let through
    image['filename'] = re.search(r'[a-z0-9]{16}', image['filename']).group(0)
    for obj in image['objects']:
        obj.pop('class_id', None)
        obj['class_name'] = obj.pop('name')
        # relative width * relative height = fraction of the image covered by the box
        coords = obj.pop('relative_coordinates')
        obj['realtive_size'] = coords['width'] * coords['height']
        obj['confidence_val'] = obj.pop('confidence')
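
The filename step above pulls the 16-character image id out of the path that darknet stored. As a hedged sketch, two ways of doing this are shown below; the path is hypothetical, and the stricter hexadecimal pattern is an assumption based on the ids that appear in this notebook.

```python
import re
from pathlib import PurePosixPath

# Hypothetical path; only the 16-character id is taken from the test set listing.
path = "data/test/e324e0f3e6d9e504.jpg"

# Option 1: a regex restricted to lowercase hexadecimal characters
# (assumes every image id is exactly 16 hex characters).
match = re.search(r"[0-9a-f]{16}", path)
image_id_regex = match.group(0) if match else None

# Option 2: take the file name without its extension
# (assumes the file name is exactly the image id).
image_id_stem = PurePosixPath(path).stem

print(image_id_regex, image_id_stem)
```

The second option does not depend on the character set of the ids, but it only works when the path uses forward slashes and the file is named after the id.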

In [9]:
# check whether the objects detected in the test set images can be landmarks
landmark = ['Tower', 'Fountain', 'Skyscraper', 'Building', 'Castle']
maybe_landmark_1 = ['Bronze sculpture', 'Sculpture', 'Lighthouse', 'House']
maybe_landmark_2 = ['Tree', 'Palm tree', 'Watercraft', 'Hiking equipment']
keep = []
throw = []

for image in data:
    for obj in image['objects']:
        name = obj['class_name']
        confidence = obj['confidence_val']
        size = obj['realtive_size']
        # if a detected object belongs to the landmark list, we keep the image
        if name in landmark:
            keep.append(image)
        # if a detected object belongs to one of the maybe_landmark lists, we check its
        # confidence value and relative size.
        # for classes in maybe_landmark_1 we require a large relative size, because those classes are big objects.
        # for classes in maybe_landmark_2 we require a small relative size, because those classes are small objects.
        elif name in maybe_landmark_1:
            if confidence > 0.5 and size > 0.6:
                keep.append(image)
        elif name in maybe_landmark_2:
            if confidence > 0.5 and size < 0.2:
                keep.append(image)
        # any other object marks the image for removal, but only if the detection is reasonably
        # confident and the object covers the major part of the image, so that we only throw
        # images that are clearly not of a landmark.
        else:
            if confidence > 0.3 and size > 0.6:
                throw.append(image)
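
The rules above, together with the later step that removes from "throw" anything that is also in "keep", can be summarized as a per-image decision. The following is a minimal sketch under that reading, not the notebook's actual implementation: the function name `decide` and the `'undecided'` label for images that land in neither list are hypothetical.

```python
LANDMARK = {"Tower", "Fountain", "Skyscraper", "Building", "Castle"}
MAYBE_LARGE = {"Bronze sculpture", "Sculpture", "Lighthouse", "House"}
MAYBE_SMALL = {"Tree", "Palm tree", "Watercraft", "Hiking equipment"}


def decide(objects):
    """Return 'keep', 'throw' or 'undecided' for one image's detections.

    `objects` is a list of dicts with 'class_name', 'confidence_val' and
    'realtive_size' (key spelled as in the notebook's parsed data).
    """
    keep = throw = False
    for obj in objects:
        name = obj["class_name"]
        conf = obj["confidence_val"]
        size = obj["realtive_size"]
        if name in LANDMARK:
            keep = True
        elif name in MAYBE_LARGE:
            keep = keep or (conf > 0.5 and size > 0.6)
        elif name in MAYBE_SMALL:
            keep = keep or (conf > 0.5 and size < 0.2)
        elif conf > 0.3 and size > 0.6:
            throw = True
    # "keep" takes precedence over "throw", mirroring the later filtering step.
    if keep:
        return "keep"
    return "throw" if throw else "undecided"


# Hypothetical detections for illustration.
person = {"class_name": "Person", "confidence_val": 0.9, "realtive_size": 0.7}
castle = {"class_name": "Castle", "confidence_val": 0.2, "realtive_size": 0.1}
print(decide([person]), decide([person, castle]), decide([]))
```

Note that an image with no detections, or only weak ones, ends up in neither list.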

In [30]:
# remove duplicate rows (an image is appended once per matching object)
keep_df = pd.DataFrame(keep).drop_duplicates(subset=['frame_id']).reset_index(drop=True)
throw_df = pd.DataFrame(throw).drop_duplicates(subset=['frame_id']).reset_index(drop=True)

# remove from "throw" the rows that are also in "keep"
throw_df = throw_df[~throw_df['frame_id'].isin(keep_df['frame_id'])].reset_index(drop=True)
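
To illustrate what the de-duplication and the "keep wins over throw" filter do, here is a small sketch on toy data. The frame ids are made up, and the lists hold only the `frame_id` field rather than full detection records.

```python
import pandas as pd

# Toy stand-ins for the notebook's `keep` and `throw` lists; the ids are made up.
keep = [{"frame_id": 1}, {"frame_id": 1}, {"frame_id": 3}]
throw = [{"frame_id": 1}, {"frame_id": 2}, {"frame_id": 2}]

# One row per image: an image is appended once per matching object.
keep_df = pd.DataFrame(keep).drop_duplicates(subset=["frame_id"]).reset_index(drop=True)
throw_df = pd.DataFrame(throw).drop_duplicates(subset=["frame_id"])

# Anti-join: drop every "throw" row whose frame_id also appears in "keep".
throw_df = throw_df[~throw_df["frame_id"].isin(keep_df["frame_id"])].reset_index(drop=True)

print(keep_df["frame_id"].tolist())
print(throw_df["frame_id"].tolist())
```

Frame 1 appears in both lists and therefore stays only in "keep", while frame 2 remains the single entry in "throw".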

In [4]:
test_df = pd.read_csv("C:/Users/Matan/Desktop/projectB/data/test/test.csv") 
test_df

| Unnamed: 0 | id | landmarks |
|---|---|---|
| 0 | e324e0f3e6d9e504 | 0 |
| 1 | d9e17c5f3e0c47b3 | 0 |
| 2 | 1a748a755ed67512 | 0 |
| 3 | 537bf9bdfccdafea | 0 |
| 4 | 13f4c974274ee08b | 0 |
| ... | ... | ... |
| 117222 | e351c3e672c25fbd | 190441 |
| 117223 | 5426472625271a4d | 0 |
| 117224 | 7b6a585405978398 | 0 |
| 117225 | d885235ba249cf5d | 0 |


In [8]:
test_df['id'][0][0]

'e'

In [11]:
test_df.shape[0]

117227