## References:
* https://github.com/tensorflow/hub/blob/master/examples/colab/object_detection.ipynb
* https://towardsdatascience.com/non-maximum-suppression-nms-93ce178e177c
* https://github.com/justcallmewilliam/iccv19-silco/tree/master/cl_utils/mAP_lib
* https://www.kaggle.com/vikramtiwari/baseline-predictions-using-inception-resnet-v2

# Object Detection Using TF HUB

# Content in this kernel
1. Faster R-CNN
2. Intersection Over Union (IOU)
3. Precision
4. Recall
5. Non-maximum Suppression (NMS)

# Faster R-CNN
Faster R-CNN first extracts feature maps from the input image using a ConvNet and then passes those maps through a Region Proposal Network (RPN), which returns object proposals. Finally, the proposals are classified and their bounding boxes are predicted.


![](http://cdn.analyticsvidhya.com/wp-content/uploads/2018/10/Faster-rcnn.png)

## Steps followed by a Faster R-CNN algorithm to detect objects in an image:
1. Take an input image and pass it to the ConvNet which returns feature maps for the image.
2. Apply Region Proposal Network (RPN) on these feature maps and get object proposals.
3. Apply ROI pooling layer to bring down all the proposals to the same size.
4. Finally, pass these proposals to a fully connected layer to classify them and predict the bounding boxes for the image.
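
To make step 3 more concrete, here is a minimal NumPy sketch of ROI max pooling on a single-channel feature map. It is only an illustration, not the implementation used inside the TF Hub module: the function name `roi_max_pool`, the end-exclusive `(x1, y1, x2, y2)` convention, and the 2x2 output size are all assumptions made for this example, and the region is assumed to be at least as large as the output grid.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one region of a 2-D feature map down to a fixed grid.

    feature_map: 2-D array (H, W); a real network also has a channel axis.
    roi: (x1, y1, x2, y2) in feature-map cells, end-exclusive (assumed convention).
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    out_h, out_w = output_size
    # Bin edges that split the region as evenly as possible.
    row_edges = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], out_w + 1).astype(int)
    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for r in range(out_h):
        for c in range(out_w):
            cell = region[row_edges[r]:row_edges[r + 1],
                          col_edges[c]:col_edges[c + 1]]
            pooled[r, c] = cell.max()
    return pooled

feature_map = np.arange(36).reshape(6, 6)
small = roi_max_pool(feature_map, (0, 0, 2, 2))  # 2x2 region
large = roi_max_pool(feature_map, (1, 1, 6, 5))  # 4x5 region
print(small.shape, large.shape)  # both proposals end up with the same shape
```

Regions of different sizes come out with the same shape, which is what lets the fully connected layers in step 4 process every proposal.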

# Intersection Over Union (IOU)

Intersection Over Union (IOU) is a measure based on the Jaccard index that evaluates the overlap between two bounding boxes. It requires a ground truth bounding box and a predicted bounding box. By applying the IOU, we can tell whether a detection is valid (True Positive) or not (False Positive).

IOU is given by the overlapping area between the predicted bounding box and the ground truth bounding box divided by the area of union between them:
![](http://camo.githubusercontent.com/70d881e53ef692bc1c7c1cb3265d7b30a8818701/687474703a2f2f6c617465782e636f6465636f67732e636f6d2f6769662e6c617465783f25354374657874253742494f552537442532302533442532302535436672616325374225354374657874253742617265612537442532302535436c656674253238425f70253230253543636170253230425f2537426774253744253543726967687425323925374425374225354374657874253742617265612537442532302535436c656674253238425f70253230253543637570253230425f25374267742537442535437269676874253239253744)

The image below illustrates the IOU between a ground truth bounding box (in green) and a detected bounding box (in red).
![](http://raw.githubusercontent.com/rafaelpadilla/Object-Detection-Metrics/master/aux_images/iou.png)
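
As a small worked example, the IOU of two axis-aligned boxes can be sketched in plain Python. The function name `iou` and the `(xmin, ymin, xmax, ymax)` box layout are assumptions made for this illustration; returning 0.0 for two zero-area boxes is also a chosen convention.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

ground_truth = (0.0, 0.0, 2.0, 2.0)
predicted = (1.0, 1.0, 3.0, 3.0)
print(iou(ground_truth, predicted))  # intersection 1, union 7
```

Here the two boxes share a 1x1 square and together cover an area of 4 + 4 - 1 = 7, so the IOU is 1/7.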

# True Positive, False Positive, False Negative and True Negative

Some basic concepts used by the metrics:
1. **True Positive (TP)**: A correct detection. Detection with IOU ≥ threshold
2. **False Positive (FP)**: A wrong detection. Detection with IOU < threshold
3. **False Negative (FN)**: A ground truth not detected
4. **True Negative (TN)**: Does not apply. It would represent a box that was correctly not detected. In object detection there are many possible bounding boxes that should not be detected within an image, so TN would be all possible bounding boxes that were correctly not detected (a huge number of boxes within an image). That is why it is not used by the metrics.

threshold: depending on the metric, it is usually set to 50%, 75% or 95%.
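
One possible way to turn these definitions into counts is sketched below. It assumes the IOU values are already computed and that detections are sorted by descending confidence. The function name `count_tp_fp_fn` is hypothetical, and the greedy rule that each ground truth can be matched only once (so duplicate detections count as FP) is an assumption borrowed from common evaluation practice, not something stated above.

```python
def count_tp_fp_fn(iou_matrix, num_ground_truths, threshold=0.5):
    """Count TP, FP and FN for one image.

    iou_matrix[d][g] is the IOU of detection d with ground truth g;
    rows are assumed to be sorted by descending confidence.
    """
    matched = set()  # ground truths already claimed by a detection
    tp = fp = 0
    for row in iou_matrix:
        # Best ground truth that has not been matched yet.
        candidates = [(value, g) for g, value in enumerate(row) if g not in matched]
        best_iou, best_gt = max(candidates, default=(0.0, None))
        if best_gt is not None and best_iou >= threshold:
            matched.add(best_gt)
            tp += 1
        else:
            fp += 1
    fn = num_ground_truths - len(matched)  # ground truths never detected
    return tp, fp, fn

# Three detections against two ground truths.
iou_matrix = [
    [0.9, 0.1],  # matches ground truth 0 -> TP
    [0.8, 0.0],  # ground truth 0 is already taken -> FP
    [0.2, 0.3],  # below the threshold -> FP
]
tp, fp, fn = count_tp_fp_fn(iou_matrix, num_ground_truths=2, threshold=0.5)
print(tp, fp, fn)
```

In this example ground truth 1 is never matched, so it is counted as the single False Negative.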

# Precision
Precision is the ability of a model to identify only the relevant objects. It is the percentage of correct positive predictions and is given by:


![](http://camo.githubusercontent.com/b1b6fdbeef01e93c1369e9d3e28fd7932e322852/687474703a2f2f6c617465782e636f6465636f67732e636f6d2f6769662e6c617465783f507265636973696f6e25323025334425323025354366726163253742545025374425374254502b465025374425334425354366726163253742545025374425374225354374657874253742616c6c253230646574656374696f6e73253744253744)

# Recall
Recall is the ability of a model to find all the relevant cases (all ground truth bounding boxes). It is the percentage of true positives detected among all relevant ground truths and is given by:

![](http://camo.githubusercontent.com/3e4ced65f38c8177e5fed382ba409f357ecab0b6/687474703a2f2f6c617465782e636f6465636f67732e636f6d2f6769662e6c617465783f526563616c6c25323025334425323025354366726163253742545025374425374254502b464e25374425334425354366726163253742545025374425374225354374657874253742616c6c25323067726f756e64253230747275746873253744253744)
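
Given TP, FP and FN counts, both metrics reduce to simple arithmetic: precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below uses made-up counts, and returning 0.0 when a denominator is zero is an assumed convention for this example.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative counts: 8 detections, 6 of them correct, 10 ground truths in total.
precision, recall = precision_recall(tp=6, fp=2, fn=4)
print(precision, recall)
```

With these counts, 6 of the 8 detections are correct (precision 0.75) and 6 of the 10 ground truths are found (recall 0.6).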

# Non-Maximum Suppression (NMS):
Non-Maximum Suppression (NMS) is a post-processing algorithm responsible for merging all detections that belong to the same object.

## Input:
A list of proposal boxes B, the corresponding confidence scores S, and an overlap threshold N.
## Output:
A list of filtered proposals D.
## Algorithm:
  
1. Select the proposal with the highest confidence score, remove it from B, and add it to the final proposal list D (initially D is empty).
2. Compare this proposal with every proposal remaining in B by calculating their IOU (Intersection over Union). If the IOU is greater than the threshold N, remove that proposal from B.
3. Again take the proposal with the highest confidence from the remaining proposals in B, remove it from B, and add it to D.
4. Once again calculate the IOU of this proposal with all the proposals in B and eliminate the boxes whose IOU is higher than the threshold.
5. Repeat this process until no proposals are left in B.

![](http://miro.medium.com/max/1400/1*6d_D0ySg-kOvfrzIRwHIiA.png)
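
The steps above can be sketched almost literally in plain Python, using the same names B, S, N and D. This is only a readable illustration under assumed conventions: boxes are `(xmin, ymin, xmax, ymax)` tuples, the helpers `box_iou` and `nms_steps` are hypothetical names, and D holds indices into B rather than the boxes themselves.

```python
def box_iou(a, b):
    """IOU of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms_steps(B, S, N):
    """Return D, the indices of the proposals in B that survive NMS."""
    # Indices of B, highest confidence first.
    remaining = sorted(range(len(B)), key=lambda i: S[i], reverse=True)
    D = []
    while remaining:
        best = remaining.pop(0)  # steps 1 and 3: take the top-scoring proposal
        D.append(best)
        # Steps 2 and 4: drop every proposal that overlaps it by more than N.
        remaining = [i for i in remaining if box_iou(B[best], B[i]) <= N]
    return D  # step 5: the loop ends when B is empty

B = [(0.10, 0.10, 0.50, 0.50),  # strong detection
     (0.12, 0.12, 0.52, 0.52),  # near-duplicate of the first box
     (0.60, 0.60, 0.90, 0.90)]  # separate object
S = [0.90, 0.75, 0.80]
kept = nms_steps(B, S, N=0.5)
print(kept)
```

The near-duplicate box overlaps the first one with an IOU of about 0.82, so it is suppressed, while the separate object is kept.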


In [None]:
import os
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
from six import BytesIO
import numpy as np
import xml.etree.ElementTree as et
import ast
import tqdm
from itertools import chain
from xml.dom import minidom
from PIL import Image
from PIL import ImageColor
from PIL import ImageDraw
from PIL import ImageFont
from PIL import ImageOps
import cv2
import glob
import time

### Path of Image directory

In [None]:
path1='/kaggle/input/open-images-object-detection-rvc-2020/test/'

In [None]:
sample = pd.read_csv("/kaggle/input/open-images-object-detection-rvc-2020/sample_submission.csv")
sample.head()

In [None]:
sample.shape

In [None]:
ids = sample['ImageId'].tolist()

In [None]:
ids[0:5]

In [None]:
img_data=[]
for i in range(len(sample)):
    img_data.append(glob.glob('/kaggle/input/open-images-object-detection-rvc-2020/test/{0}.jpg'.format(ids[i])))

In [None]:
img_data[0:5]

In [None]:
img_data=list(chain.from_iterable(img_data))

In [None]:
img_data[0:5]

In [None]:
def get_prediction_string(result):
    # Collect every detection scoring at least min_score into a DataFrame.
    # Column mapping: Label <- detection_class_labels,
    # Class_label <- detection_class_names, Class_name <- detection_class_entities.
    df = pd.DataFrame(columns=['Ymin','Xmin','Ymax', 'Xmax','Score','Label','Class_label','Class_name'])
    min_score = 0.01
    for i in range(result['detection_boxes'].shape[0]):
        if result["detection_scores"][i] >= min_score:
            df.loc[i] = tuple(result['detection_boxes'][i])+(result["detection_scores"][i],)+(result["detection_class_labels"][i],)+(result["detection_class_names"][i],)+(result["detection_class_entities"][i],)
    return df

In [None]:
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

In [None]:
module_handle = "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1"
with tf.device('/device:GPU:0'):
    with tf.Graph().as_default():
        detector = hub.Module(module_handle)
        image_string_placeholder = tf.placeholder(tf.string)
        decoded_image = tf.image.decode_jpeg(image_string_placeholder)
        decoded_image_float = tf.image.convert_image_dtype(
            image=decoded_image, dtype=tf.float32)
        module_input = tf.expand_dims(decoded_image_float, 0)
        result = detector(module_input, as_dict=True)
        init_ops = [tf.global_variables_initializer(), tf.tables_initializer()]

        session = tf.Session()
        session.run(init_ops)

In [None]:
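def nms(dets, thresh):
    # Rows are [Ymin, Xmin, Ymax, Xmax, Score, ...] in normalized [0, 1]
    # coordinates. IOU is symmetric in x and y, so the axis order does not matter.
    boxes = dets[:, :5].astype(float)
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    scores = boxes[:, 4]

    # No "+ 1" offset: it is meant for inclusive integer pixel coordinates
    # and would inflate the areas and overlaps of normalized boxes.
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score

    keep = []
    while order.size > 0:
        i = order[0]  # highest-scoring remaining box
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])

        w = np.maximum(0.0, xx2 - xx1)
        h = np.maximum(0.0, yy2 - yy1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)

        # Keep only the boxes whose IOU with box i is at most thresh.
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]

    return keep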

## First 20 Images

In [None]:
image_paths = img_data[0:20]

In [None]:
images = []
for f in image_paths:
    images.append(np.asarray(Image.open(f)))

In [None]:
!mkdir deepak

In [None]:
image_id = sample['ImageId']
def format_prediction_string(image_id, result):
    prediction_strings = []
    
    for i in range(len(result['Score'])):
        class_name = result['Class_label'][i].decode("utf-8")
        YMin,XMin,YMax,XMax = result['Ymin'][i],result['Xmin'][i],result['Ymax'][i],result['Xmax'][i]
        score = result['Score'][i]
        
        prediction_strings.append(
            f"{class_name} {score} {XMin} {YMin} {XMax} {YMax}"
        )
        
    prediction_string = " ".join(prediction_strings)

    return {
        "PredictionString": prediction_string
    }

In [None]:
predictions = []
with tf.device('/device:GPU:0'):
    for k, image_path in enumerate(image_paths):
        img = cv2.imread(image_path)
        with tf.gfile.Open(image_path, "rb") as binfile:
            image_string = binfile.read()

        result_out, image_out = session.run(
            [result, decoded_image],
            feed_dict={image_string_placeholder: image_string})
        df1 = get_prediction_string(result_out)
        # Keep only the boxes that survive NMS at an IOU threshold of 0.68.
        z1 = nms(df1.values, 0.68)
        z = df1.iloc[z1].reset_index()
        predictions.append(format_prediction_string(image_id, z))
        data1 = z
        COLORS = np.random.uniform(0, 255, size=(len(data1), 3))
        # Boxes are normalized to [0, 1], so scale them by the image size.
        img_ymax, img_xmax = images[k].shape[0], images[k].shape[1]
        for m in range(len(data1)):
            xmin = int(data1.Xmin[m] * img_xmax)
            ymin = int(data1.Ymin[m] * img_ymax)
            xmax = int(data1.Xmax[m] * img_xmax)
            ymax = int(data1.Ymax[m] * img_ymax)
            # Class names come back from the session as bytes.
            label = data1['Class_name'][m].decode("utf-8")
            color = COLORS[m].tolist()
            cv2.rectangle(img, (xmin, ymax), (xmax, ymin), color, 2)
            cv2.putText(img, label, (xmax, ymin), cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, 2)
        # Save once per image, after all boxes and labels are drawn.
        if len(data1) > 0:
            cv2.imwrite('/kaggle/working/deepak/' + str(k) + '.jpg', img)

In [None]:
def load_images(folder):
    images = []
    for filename in os.listdir(folder):
        img = Image.open(os.path.join(folder, filename))
        if img is not None:
            images.append(img)
    return images

In [None]:
z = load_images("/kaggle/working/deepak")

In [None]:
z[0]

In [None]:
z[3]

In [None]:
z[4]

In [None]:
z[6]

In [None]:
z[9]

In [None]:
z[10]

In [None]:
z[11]

In [None]:
z[15]

In [None]:
z[18]

## For Submission

In [None]:
pred_df = pd.DataFrame(predictions)
pred_df.head()

In [None]:
sample['PredictionString']= pred_df['PredictionString']

In [None]:
sample.head()
