After extracting the candidate regions that may contain an object with the Selective Search algorithm presented in the SelectiveSearch.ipynb notebook, each region can be passed to a classifier to identify the object inside the proposed box. <br>
In practice, Selective Search replaces the image pyramid and sliding windows as the way to extract the ROIs to classify.

In [66]:
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.preprocessing.image import img_to_array
import numpy as np
import argparse
import cv2

The CNN used for classification is ResNet50, pre-trained on the ImageNet dataset.

In [67]:
#IMAGE = "data/gatto2.jpg"
IMAGE = "data/cani.jpg"
METHOD = "fast"
#METHOD = "quality"
CONF = 0.95

labelFilters = None
# if the label filter is not empty, break it into a list
if labelFilters is not None:
    labelFilters = labelFilters.lower().split(",")

To reduce the number of bounding boxes, the NMS (Non-Maxima Suppression) algorithm is applied: among overlapping bounding boxes, it discards those with lower probability.<br>
The goal is to end up with a single bounding box for each object in the original image.
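
As a rough illustration of the overlap test that NMS relies on, the sketch below computes the overlap ratio between two boxes given as `(x1, y1, x2, y2)`. The helper name `overlap_ratio` and the sample boxes are made up for this example. The ratio follows the same convention as the `non_max_suppression` function in this notebook (intersection area divided by the area of the second box, with pixel-inclusive coordinates), which is not the same as IoU.

```python
def overlap_ratio(box_a, box_b):
    """Intersection area of the two boxes divided by the area of box_b.

    Boxes are (x1, y1, x2, y2) with inclusive pixel coordinates.
    Hypothetical helper written only for this illustration.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # corners of the intersection rectangle
    xx1, yy1 = max(ax1, bx1), max(ay1, by1)
    xx2, yy2 = min(ax2, bx2), min(ay2, by2)
    # width and height are clamped to zero when the boxes do not intersect
    w = max(0, xx2 - xx1 + 1)
    h = max(0, yy2 - yy1 + 1)
    area_b = (bx2 - bx1 + 1) * (by2 - by1 + 1)
    return (w * h) / area_b


kept = (0, 0, 9, 9)        # toy box assumed to have the highest probability
partial = (5, 5, 14, 14)   # overlaps `kept` only in one corner
nested = (1, 1, 9, 9)      # lies entirely inside `kept`

partial_ratio = overlap_ratio(kept, partial)
nested_ratio = overlap_ratio(kept, nested)
print(partial_ratio, nested_ratio)
```

With the default threshold of 0.3 used in this notebook, the `partial` box would survive and the `nested` box would be suppressed.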

### TODO: create a utility module

In [68]:
def non_max_suppression(boxes, probs=None, overlapThresh=0.3):
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []

    # if the bounding boxes are integers, convert them to floats -- this
    # is important since we'll be doing a bunch of divisions
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    # initialize the list of picked indexes
    pick = []

    # grab the coordinates of the bounding boxes
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    # compute the area of the bounding boxes and grab the indexes to sort
    # (in the case that no probabilities are provided, simply sort on the
    # bottom-right y-coordinate)
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = y2

    # if probabilities are provided, sort on them instead
    if probs is not None:
        idxs = probs

    # sort the indexes
    idxs = np.argsort(idxs)

    # keep looping while some indexes still remain in the indexes list
    while len(idxs) > 0:
        # grab the last index in the indexes list and add the index value
        # to the list of picked indexes
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # find the largest (x, y) coordinates for the start of the bounding
        # box and the smallest (x, y) coordinates for the end of the bounding
        # box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # compute the width and height of the bounding box
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)

        # compute the ratio of overlap: intersection area divided by the
        # area of each remaining box (note that this is not IoU)
        overlap = (w * h) / area[idxs[:last]]

        # delete all indexes from the index list that have overlap greater
        # than the provided overlap threshold
        idxs = np.delete(idxs, np.concatenate(([last],
            np.where(overlap > overlapThresh)[0])))

    # return only the bounding boxes that were picked
    return boxes[pick].astype("int")

The following function wraps the Selective Search algorithm: it takes an image as input and returns the candidate regions (region proposals) where objects might be located.
These regions (bounding boxes) are then passed to a classifier, after which the NMS algorithm is applied.

In [64]:
def selective_search(image, method="fast"):
    # initialize OpenCV's selective search implementation and set the input image
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    # check to see if we are using the *fast* but *less accurate* version of selective search
    if method == "fast":
        ss.switchToSelectiveSearchFast()
    # otherwise we are using the *slower* but *more accurate* version
    else:
        ss.switchToSelectiveSearchQuality()
    # run selective search on the input image
    rects = ss.process()
    # return the region proposal bounding boxes
    return rects

Next, both the pre-trained ResNet50 model and the image to process are loaded.

In [69]:
# load ResNet from disk (with weights pre-trained on ImageNet)
print("[INFO] loading ResNet...")
model = ResNet50(weights="imagenet")
# load the input image from disk and grab its dimensions
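image = cv2.imread(IMAGE)
# cv2.imread returns None (instead of raising) when the file cannot be read
if image is None:
    raise FileNotFoundError("could not read image: {}".format(IMAGE))
(H, W) = image.shape[:2]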

[INFO] loading ResNet...


Selective Search is applied to the input image to obtain a list of region proposals.

In [70]:
# run selective search on the input image
print("[INFO] performing selective search with '{}' method...".format(METHOD))
rects = selective_search(image, METHOD)
print("[INFO] {} regions found by selective search".format(len(rects)))


# initialize the list of region proposals that we'll be classifying
# along with their associated bounding boxes
proposals = []
boxes = []

[INFO] performing selective search with 'fast' method...
[INFO] 1741 regions found by selective search


Starting from the regions extracted by Selective Search, two lists are populated:
- proposals: the preprocessed image crops of sufficiently large regions (width and height each at least 10% of the original image)
- boxes: the corresponding bounding box coordinates
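
The size filter on its own can be sketched as follows, assuming regions in the `(x, y, w, h)` form returned by Selective Search. The function name `keep_large_regions`, the `min_frac` parameter, and the toy rectangles and image size are illustrative assumptions, not part of the notebook.

```python
def keep_large_regions(rects, image_w, image_h, min_frac=0.1):
    """Return the (x, y, w, h) rects whose width and height are both
    at least `min_frac` of the image width and height.

    Hypothetical helper written only for this illustration.
    """
    kept = []
    for (x, y, w, h) in rects:
        # drop the region if either side is too small relative to the image
        if w / float(image_w) < min_frac or h / float(image_h) < min_frac:
            continue
        kept.append((x, y, w, h))
    return kept


# toy regions on an assumed 620x413 image
toy_rects = [
    (0, 0, 50, 300),       # too narrow: 50 / 620 is below 10%
    (10, 10, 200, 30),     # too short: 30 / 413 is below 10%
    (100, 50, 320, 200),   # large enough on both sides
]
large = keep_large_regions(toy_rects, image_w=620, image_h=413)
print(large)
```

Only the third rectangle passes, because a region is discarded as soon as one of its two sides falls below the threshold.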

In [71]:
# loop over the region proposal bounding box coordinates generated by
# running selective search
for (x, y, w, h) in rects:
    # if the width or height of the region is less than 10% of the
    # image width or height, ignore it (i.e., filter out small
    # objects that are likely false-positives)
    if w / float(W) < 0.1 or h / float(H) < 0.1:
        continue
    # extract the region from the input image, convert it from BGR to
    # RGB channel ordering, and then resize it to 224x224 (the input
    # dimensions required by our pre-trained CNN)
    roi = image[y:y + h, x:x + w]
    roi = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
    roi = cv2.resize(roi, (224, 224))
    # further preprocess the ROI
    roi = img_to_array(roi)
    roi = preprocess_input(roi)
    # update our proposals and bounding boxes lists
    proposals.append(roi)
    boxes.append((x, y, w, h))

The image crops extracted in the previous loop must now be classified by the CNN model.

In [72]:
# convert the proposals list into NumPy array and show its dimensions
proposals = np.array(proposals)
print("[INFO] proposal shape: {}".format(proposals.shape))

# classify each of the proposal ROIs using ResNet and then decode the
# predictions
print("[INFO] classifying proposals...")
preds = model.predict(proposals)
preds = imagenet_utils.decode_predictions(preds, top=1)

# initialize a dictionary which maps class labels (keys) to any
# bounding box associated with that label (values)
labels = {}

[INFO] proposal shape: (654, 224, 224, 3)
[INFO] classifying proposals...


Starting from the classifier's predictions, the classes of interest are filtered and only the detections with a probability of at least `CONF` are kept, grouped by label.
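
A minimal sketch of the thresholding and grouping step on toy data is shown here. It assumes predictions in the `(id, label, probability)` shape produced by `decode_predictions` with `top=1`. The function name `group_detections`, the placeholder IDs, and the sample values are invented for the example.

```python
def group_detections(preds, boxes, min_conf=0.95):
    """Group confident detections by label.

    `preds` holds one list per region with a single (id, label, prob) tuple;
    `boxes` holds the matching (x, y, w, h) regions.
    Hypothetical helper written only for this illustration.
    """
    labels = {}
    for i, p in enumerate(preds):
        (_, label, prob) = p[0]
        # skip weak detections
        if prob < min_conf:
            continue
        # convert (x, y, w, h) to corner form (x1, y1, x2, y2)
        (x, y, w, h) = boxes[i]
        box = (x, y, x + w, y + h)
        labels.setdefault(label, []).append((box, prob))
    return labels


# toy predictions; the IDs are placeholders, not real ImageNet identifiers
toy_preds = [
    [("id_a", "German_shepherd", 0.97)],
    [("id_b", "Leonberg", 0.60)],
    [("id_c", "German_shepherd", 0.96)],
]
toy_boxes = [(0, 0, 100, 80), (50, 40, 60, 60), (10, 5, 90, 70)]

grouped = group_detections(toy_preds, toy_boxes)
print(grouped)
```

Using `dict.setdefault` is just one way to build the per-label lists; the `labels.get(label, [])` pattern used in the notebook is equivalent.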

In [73]:
# loop over the predictions
for (i, p) in enumerate(preds):
    # grab the prediction information for the current region proposal
    (imagenetID, label, prob) = p[0]
    # only if the label filters are not empty *and* the label does not
    # exist in the list, then ignore it
    if labelFilters is not None and label.lower() not in labelFilters:
        continue
    # filter out weak detections by ensuring the predicted probability
    # is greater than the minimum probability
    if prob >= CONF:
        # grab the bounding box associated with the prediction and
        # convert the coordinates
        (x, y, w, h) = boxes[i]
        box = (x, y, x + w, y + h)
        # grab the list of predictions for the label and add the
        # bounding box + probability to the list
        L = labels.get(label, [])
        L.append((box, prob))
        labels[label] = L

In [75]:
labels

{'German_shepherd': [((297, 22, 400, 116), 0.9543077),
  ((0, 0, 327, 413), 0.97552335),
  ((0, 0, 336, 413), 0.9632842),
  ((122, 0, 545, 393), 0.9749853),
  ((172, 233, 241, 335), 0.95126635),
  ((72, 0, 620, 413), 0.9518344),
  ((122, 0, 620, 383), 0.9705039),
  ((0, 0, 369, 413), 0.9812852),
  ((307, 0, 482, 140), 0.9608264),
  ((86, 0, 620, 413), 0.9768231),
  ((0, 0, 359, 413), 0.9566941),
  ((126, 22, 541, 284), 0.9503739),
  ((122, 22, 485, 394), 0.95828134),
  ((0, 0, 486, 413), 0.95313793),
  ((79, 0, 620, 413), 0.96051115),
  ((0, 0, 503, 413), 0.97353584),
  ((0, 0, 367, 413), 0.9654198),
  ((126, 22, 493, 284), 0.98624516),
  ((0, 0, 500, 413), 0.9784501)],
 'Leonberg': [((304, 141, 440, 305), 0.96334976),
  ((282, 145, 405, 309), 0.955411),
  ((306, 88, 545, 250), 0.96593565)]}

The "labels" dictionary contains, for each selected region:
- the predicted class (as the key)
- the coordinates and the assigned probability

The NMS algorithm is applied to it to remove overlapping boxes and obtain, ideally, a single box for each class.
For each label (detected class), the following loop shows the boxes before and after NMS is applied.
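
To make the per-label suppression easier to follow without OpenCV windows, here is a compact pure-Python sketch of greedy NMS applied to a toy dictionary. The name `greedy_nms` and the sample detections are assumptions made for this example. It is meant to mirror the logic of `non_max_suppression` (same overlap ratio and default threshold), not to replace it.

```python
def greedy_nms(boxes, probs, overlap_thresh=0.3):
    """Keep the most probable box, drop boxes overlapping it too much, repeat.

    Boxes are (x1, y1, x2, y2) with inclusive pixel coordinates.
    Hypothetical helper written only for this illustration.
    """
    # indexes sorted from highest to lowest probability
    order = sorted(range(len(boxes)), key=lambda i: probs[i], reverse=True)
    picked = []
    while order:
        best = order.pop(0)
        picked.append(best)
        bx1, by1, bx2, by2 = boxes[best]
        remaining = []
        for j in order:
            x1, y1, x2, y2 = boxes[j]
            # intersection size, clamped to zero when the boxes are disjoint
            w = max(0, min(bx2, x2) - max(bx1, x1) + 1)
            h = max(0, min(by2, y2) - max(by1, y1) + 1)
            area_j = (x2 - x1 + 1) * (y2 - y1 + 1)
            # keep box j only if it does not overlap the picked box too much
            if (w * h) / area_j <= overlap_thresh:
                remaining.append(j)
        order = remaining
    return [boxes[i] for i in picked]


# toy detections: two near-duplicate boxes and one separate box
toy_labels = {
    "dog": [
        ((10, 10, 110, 110), 0.98),
        ((15, 12, 112, 108), 0.96),
        ((300, 50, 380, 150), 0.97),
    ],
}

nms_results = {}
for toy_label, detections in toy_labels.items():
    toy_boxes = [d[0] for d in detections]
    toy_probs = [d[1] for d in detections]
    nms_results[toy_label] = greedy_nms(toy_boxes, toy_probs)
    print(toy_label, len(toy_boxes), "->", len(nms_results[toy_label]))
```

The near-duplicate box is removed while the distant one is kept, which shows that NMS leaves one box per distinct object rather than strictly one box per class.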

In [76]:
# loop over the labels for each of detected objects in the image
for label in labels.keys():
    # clone the original image so that we can draw on it
    print("[INFO] showing results for '{}'".format(label))
    clone = image.copy()
    # loop over all bounding boxes for the current label
    for (box, prob) in labels[label]:
        # draw the bounding box on the image
        (startX, startY, endX, endY) = box
        cv2.rectangle(clone, (startX, startY), (endX, endY),
            (0, 255, 0), 2)
    # show the results *before* applying non-maxima suppression, then
    # clone the image again so we can display the results *after*
    # applying non-maxima suppression
    cv2.imshow("Before", clone)
    clone = image.copy()
    # extract the bounding boxes and associated prediction
    # probabilities, then apply non-maxima suppression
    boxes = np.array([p[0] for p in labels[label]])
    proba = np.array([p[1] for p in labels[label]])
    boxes = non_max_suppression(boxes, proba)
    # loop over all bounding boxes that were kept after applying
    # non-maxima suppression
    for (startX, startY, endX, endY) in boxes:
        # draw the bounding box and label on the image
        cv2.rectangle(clone, (startX, startY), (endX, endY),
            (0, 255, 0), 2)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.putText(clone, label, (startX, y),
            cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 255, 0), 2)
    # show the output after applying non-maxima suppression
    cv2.imshow("After", clone)
    cv2.waitKey(0)

[INFO] showing results for 'German_shepherd'
[INFO] showing results for 'Leonberg'


The results are not yet fully satisfactory: the algorithm also detects several unwanted classes (noise), which should eventually be filtered out of the final result.
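
One possible way to drop such unwanted classes after the fact is to keep only a whitelist of labels, compared case-insensitively. The sketch below is an assumption about how this could be done: the function name `filter_labels` and the toy dictionary (including the unwanted `window_screen` entry) are invented for the example. The `labelFilters` variable defined near the top of this notebook serves the same purpose during the prediction loop.

```python
def filter_labels(labels, wanted):
    """Keep only the entries whose label matches one of `wanted`, ignoring case.

    Hypothetical helper written only for this illustration.
    """
    wanted_lower = {name.strip().lower() for name in wanted}
    return {
        label: detections
        for label, detections in labels.items()
        if label.lower() in wanted_lower
    }


# toy detections; boxes and probabilities are illustrative values
toy_labels = {
    "German_shepherd": [((0, 0, 369, 413), 0.98)],
    "Leonberg": [((306, 88, 545, 250), 0.97)],
    "window_screen": [((0, 0, 50, 50), 0.96)],
}

filtered = filter_labels(toy_labels, ["german_shepherd", "leonberg"])
print(sorted(filtered))
```

Lowercasing both sides matters because the decoded ImageNet labels can contain capital letters, as in `German_shepherd`.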