For this example we need OpenCV together with Darknet (the neural network framework behind the Yolo v3 algorithm). Since recent versions of OpenCV can load Darknet models directly through the cv2.dnn module, we'll run our examples in a Jupyter notebook.

On the Yolo v3 home page (https://pjreddie.com/darknet/yolo/) we can find tutorials for configuring and running both Yolo and Darknet. We are going to use a pre-trained Darknet model that was trained on the COCO dataset (http://cocodataset.org/), which covers 80 different classes.

First, we need to download the Yolo v3 files:

1. The Yolo v3 configuration file (yolov3.cfg)
2. The weights for darknet (yolov3.weights)
3. The names of trained classes (coco.names)
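
Before loading the model, it can help to confirm that the three files are really in place. The following is a minimal sketch that assumes the file names used by the loading code in this guide (yolov3.cfg, yolov3.weights, coco.names) and that they sit in the notebook's working folder; the helper name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names as used by the loading code in this guide.
REQUIRED_FILES = ["yolov3.cfg", "yolov3.weights", "coco.names"]


def missing_files(folder=".", names=REQUIRED_FILES):
    """Return the names from `names` that are not regular files in `folder`."""
    base = Path(folder)
    return [name for name in names if not (base / name).is_file()]


missing = missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All model files found.")
```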

Then we switch to Python. I recommend using a Jupyter notebook (as I will do in this guide). We need to import OpenCV and NumPy. If OpenCV is not installed, we can install it from the notebook (!pip install opencv-python).


In [18]:

import cv2
import numpy



Then we load Yolo v3 with cv2.dnn.readNet. We need the weights and cfg files to create the CNN in memory, and we load the class names into a list. Remember that these files should be in the folder where you are running the notebook.


In [19]:

net = cv2.dnn.readNet('yolov3.weights','yolov3.cfg')
classes  = []
with open('coco.names','r') as f:
    classes = [line.strip() for line in f.readlines()]



We get the names of the output layers of the Darknet CNN:


In [20]:


layer_names = net.getLayerNames()
outputlayers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
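
The `i - 1` in the cell above is needed because getUnconnectedOutLayers reports layer ids starting at 1, while Python lists start at 0. Depending on the OpenCV version, the ids may come back as a flat array or with each id wrapped in its own one-element array. The sketch below handles both shapes; `output_layer_names` is a hypothetical helper and the layer names are toy data, not the real Yolo v3 layers.

```python
def output_layer_names(layer_names, out_layer_ids):
    """Map 1-based layer ids to names, accepting flat or nested id lists."""
    names = []
    for item in out_layer_ids:
        # Some OpenCV builds wrap each id in its own one-element array.
        layer_id = item[0] if hasattr(item, "__len__") else item
        names.append(layer_names[int(layer_id) - 1])
    return names


# Toy data standing in for net.getLayerNames() and net.getUnconnectedOutLayers().
toy_names = ["conv_0", "yolo_1", "conv_2", "yolo_3"]
print(output_layer_names(toy_names, [2, 4]))      # flat ids   -> ['yolo_1', 'yolo_3']
print(output_layer_names(toy_names, [[2], [4]]))  # nested ids -> ['yolo_1', 'yolo_3']
```

In the notebook this would be called as output_layer_names(net.getLayerNames(), net.getUnconnectedOutLayers()).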


We generate an array of random colors, one per class, drawn from a uniform distribution with numpy:


In [21]:

colors= numpy.random.uniform(0,255,size=(len(classes),3))


Now we load the test image:


In [22]:
img = cv2.imread("dog.jpg")


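Yolo v3 works with small inputs (416x416 in this example), so we can shrink a large input image first. Note that cv2.dnn.blobFromImage, used in the next step, resizes to 416x416 anyway, so this step mainly affects the size of the displayed result: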


In [23]:
img = cv2.resize(img,None,fx=0.4,fy=0.3)


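We obtain the dimensions of the input image and build a 'blob' from it: the image resized to 416x416, scaled by 0.00392 (about 1/255) and with the blue and red channels swapped. This blob is what we feed to the CNN: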


In [24]:
height,width,channels = img.shape
blob = cv2.dnn.blobFromImage(img,0.00392,(416,416),(0,0,0),True,crop=False)
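
To make the blob less of a black box, here is a rough NumPy sketch of what the call above does apart from the resize: scale the pixel values, swap the blue and red channels, and reorder the axes into a (batch, channels, height, width) array. The helper `to_blob` is hypothetical and assumes the image is already at the network size; it is not OpenCV's implementation.

```python
import numpy as np


def to_blob(img_bgr, scale=1 / 255.0, swap_rb=True):
    """Approximate blobFromImage for an image already at the network size.

    img_bgr: uint8 array of shape (H, W, 3) in OpenCV's BGR channel order.
    Returns a float32 array of shape (1, 3, H, W).
    """
    img = img_bgr.astype(np.float32) * scale  # pixel values into [0, 1]
    if swap_rb:
        img = img[:, :, ::-1]                 # BGR -> RGB
    chw = np.transpose(img, (2, 0, 1))        # (H, W, C) -> (C, H, W)
    return np.ascontiguousarray(chw[np.newaxis, ...])  # add the batch axis


# Hypothetical 2x2 pure-blue image (B=255, G=0, R=0).
toy = np.zeros((2, 2, 3), dtype=np.uint8)
toy[:, :, 0] = 255
blob_demo = to_blob(toy)
print(blob_demo.shape)        # (1, 3, 2, 2)
print(blob_demo[0, :, 0, 0])  # blue now sits in the last channel
```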



We can inspect our image:


In [25]:
cv2.imshow("input image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()


We can also inspect each scaled channel of the blob:


In [26]:
for b in blob:
    for n, img_blob in enumerate(b):
        cv2.imshow(str( n), img_blob)
cv2.waitKey(0)
cv2.destroyAllWindows()


Now it's time to execute the CNN:


In [27]:
net.setInput(blob)
outs = net.forward(outputlayers)
print(outs[1])


[[0.02118016 0.02388134 0.04664548 ... 0.         0.         0.        ]
 [0.01726332 0.01875127 0.38844633 ... 0.         0.         0.        ]
 [0.02118462 0.01801873 0.07650209 ... 0.         0.         0.        ]
 ...
 [0.9731635  0.9751837  0.05149293 ... 0.         0.         0.        ]
 [0.9797624  0.9754399  0.30574453 ... 0.         0.         0.        ]
 [0.97900224 0.9831845  0.0811379  ... 0.         0.         0.        ]]
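
As background that is not stated in this guide: the standard Yolo v3 network predicts at three scales (strides 32, 16 and 8) with three anchor boxes per grid cell, so `outs` should contain three arrays. The sketch below computes the expected number of rows and columns for a 416x416 input and 80 classes; the function name is hypothetical.

```python
def yolo_v3_output_shapes(input_size=416, num_classes=80,
                          strides=(32, 16, 8), anchors_per_cell=3):
    """Return the expected (rows, columns) of each Yolo v3 output array."""
    columns = 4 + 1 + num_classes  # box values, objectness, class scores
    shapes = []
    for stride in strides:
        cells = input_size // stride  # grid cells per side at this scale
        shapes.append((cells * cells * anchors_per_cell, columns))
    return shapes


for rows, columns in yolo_v3_output_shapes():
    print(rows, columns)
# 507 85
# 2028 85
# 8112 85
```

These values can be compared against [out.shape for out in outs] in the notebook.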


Now, to generate the final result, we need to interpret this output.
The values coming from the CNN are normalized to the range [0, 1]. We receive one row per
candidate detection, with the following data:
 - bounding box center x
 - bounding box center y
 - bounding box width
 - bounding box height
 - objectness score (confidence that the box contains any object)
 - the remaining columns are the confidences for each class id
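
As a small worked example, the sketch below decodes a single row into a class id, a confidence and a pixel box. It assumes the layout used by the notebook code that follows: four box values, then an objectness score at index 4, then the class scores from index 5 onward. The row itself is made up, with three classes instead of 80, and `decode_detection` is a hypothetical helper.

```python
def decode_detection(detection, img_width, img_height):
    """Convert one output row into (class_id, confidence, [x, y, w, h]).

    Assumed layout: [center_x, center_y, width, height, objectness, class scores...],
    with the box values normalized to [0, 1].
    """
    scores = list(detection[5:])
    class_id = max(range(len(scores)), key=scores.__getitem__)  # best class
    confidence = float(scores[class_id])
    center_x = int(detection[0] * img_width)
    center_y = int(detection[1] * img_height)
    w = int(detection[2] * img_width)
    h = int(detection[3] * img_height)
    # Shift from the box center to its top-left corner.
    x = int(center_x - w / 2)
    y = int(center_y - h / 2)
    return class_id, confidence, [x, y, w, h]


# Made-up row for a 400x200 image.
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
print(decode_detection(row, 400, 200))  # (1, 0.8, [150, 50, 100, 100])
```
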
Thus we can use Python code to build the bounding boxes of the identified objects and
collect their class ids:


In [28]:
# Collect class ids, confidence scores and boxes for every detection above the threshold
import numpy as np

class_ids = []
confidences = []
boxes = []
for out in outs:
    for detection in out:
        scores = detection[5:]  # class scores start after the 4 box values and objectness
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected: scale the normalized box to image pixels
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Top-left corner of the rectangle
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)

            boxes.append([x, y, w, h])  # box as [x, y, width, height]
            confidences.append(float(confidence))  # score of the best class
            class_ids.append(int(class_id))  # index of the detected class in `classes`

# Non-maximum suppression: score threshold 0.4, overlap (NMS) threshold 0.6
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.4, 0.6)
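
The call to cv2.dnn.NMSBoxes returns the indices of the boxes that survive non-maximum suppression: boxes scoring below the first threshold are dropped, and among heavily overlapping boxes only the best one is kept. The following pure-Python sketch illustrates the idea with a greedy pass based on intersection over union (IoU). It is a simplified illustration under that assumption, not OpenCV's implementation, and the boxes and scores are toy data.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_threshold, nms_threshold):
    """Return indices of boxes kept after greedy non-maximum suppression."""
    order = sorted((i for i, s in enumerate(scores) if s > score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep


toy_boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50]]
toy_scores = [0.9, 0.8, 0.7]
print(greedy_nms(toy_boxes, toy_scores, 0.4, 0.6))  # [0, 2]
```

The second toy box overlaps the first almost completely, so only the higher-scoring one is kept, while the third box is far away and survives.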



Finally, we can paint on the image:


In [None]:
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = classes[class_ids[i]].strip()
        confidence = confidences[i]
        color = colors[class_ids[i]]  # one color per class, indexed by class id

        # Draw bounding box
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)

        # Prepare text for display
        text = f"{label}: {confidence:.2f}"

        # Compute text size
        (text_width, text_height), baseline = cv2.getTextSize(text, cv2.FONT_HERSHEY_TRIPLEX, 0.5, 1)

        # Background rectangle for readability
        cv2.rectangle(img, (x, y - text_height - baseline), (x + text_width, y), color, thickness=cv2.FILLED)

        # Draw label text
        cv2.putText(img, text, (x, y - baseline),
                    cv2.FONT_HERSHEY_TRIPLEX,
                    0.5,
                    (255, 255, 255),
                    thickness=1)

cv2.imshow("Image", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
