# **Yolo v3 practice**
**Objective:** Understand how Yolo v3 is implemented and run a pre-trained model on images and video.




---



*Author: Antônio Luis (PhD student, PUC-Rio)*


In [None]:
!nvidia-smi

!pip install keras



### Import dependencies

In [None]:
from shutil import copy2
import argparse
import os
import struct
import numpy as np
import cv2
from numpy import expand_dims
from keras.layers import Conv2D, Input, BatchNormalization, LeakyReLU, ZeroPadding2D, UpSampling2D
# add and concatenate are exported by keras.layers; the keras.layers.merge
# module is not available in recent Keras releases
from keras.layers import add, concatenate
from keras.models import Model
from keras.models import load_model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from IPython.display import Image



### Verify if you have GPU

In [None]:
!nvidia-smi

### Setup - Connect to Google Drive

In [None]:
%cd ..
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
# this creates a symbolic link so that now the path /content/gdrive/My\ Drive/ is equal to /mydrive
!ln -s /content/gdrive/My\ Drive/ /mydrive

### Setup - working folder

1.   Create folder 'yolov3' in your Drive
2.   Download yolov3 weights (pre-trained on MS-COCO)



In [None]:
# -p avoids an error if the folder already exists
!mkdir -p /mydrive/yolov3

In [None]:
%cd /mydrive/yolov3

### Download pre-trained weights

In [None]:
# get yolov3 pretrained coco dataset weights
!wget https://pjreddie.com/media/files/yolov3.weights

# PART 1  - Yolo V3 implementation

### Define model functions

1.   WeightReader to read weight from file
2.   make_yolov3_model to build model
3.   _conv_block to add conv block to model



In [None]:
class WeightReader:
    def __init__(self, weight_file):
        with open(weight_file, 'rb') as w_f:
            major,    = struct.unpack('i', w_f.read(4))
            minor,    = struct.unpack('i', w_f.read(4))
            revision, = struct.unpack('i', w_f.read(4))

            if (major*10 + minor) >= 2 and major < 1000 and minor < 1000:
                w_f.read(8)
            else:
                w_f.read(4)

            transpose = (major > 1000) or (minor > 1000)
            
            binary = w_f.read()

        self.offset = 0
        self.all_weights = np.frombuffer(binary, dtype='float32')
        
    def read_bytes(self, size):
        self.offset = self.offset + size
        return self.all_weights[self.offset-size:self.offset]

    def load_weights(self, model):
        for i in range(106):
            try:
                conv_layer = model.get_layer('conv_' + str(i))
                print("loading weights of convolution #" + str(i))

                if i not in [81, 93, 105]:
                    norm_layer = model.get_layer('bnorm_' + str(i))

                    size = np.prod(norm_layer.get_weights()[0].shape)

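                    # Darknet stores batch-norm parameters in the order
                    # beta, gamma, mean, variance
                    beta  = self.read_bytes(size) # bias
                    gamma = self.read_bytes(size) # scale
                    mean  = self.read_bytes(size) # mean
                    var   = self.read_bytes(size) # variance

                    # Keras expects [gamma, beta, moving_mean, moving_variance];
                    # set_weights returns None, so there is nothing to assign
                    norm_layer.set_weights([gamma, beta, mean, var])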

                if len(conv_layer.get_weights()) > 1:
                    bias   = self.read_bytes(np.prod(conv_layer.get_weights()[1].shape))
                    kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
                    
                    kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
                    kernel = kernel.transpose([2,3,1,0])
                    conv_layer.set_weights([kernel, bias])
                else:
                    kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
                    kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
                    kernel = kernel.transpose([2,3,1,0])
                    conv_layer.set_weights([kernel])
            except ValueError:
                print("no convolution #" + str(i))     
    
    def reset(self):
        self.offset = 0
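
The header handling at the top of `WeightReader.__init__` can be checked in isolation. The sketch below is a minimal illustration, assuming a little-endian file; `parse_darknet_header` and the synthetic buffer are hypothetical names and data made up for this example, not part of the original notebook or of Darknet.

```python
import struct

def parse_darknet_header(buf):
    """Return (major, minor, revision, header_size_in_bytes) for a weights buffer.

    Hypothetical helper: it mirrors the version check used in WeightReader.__init__.
    """
    major, minor, revision = struct.unpack('<3i', buf[:12])
    # Files with version >= 0.2 store the "images seen" counter in 8 bytes,
    # older files store it in 4 bytes.
    seen_bytes = 8 if (major * 10 + minor) >= 2 and major < 1000 and minor < 1000 else 4
    return major, minor, revision, 12 + seen_bytes

# Synthetic header for illustration only: version 0.2.0 plus an 8-byte counter
fake_header = struct.pack('<3i', 0, 2, 0) + struct.pack('<q', 123456)
print(parse_darknet_header(fake_header))  # (0, 2, 0, 20)
```

Everything after the header is read as a flat `float32` array, which is why `read_bytes` only needs to advance an offset.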

![yolov3-darknet](https://miro.medium.com/max/2000/1*d4Eg17IVJ0L41e7CTWLLSg.png)

In [None]:
def make_yolov3_model():
    input_image = Input(shape=(None, None, 3))

    # Layer  0 => 4
    x = _conv_block(input_image, [{'filter': 32, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 0},
                                  {'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},
                                  {'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},
                                  {'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])

    # Layer  5 => 8
    x = _conv_block(x, [{'filter': 128, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 5},
                        {'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 6},
                        {'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 7}])

    # Layer  9 => 11
    x = _conv_block(x, [{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 9},
                        {'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 10}])

    # Layer 12 => 15
    x = _conv_block(x, [{'filter': 256, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 12},
                        {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 13},
                        {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 14}])

    # Layer 16 => 36
    for i in range(7):
        x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 16+i*3},
                            {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 17+i*3}])
        
    skip_36 = x
        
    # Layer 37 => 40
    x = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 37},
                        {'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 38},
                        {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 39}])

    # Layer 41 => 61
    for i in range(7):
        x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 41+i*3},
                            {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 42+i*3}])
        
    skip_61 = x
        
    # Layer 62 => 65
    x = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 62},
                        {'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 63},
                        {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 64}])

    # Layer 66 => 74
    for i in range(3):
        x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 66+i*3},
                            {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 67+i*3}])
        
    # Layer 75 => 79
    x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 75},
                        {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 76},
                        {'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 77},
                        {'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 78},
                        {'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 79}], skip=False)

    # Layer 80 => 82
    yolo_82 = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 80},
                              {'filter':  255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 81}], skip=False)

    # Layer 83 => 86
    x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 84}], skip=False)
    x = UpSampling2D(2)(x)
    x = concatenate([x, skip_61])

    # Layer 87 => 91
    x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 87},
                        {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 88},
                        {'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 89},
                        {'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 90},
                        {'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 91}], skip=False)

    # Layer 92 => 94
    yolo_94 = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 92},
                              {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 93}], skip=False)

    # Layer 95 => 98
    x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True,   'layer_idx': 96}], skip=False)
    x = UpSampling2D(2)(x)
    x = concatenate([x, skip_36])

    # Layer 99 => 106
    yolo_106 = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 99},
                               {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 100},
                               {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 101},
                               {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 102},
                               {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 103},
                               {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 104},
                               {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 105}], skip=False)

    model = Model(input_image, [yolo_82, yolo_94, yolo_106])    
    return model

In [None]:
def _conv_block(inp, convs, skip=True):
    x = inp
    count = 0
    
    for conv in convs:
        if count == (len(convs) - 2) and skip:
            skip_connection = x
        count += 1
        
        if conv['stride'] > 1: x = ZeroPadding2D(((1,0),(1,0)))(x) # peculiar padding as darknet prefer left and top
        x = Conv2D(conv['filter'], 
                   conv['kernel'], 
                   strides=conv['stride'], 
                   padding='valid' if conv['stride'] > 1 else 'same', # peculiar padding as darknet prefer left and top
                   name='conv_' + str(conv['layer_idx']), 
                   use_bias=False if conv['bnorm'] else True)(x)
        if conv['bnorm']: x = BatchNormalization(epsilon=0.001, name='bnorm_' + str(conv['layer_idx']))(x)
        if conv['leaky']: x = LeakyReLU(alpha=0.1, name='leaky_' + str(conv['layer_idx']))(x)

    return add([skip_connection, x]) if skip else x
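
The "peculiar padding" in `_conv_block` can be verified with a little arithmetic. The sketch below is an illustration only, with `conv_out_size` as a hypothetical helper: it computes the spatial size after each convolution under the padding rules used above and shows how a 416-pixel input shrinks through the five stride-2 convolutions of the backbone.

```python
def conv_out_size(n, kernel, stride):
    """Output size along one dimension, following the padding rules in _conv_block."""
    if stride > 1:
        # ZeroPadding2D(((1, 0), (1, 0))) adds one pixel on the top/left only,
        # then the convolution runs with padding='valid'.
        return (n + 1 - kernel) // stride + 1
    # stride 1 uses padding='same', which keeps the size unchanged
    return n

size = 416
for _ in range(5):  # stride-2 convolutions at layers 1, 5, 12, 37 and 62
    size = conv_out_size(size, kernel=3, stride=2)
print(size)  # 13
```

Each stride-2 block halves the resolution exactly when the input side is a multiple of 32.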


### Define and save model with pre-trained weights



In [None]:
# define the model
model = make_yolov3_model()

In [None]:
# load the model weights
weight_reader = WeightReader('/mydrive/yolov3/yolov3.weights') 

In [None]:
# set the model weights into the model
weight_reader.load_weights(model)

In [None]:
# save the model to file
model.save('model.h5')

## Load and Execute model

In [None]:
# load yolov3 model
model = load_model('model.h5')

### Download test image

In [None]:
# download test image
!wget https://www.dropbox.com/s/i0j17woc0s90xve/TRAFFIC.jpg

In [None]:
# download test image
!wget https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2019/03/zebra.jpg

In [None]:
Image('zebra.jpg')

In [None]:
Image('TRAFFIC.jpg')

In [None]:
# load and prepare an image
def load_image_pixels(filename, shape):
  '''
  Load image, reshape to intended shape (416x416 for yolo)
  and normalize images to pixel values in 0-1.
  Save original width and height of image to reshape back to
  original size the picture with bounding boxes later.
  '''
  # load the image to get its shape
  image = load_img(filename)
  width, height = image.size
  # load the image with the required size
  image = load_img(filename, target_size=shape)
  # convert to numpy array
  image = img_to_array(image)
  # scale pixel values to [0, 1]
  image = image.astype('float32')
  image /= 255.0
  # add a dimension so that we have one sample
  image = expand_dims(image, 0)
  return image, width, height

In [None]:

# define the expected input shape for the model
input_w, input_h = 416, 416

#CHANGE CODE HERE
##############################################################
# define our new photo
photo_filename = 'zebra.jpg' #'TRAFFIC.jpg'      ######################## <-----  EXERCISE CHANGE HERE 
##############################################################
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))

### Make prediction

![alt text](https://miro.medium.com/max/1200/0*3A8U0Hm5IKmRa6hu.png)

In [None]:
# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])

Note that the shape of the detection kernel is 1 x 1 x (B x (5 + C) ). Here B is the number of bounding boxes a cell on the feature map can predict, “5” is for the 4 bounding box attributes and one object confidence, and C is the number of classes. In YOLO v3 trained on COCO, B = 3 and C = 80, so the kernel size is 1 x 1 x (3x85) = 1 x 1 x 255
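
As a quick sanity check of the formula above, the expected output shapes can be computed by hand. This is a small sketch, assuming the three detection scales of YOLO v3 use strides 32, 16 and 8; `yolo_output_shapes` is a hypothetical helper name.

```python
def yolo_output_shapes(input_size, num_boxes=3, num_classes=80, strides=(32, 16, 8)):
    """Return the (grid_h, grid_w, depth) shape of each detection scale."""
    depth = num_boxes * (5 + num_classes)  # B x (5 + C)
    return [(input_size // s, input_size // s, depth) for s in strides]

print(yolo_output_shapes(416))
# [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
```

The shapes printed by `model.predict` should match these, with an extra leading batch dimension of 1.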


### Extract and select bounding boxes

In [None]:
class BoundBox:
	def __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):
		self.xmin = xmin
		self.ymin = ymin
		self.xmax = xmax
		self.ymax = ymax
		self.objness = objness
		self.classes = classes
		self.label = -1
		self.score = -1
 
	def get_label(self):
		if self.label == -1:
			self.label = np.argmax(self.classes)
 
		return self.label
 
	def get_score(self):
		if self.score == -1:
			self.score = self.classes[self.get_label()]
 
		return self.score
 
def _sigmoid(x):
	return 1. / (1. + np.exp(-x))
 
def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
	grid_h, grid_w = netout.shape[:2]
	nb_box = 3
	netout = netout.reshape((grid_h, grid_w, nb_box, -1))
	nb_class = netout.shape[-1] - 5
	boxes = []
	netout[..., :2]  = _sigmoid(netout[..., :2])
	netout[..., 4:]  = _sigmoid(netout[..., 4:])
	netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
	netout[..., 5:] *= netout[..., 5:] > obj_thresh
 
	for i in range(grid_h*grid_w):
		row = i // grid_w # integer division: row must be a whole cell index
		col = i % grid_w
		for b in range(nb_box):
			# element at index 4 is the objectness score
			objectness = netout[row][col][b][4]
			if objectness <= obj_thresh: continue
			# first 4 elements are x, y, w, and h
			x, y, w, h = netout[row][col][b][:4]
			x = (col + x) / grid_w # center position, unit: image width
			y = (row + y) / grid_h # center position, unit: image height
			w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
			h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
			# last elements are class probabilities
			classes = netout[row][col][b][5:]
			box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
			boxes.append(box)
	return boxes
 
def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
	new_w, new_h = net_w, net_h
	for i in range(len(boxes)):
		x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
		y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
		boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
		boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
		boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
		boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)
 
def _interval_overlap(interval_a, interval_b):
	x1, x2 = interval_a
	x3, x4 = interval_b
	if x3 < x1:
		if x4 < x1:
			return 0
		else:
			return min(x2,x4) - x1
	else:
		if x2 < x3:
			 return 0
		else:
			return min(x2,x4) - x3
 
def bbox_iou(box1, box2):
	intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
	intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
	intersect = intersect_w * intersect_h
	w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
	w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin
	union = w1*h1 + w2*h2 - intersect
	# guard against degenerate (zero-area) boxes
	return float(intersect) / union if union > 0 else 0.0
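
To make the decoding step in `decode_netout` concrete, here is a worked example for a single prediction. It is a sketch using only the standard library; `decode_box` is a hypothetical helper and the input values are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, col, row, grid_w, grid_h, anchor_w, anchor_h, net_w, net_h):
    """Turn raw network outputs into (xmin, ymin, xmax, ymax) as fractions of the image."""
    cx = (col + sigmoid(tx)) / grid_w       # box center, offset inside its grid cell
    cy = (row + sigmoid(ty)) / grid_h
    w = anchor_w * math.exp(tw) / net_w     # anchor size rescaled by exp(t)
    h = anchor_h * math.exp(th) / net_h
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

# All-zero outputs in the central cell of the 13x13 grid, first anchor (116, 90):
# the box is centered in the image and has exactly the anchor's size.
print(decode_box(0, 0, 0, 0, col=6, row=6, grid_w=13, grid_h=13,
                 anchor_w=116, anchor_h=90, net_w=416, net_h=416))
```

With zero outputs the sigmoid gives 0.5, so the center lands at (6 + 0.5) / 13 = 0.5 in both directions.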
 


![alt text](https://miro.medium.com/max/1400/1*mEIEF1xvFAWHJeJWxnGLaw.png)

### Do non-maximum suppression

![alt text](https://miro.medium.com/max/1400/1*6d_D0ySg-kOvfrzIRwHIiA.png)

In [None]:
def do_nms(boxes, nms_thresh):
	if len(boxes) > 0:
		nb_class = len(boxes[0].classes)
	else:
		return
	for c in range(nb_class):
		sorted_indices = np.argsort([-box.classes[c] for box in boxes])
		for i in range(len(sorted_indices)):
			index_i = sorted_indices[i]
			if boxes[index_i].classes[c] == 0: continue
			for j in range(i+1, len(sorted_indices)):
				index_j = sorted_indices[j]
				if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
					boxes[index_j].classes[c] = 0
 
 
# get all of the results above a threshold
def get_boxes(boxes, labels, thresh):
	v_boxes, v_labels, v_scores = list(), list(), list()
	# enumerate all boxes
	for box in boxes:
		# enumerate all possible labels
		for i in range(len(labels)):
			# check if the threshold for this label is high enough
			if box.classes[i] > thresh:
				v_boxes.append(box)
				v_labels.append(labels[i])
				v_scores.append(box.classes[i]*100)
				# don't break, many labels may trigger for one box
	return v_boxes, v_labels, v_scores
 
# draw all results
def draw_boxes(filename, v_boxes, v_labels, v_scores):
	# load the image
	data = pyplot.imread(filename)
	# plot the image
	pyplot.imshow(data)
	# get the context for drawing boxes
	ax = pyplot.gca()
	# plot each box
	for i in range(len(v_boxes)):
		box = v_boxes[i]
		# get coordinates
		y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
		# calculate width and height of the box
		width, height = x2 - x1, y2 - y1
		# create the shape
		rect = Rectangle((x1, y1), width, height, fill=False, color='white')
		# draw the box
		ax.add_patch(rect)
		# draw text and score in top left corner
		label = "%s (%.3f)" % (v_labels[i], v_scores[i])
		pyplot.text(x1, y1, label, color='white')
	# show the plot
	#pyplot.savefig('figura_zebra.png')
	pyplot.show()
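
The idea behind `bbox_iou` and `do_nms` is easier to follow on a toy case. The sketch below is a simplified single-class variant written for illustration (boxes are plain `(xmin, ymin, xmax, ymax)` tuples, and `iou` and `nms` are hypothetical helper names), not the exact code used in this notebook.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh):
    """Greedy NMS: keep a box only if it does not overlap a better box already kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < thresh for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, thresh=0.5))  # [0, 2]
```

The first two boxes overlap with IoU 81 / 119 (about 0.68), so the lower-scoring one is suppressed; the third box is far away and survives.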
 


### Define config parameters and run the model





In [None]:
# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
#CHANGE CODE HERE
########################################################################
# define the probability threshold for detected objects
class_threshold = 0.6
nms_threshold = 0.5
#############################################################################
boxes = list()
pyplot.rcParams["figure.figsize"] = (40,10)
for i in range(len(yhat)):
	# decode the output of the network
	boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
# suppress non-maximal boxes
do_nms(boxes, nms_threshold)
# define the labels
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
	"boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
	"bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
	"backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
	"sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
	"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
	"apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
	"chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
	"remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
	"book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
# summarize what we found
for i in range(len(v_boxes)):
	print(v_labels[i], v_scores[i])
# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

In [None]:
print("Number of objects detected in the scene:" ,len(v_boxes))

# **PART 2: *Yolo* v3 - on Video (in Pytorch)**

In this part we use a different implementation: the PyTorch implementation by Ayoosh Kathuria.

Clone the repo we are going to use:

In [None]:
!git clone https://github.com/ayooshkathuria/pytorch-yolo-v3

Enter the repo we just cloned

In [None]:
%cd pytorch-yolo-v3/

### Import dependencies (we use many from the repo)

In [None]:
from __future__ import division
import time
import torch 
import torch.nn as nn
from torch.autograd import Variable
from util import *
from darknet import Darknet
from preprocess import prep_image, inp_to_image, letterbox_image
import random 
import pickle as pkl
from google.colab.patches import cv2_imshow # cv2.imshow crashes Colab,
                                            # so we use cv2_imshow instead

**Download or Upload** your own video

In [None]:
# Upload your video to the yolov3 folder on your Google Drive

#CHANGE CODE HERE
#########################################################
video_name = "chasing_animals360p.mp4"

# Then uncomment the 3 lines below to copy the video into the repo folder

#orig_folder = "/mydrive/yolov3/" + video_name
#dest_folder = "/content/pytorch-yolo-v3"
#copy2(orig_folder,dest_folder)
##########################################################

In [None]:
# Or download the video from a link straight into your pytorch-yolo-v3 folder

#CHANGE CODE HERE
######################################
!wget "https://www.dropbox.com/s/951sqbtge8dzcsr/chasing_animals360p.mp4"
#######################################

In [None]:
from IPython.display import HTML
from base64 import b64encode

video_path = video_name ######## put the name of your video file HERE

mp4 = open(video_path,'rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()

HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)

Auxiliary functions

In [None]:

def get_test_input(input_dim, CUDA):
    img = cv2.imread("dog-cycle-car.png")
    img = cv2.resize(img, (input_dim, input_dim)) 
    img_ =  img[:,:,::-1].transpose((2,0,1))
    img_ = img_[np.newaxis,:,:,:]/255.0
    img_ = torch.from_numpy(img_).float()
    img_ = Variable(img_)
    
    if CUDA:
        img_ = img_.cuda()
    
    return img_

def prep_image(img, inp_dim):
    """
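    Prepare an image for input to the neural network.

    Returns the letterboxed image as a float tensor of shape
    (1, 3, inp_dim, inp_dim), the original image, and its (width, height).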
    """

    orig_im = img
    dim = orig_im.shape[1], orig_im.shape[0]
    img = (letterbox_image(orig_im, (inp_dim, inp_dim)))
    img_ = img[:,:,::-1].transpose((2,0,1)).copy()
    img_ = torch.from_numpy(img_).float().div(255.0).unsqueeze(0)
    return img_, orig_im, dim

def write(x, img):
    c1 = tuple(x[1:3].int())
    c2 = tuple(x[3:5].int())
    cls = int(x[-1]) if x[-1] < len(classes) else -1
    label = "{0}".format(classes[cls]) if cls != -1 else 'no detection'
    if label != 'no detection':
      color = random.choice(colors)
      cv2.rectangle(img, (int(c1[0]), int(c1[1])), (int(c2[0]), int(c2[1])),color, 1)
      t_size = cv2.getTextSize(label, cv2.FONT_HERSHEY_PLAIN, 1 , 1)[0]
      c2 = c1[0] + t_size[0] + 3, c1[1] + t_size[1] + 4
      cv2.rectangle(img, (int(c1[0]), int(c1[1])), (int(c2[0]), int(c2[1])),color, -1)
      cv2.putText(img, label, (int(c1[0]), int(c1[1] + t_size[1] + 4)), cv2.FONT_HERSHEY_PLAIN, 1, [225,255,255], 1);
    return img
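
`prep_image` relies on `letterbox_image` from the repo, which resizes the frame without distorting it and pads the remainder. The sketch below shows only the geometry of that operation, assuming the usual letterbox recipe (scale by the smaller ratio, center the result); `letterbox_geometry` is a hypothetical helper and does not touch any pixels.

```python
def letterbox_geometry(img_w, img_h, inp_dim):
    """Return (scale, new_w, new_h, pad_x, pad_y) for a square network input."""
    scale = min(inp_dim / img_w, inp_dim / img_h)   # keep the aspect ratio
    new_w, new_h = int(img_w * scale), int(img_h * scale)
    pad_x = (inp_dim - new_w) // 2                  # padding on the left
    pad_y = (inp_dim - new_h) // 2                  # padding on the top
    return scale, new_w, new_h, pad_x, pad_y

# An 832x468 frame fed to a 416x416 network
print(letterbox_geometry(832, 468, 416))  # (0.5, 416, 234, 0, 91)
```

The frame is halved to 416x234 and centered vertically, leaving 91 padded rows above and below it.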

Defining setup parameters for YOLO v3 - Video

In [None]:
# Input arguments

#CHANGE CODE HERE
####################################
video = video_name     ######## put the name of your video file HERE

confidence = 0.6 # Object Confidence to filter predictions - float32

nms_thresh = 0.4 # Non-maximum suppression threshold - float32

save_fps = 30. # Important to save the processed video with the right fps
####################################

dataset = "coco" # "Dataset on which the network has been trained"
cfgfile = "cfg/yolov3.cfg" # Config file

weightsfile = "/mydrive/yolov3/yolov3.weights" # pre-trained weights 

reso = "416" #Input resolution of the network. Increase to increase accuracy.
             #Decrease to increase speed


# Video save codec
# Change this for a different video type if you get errors
fourcc = cv2.VideoWriter_fourcc(*'MP4V')
# Other options:
# cv2.VideoWriter_fourcc('M','P','E','G')
# cv2.VideoWriter_fourcc('M', 'J', 'P', 'G')
# cv2.VideoWriter_fourcc(*'DIVX')  # for .avi
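
A FourCC code is just four ASCII characters packed into one 32-bit integer, first character in the lowest byte. The sketch below reproduces that packing in plain Python to show what `cv2.VideoWriter_fourcc` returns; `fourcc_code` is a hypothetical helper written for this illustration.

```python
def fourcc_code(tag):
    """Pack a four-character codec tag into an integer, first character in the low byte."""
    if len(tag) != 4:
        raise ValueError("a FourCC tag needs exactly four characters")
    b = tag.encode('ascii')
    return b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24)

print(hex(fourcc_code('mp4v')))  # 0x7634706d
print(hex(fourcc_code('MP4V')))  # 0x5634504d
```

The tags are case-sensitive, so `'MP4V'` and `'mp4v'` are different codes. If the writer reports an unsupported tag for your container, trying the other spelling is a reasonable first step.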

### Run video prediction and save video

In [None]:
CUDA = torch.cuda.is_available()

num_classes = 80

bbox_attrs = 5 + num_classes

print("Loading network.....")
model = Darknet(cfgfile)
model.load_weights(weightsfile)
print("Network successfully loaded")

model.net_info["height"] = reso
inp_dim = int(model.net_info["height"])
assert inp_dim % 32 == 0 
assert inp_dim > 32

if CUDA:
    model.cuda()
    
model(get_test_input(inp_dim, CUDA), CUDA)

model.eval()

videofile = video


cap = cv2.VideoCapture(videofile)
# check that the video opened before querying its properties
assert cap.isOpened(), 'Cannot capture source'

height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
print("Video resolution (height, width): ", height, width)

#Video save
video_yolo = video[:-4] + "_yolov3.mp4" #[:-4] is to remove extension part of video name
videoSaved = cv2.VideoWriter(video_yolo, apiPreference=cv2.CAP_FFMPEG,
                             fourcc = fourcc,fps =save_fps, frameSize=(width, height),	
                             isColor= True)

# Class names and box colors do not change between frames, so load them once
classes = load_classes('data/coco.names')
with open("pallete", "rb") as f:
    colors = pkl.load(f)

frames = 0
start = time.time()
while cap.isOpened():

    ret, frame = cap.read()
    if ret:

        img, orig_im, dim = prep_image(frame, inp_dim)

        im_dim = torch.FloatTensor(dim).repeat(1,2)

        if CUDA:
            im_dim = im_dim.cuda()
            img = img.cuda()

        with torch.no_grad():
            output = model(Variable(img), CUDA)
        output = write_results(output, confidence, num_classes, nms = True, nms_conf = nms_thresh)

        # write_results returns an int instead of a tensor when nothing is detected:
        # save the frame unchanged and move on to the next one
        if type(output) == int:
            videoSaved.write(orig_im)
            frames += 1
            continue

        im_dim = im_dim.repeat(output.size(0), 1)
        scaling_factor = torch.min(inp_dim/im_dim,1)[0].view(-1,1)

        # undo the letterbox padding, then the scaling, to get frame coordinates
        output[:,[1,3]] -= (inp_dim - scaling_factor*im_dim[:,0].view(-1,1))/2
        output[:,[2,4]] -= (inp_dim - scaling_factor*im_dim[:,1].view(-1,1))/2

        output[:,1:5] /= scaling_factor

        # clip the boxes to the frame
        for i in range(output.shape[0]):
            output[i, [1,3]] = torch.clamp(output[i, [1,3]], 0.0, im_dim[i,0])
            output[i, [2,4]] = torch.clamp(output[i, [2,4]], 0.0, im_dim[i,1])

        # draw every detection on the frame
        list(map(lambda x: write(x, orig_im), output))

        # video save line
        videoSaved.write(orig_im)

        # To display frames while processing, uncomment the line below
        # (at your own risk: it may freeze Colab)
        #cv2_imshow(orig_im)
        if frames%100 == 0:
            print('{} frames processed'.format(frames))

        frames += 1

        # When displaying frames, uncomment the line below to print the processing FPS
        #print("FPS of the video is {:5.2f}".format( frames / (time.time() - start)))

    else:
        break
cap.release()
videoSaved.release()
end = time.time()
print("Video processed with YOLO v3 saved as " + video_yolo)
print('Total time elapsed: {:.3f} seconds'.format((end - start)))
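
The coordinate bookkeeping in the loop (subtract the padding, divide by the scaling factor, clamp) is the inverse of the letterbox step. The sketch below does the same arithmetic for a single box in plain Python; `to_original_coords` is a hypothetical helper and the numbers are made up for illustration.

```python
def to_original_coords(box, img_w, img_h, inp_dim):
    """Map an (x1, y1, x2, y2) box from network-input pixels to original frame pixels."""
    scale = min(inp_dim / img_w, inp_dim / img_h)
    pad_x = (inp_dim - scale * img_w) / 2   # horizontal letterbox padding
    pad_y = (inp_dim - scale * img_h) / 2   # vertical letterbox padding
    x1, y1, x2, y2 = box
    xs = [(v - pad_x) / scale for v in (x1, x2)]
    ys = [(v - pad_y) / scale for v in (y1, y2)]
    # clamp to the frame, as the torch.clamp calls do in the loop
    xs = [min(max(v, 0.0), float(img_w)) for v in xs]
    ys = [min(max(v, 0.0), float(img_h)) for v in ys]
    return xs[0], ys[0], xs[1], ys[1]

# An 832x468 frame letterboxed into 416x416: scale 0.5, 91 px of padding top and bottom
print(to_original_coords((100, 141, 200, 241), 832, 468, 416))
# (200.0, 100.0, 400.0, 300.0)
```

Boxes that fall partly inside the padded area end up clipped to the frame borders.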

## Now open your Google Drive and play your processed video!







# Exercise 

Apply Yolo v3 to the videos below and change the hyperparameters (the confidence threshold, `confidence`, and the NMS threshold, `nms_thresh`) to see how the results change.

[Link to report the results](https://docs.google.com/document/d/1C6VzzsZnC5rm0ThX0un-0ASTwSphYJ1GKL8ssAJyoL4/edit?usp=sharing)


List of videos to apply Yolo v3 to:

1. Indian War (30s), 1080p: [original link](https://www.youtube.com/watch?v=DcV7d3py-Mw) / [download link](https://www.dropbox.com/s/mmncfyccvgevm87/IndianWar1080p.mp4)

2. Soccer tactics (2min), 360p: [original link](https://www.youtube.com/watch?v=c83yE-s_Wf0) / [download link](https://www.dropbox.com/s/lf99d8lzwqlj2zs/soccer_tactics360p.mp4)

3. Girl soccer fight (17s), 720p: [original link](https://www.youtube.com/watch?v=oWJLump8Jjk/) / [download link](https://www.dropbox.com/s/0o1bourc6oyil5a/soccer_fight720p.mp4)

4. Girl soccer fight (17s), 360p: [original link](https://www.youtube.com/watch?v=oWJLump8Jjk/) / [download link](https://www.dropbox.com/s/35jw9yy0hle0wbw/soccer_fight360p.mp4)

5. Animals chasing people (2min 25s), 360p: [original link](https://www.youtube.com/watch?v=F1svRmDlsL4) / [download link](https://www.dropbox.com/s/951sqbtge8dzcsr/chasing_animals360p.mp4)

6. Hungry monkeys (40s), 720p: [original link](https://www.youtube.com/watch?v=22JgHBb-0dg) / [download link](https://www.dropbox.com/s/isxbju7yabzlnin/hungry_monkeys720p.mp4)

7. Super animals (30s), 720p: [original link](https://www.youtube.com/watch?v=PaMPdz-3Agg) / [download link](https://www.dropbox.com/s/cuku94guronf5tf/super_animals720p.mp4)

8. Heavy Traffic Stock Video (9s), 720p: [original link](https://www.youtube.com/watch?v=5YbLr0HRCiw&ab_channel=MotionArray) / [download link](https://www.dropbox.com/s/db4wnplsq1n420s/Heavy_Traffic_Stock_Video.mp4)

9. Indian Traffic (3min 28s), 720p: [original link](https://www.youtube.com/watch?v=KnPiP9PkLAs) / [download link](https://www.dropbox.com/s/25hts0d0di9uaw2/indian_traffic.mp4)

Note: On YouTube, right-click the video and choose "Stats for nerds" to see the frame rate of the video you want to download (normally 25 to 30 fps).
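
The frame rate can also be read from the downloaded file with `cap.get(cv2.CAP_PROP_FPS)`. OpenCV may report 0 when the backend cannot determine the value, so it is worth validating before using it as `save_fps`. The sketch below is a minimal illustration; `choose_save_fps` is a hypothetical helper, and the 30 fps fallback is an assumption taken from the default used earlier in this notebook.

```python
import math

def choose_save_fps(reported_fps, fallback=30.0):
    """Return the reported frame rate if it looks valid, else the fallback."""
    if reported_fps is None or math.isnan(reported_fps) or reported_fps <= 0:
        return fallback
    return float(reported_fps)

# Intended use inside the notebook (needs an opened cv2.VideoCapture named cap):
# save_fps = choose_save_fps(cap.get(cv2.CAP_PROP_FPS))
print(choose_save_fps(25))   # 25.0
print(choose_save_fps(0.0))  # 30.0
```

Saving with the source frame rate keeps the processed video at the same playback speed as the original.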

In [None]:
# Note: download the video into the "pytorch-yolo-v3" folder with wget, as shown below:
!wget "https://www.dropbox.com/s/lf99d8lzwqlj2zs/soccer_tactics360p.mp4"

# References
---

Yolo v3 for Images practice was based on: 

> [Machine Learning Mastery Tutorial](https://machinelearningmastery.com/how-to-perform-object-detection-with-yolov3-in-keras/)

> [Github repo from experiencor](https://github.com/experiencor/keras-yolo3)

Yolo v3 for Video practice was based on:
> [Github repo from Ayoosh Kathuria  (Pytorch)](https://github.com/ayooshkathuria/pytorch-yolo-v3)


All pictures taken from:

1. [What’s new in YOLO v3? - Ayoosh Kathuria](https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b)


2. [YOLO v3 theory explained - Analytics Vidhya](https://medium.com/analytics-vidhya/yolo-v3-theory-explained-33100f6d193)




# ...BUT how do I train a Yolo v3 on a custom dataset?




1. Check the [AI Guy video tutorial](https://www.youtube.com/watch?v=10joRJt39Ns) and try [his Colab notebook](https://colab.research.google.com/drive/1Mh2HP_Mfxoao6qNFbhfV3u28tG8jAVGk)


2. Another good resource (though much less complete) is the [Pysource video tutorial](https://www.youtube.com/watch?v=_FNfRtXEbr4)


Both tutorials use [the official (supported) Darknet repo](https://github.com/AlexeyAB/darknet), so they are strongly recommended for practitioners.







