# Object Detection with YOLOv3 in PyTorch

In this notebook we will use a PyTorch implementation of YOLOv3. The original documentation is available at https://pjreddie.com/darknet/yolo/.


## 1. Load the weights
The author provides a set of weights pretrained on the [Common Objects in Context (COCO) dataset](https://cocodataset.org).

First, we download the weights.

In [1]:
!wget https://pjreddie.com/media/files/yolov3.weights -O ./yolov3.weights

--2022-04-07 14:42:03--  https://pjreddie.com/media/files/yolov3.weights
Risoluzione di pjreddie.com (pjreddie.com)... 128.208.4.108
Connessione a pjreddie.com (pjreddie.com)|128.208.4.108|:443... connesso.
Richiesta HTTP inviata, in attesa di risposta... 200 OK
Lunghezza: 248007048 (237M) [application/octet-stream]
Salvataggio in: «./yolov3.weights»


2022-04-07 14:42:28 (10,3 MB/s) - «./yolov3.weights» salvato [248007048/248007048]
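
A download of this size can be interrupted without an obvious error. As a minimal sketch, the file on disk can be compared against the 248007048 bytes that `wget` reports in the log above. The helper name `weights_look_complete` is hypothetical, and a matching size only suggests, but does not prove, that the contents are intact.

```python
import os

# Size in bytes reported by wget in the log above.
EXPECTED_SIZE = 248007048


def weights_look_complete(path, expected_size=EXPECTED_SIZE):
    """Return True if `path` exists and has exactly `expected_size` bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_size


# Prints True only if the file is present with the expected size.
print(weights_look_complete("./yolov3.weights"))
```

If the check fails, re-running the download cell is usually the simplest fix.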



## Initialize the model

The code that implements the architecture is in the `yolo_pytorch` directory. Take a look at the implementation and try to understand how the architecture is structured.


The implementation used here is based on the Ultralytics one, available at https://github.com/ultralytics/yolov3.
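
Darknet describes a network in a plain-text `.cfg` file (here `yolo_pytorch/yolov3.cfg`) made of `[section]` headers followed by `key=value` lines. The sketch below shows one way to read such a file into a list of dictionaries. It is not the parser used by `yolo_pytorch`: the function name `parse_darknet_cfg` and the excerpt in `example_cfg` are illustrative assumptions.

```python
def parse_darknet_cfg(text):
    """Parse Darknet-style cfg text into a list of dicts, one per [section]."""
    blocks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # A new section starts: remember its type
            blocks.append({"type": line[1:-1].strip()})
        elif "=" in line and blocks:
            # A key=value line belongs to the most recent section
            key, value = line.split("=", 1)
            blocks[-1][key.strip()] = value.strip()
    return blocks


# Illustrative excerpt (an assumption, not copied from the real file)
example_cfg = """
[net]
width=416
height=416

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
"""

for block in parse_darknet_cfg(example_cfg):
    print(block["type"], {k: v for k, v in block.items() if k != "type"})
```

Values are kept as strings on purpose, so the caller decides how to convert each field.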



In [2]:
import os
import sys
import random

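# Restrict PyTorch to a single GPU; this must be set before CUDA is initialized.
# The index '3' is specific to the author's machine: adjust it for yours.
os.environ['CUDA_VISIBLE_DEVICES'] = '3'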

In [4]:
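import yolo_pytorch.models as models
# Provides load_classes and non_max_suppression, used below
from yolo_pytorch.utils.utils import *

# Imported explicitly so that np and plt do not depend on the wildcard import
import numpy as np
import matplotlib.pyplot as plt

import torch
import torchvision.transforms as transforms

import cv2
from PIL import Image, ImageFont, ImageDraw, ImageEnhance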

print("Using PyTorch", torch.__version__)

# Set up model
model_config = 'yolo_pytorch/yolov3.cfg'
img_size = 416
weights = "./yolov3.weights"

model = models.Darknet(model_config, img_size)
# Use GPU if available
if torch.cuda.is_available():
    model.cuda()
models.load_darknet_weights(model, weights)
print(model)

Using PyTorch 1.9.0.post2
Darknet(
  (module_list): ModuleList(
    (0): Sequential(
      (conv_0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (batch_norm_0): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky_0): LeakyReLU(negative_slope=0.1)
    )
    (1): Sequential(
      (conv_1): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (batch_norm_1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky_1): LeakyReLU(negative_slope=0.1)
    )
    (2): Sequential(
      (conv_2): Conv2d(64, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (batch_norm_2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (leaky_2): LeakyReLU(negative_slope=0.1)
    )
    (3): Sequential(
      (conv_3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (batch_norm_3): BatchN
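

The layers printed above determine how the spatial resolution changes through the network. As a small worked example, the standard convolution output-size formula (without dilation) can be applied to the kernel, stride and padding values of `conv_0` to `conv_3`, starting from the 416-pixel input. The helper `conv_output_size` is a name introduced here for illustration.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of conv_0 .. conv_3 as printed above;
# conv_2 has no padding argument in the printout, so padding is 0.
layers = [(3, 1, 1), (3, 2, 1), (1, 1, 0), (3, 1, 1)]

size = 416
sizes = []
for kernel, stride, padding in layers:
    size = conv_output_size(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)
```

Only `conv_1`, with stride 2, halves the resolution (416 to 208); the other three layers preserve it.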

## Use the model to detect objects

The model takes a 416x416 image as input and returns a list of descriptors, one per detected object. We wrap everything in a detector class.
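
Real images are rarely 416x416, so they are first scaled to fit and then padded to a square. The sketch below works out that geometry in plain Python: the scaled size and the padding on each side. The function name `letterbox_geometry` is hypothetical, and assigning any odd leftover pixel to the right or bottom side is a choice made here, not something taken from `yolo_pytorch`.

```python
def letterbox_geometry(width, height, target=416):
    """Return (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom)."""
    # Scale so that the longer side becomes `target`
    ratio = min(target / width, target / height)
    new_w = round(width * ratio)
    new_h = round(height * ratio)
    # Split the remaining space; right/bottom absorb an odd remainder
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    pad_right = target - new_w - pad_left
    pad_bottom = target - new_h - pad_top
    return new_w, new_h, pad_left, pad_top, pad_right, pad_bottom


print(letterbox_geometry(1920, 1080))  # landscape frame
print(letterbox_geometry(480, 640))    # portrait frame
```

A 1920x1080 frame becomes 416x234 with 91 pixels of padding above and below, while a 480x640 frame becomes 312x416 with 52 pixels on the left and right.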



### Create methods to detect and display objects
The `Detector` class exposes two methods:

- **detect_objects**: Submits an image to the model and returns the predicted object locations.
- **generate_bb**: Draws a bounding box and a label on the image for each detected object.
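
Inside `detect_objects`, overlapping predictions for the same object are removed with non-max suppression. The sketch below is a plain-Python illustration of the greedy version of that idea, not the `non_max_suppression` function from `yolo_pytorch`. It ignores classes for brevity, and reading the notebook's `0.8` and `0.4` as a confidence threshold and an IoU threshold is an assumption based on common convention.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, conf_thres=0.8, iou_thres=0.4):
    """detections: list of (x1, y1, x2, y2, score). Returns the kept ones."""
    # Drop weak detections, then visit the rest from most to least confident
    candidates = sorted(
        (d for d in detections if d[4] >= conf_thres),
        key=lambda d: d[4],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box only if it does not overlap too much with a kept one
        if all(iou(det[:4], k[:4]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


boxes = [
    (10, 10, 110, 110, 0.95),    # strong detection
    (12, 12, 112, 112, 0.90),    # near-duplicate of the first
    (200, 200, 300, 300, 0.85),  # separate object
    (50, 50, 150, 150, 0.50),    # below the confidence threshold
]
print(greedy_nms(boxes))
```

With these inputs only the first and third boxes survive: the second overlaps the first almost completely, and the fourth is too weak.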

In [5]:
class Detector:
    def __init__(self,model):
    
        self.model = model
        self.classes = load_classes('yolo_pytorch/coco.names')

                
        # Get bounding-box colors
        cmap = plt.get_cmap('tab20b')
        self.bbox_colors = [cmap(i) for i in np.linspace(0, 1, len(self.classes))]

    
    def detect_objects(self, img):
    
        # Set the model to evaluation mode
        self.model.eval()
    
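        # Scale the image so that its longer side equals img_size
        ratio = min(img_size/img.size[0], img_size/img.size[1])
        imw = round(img.size[0] * ratio)
        imh = round(img.size[1] * ratio)

        # Pad the shorter side with gray to reach img_size x img_size.
        # The right/bottom padding absorbs any odd remainder, so the
        # result is always exactly square.
        pad_left = (img_size - imw) // 2
        pad_top = (img_size - imh) // 2
        pad_right = img_size - imw - pad_left
        pad_bottom = img_size - imh - pad_top

        # Transform the image for prediction
        img_transforms = transforms.Compose([
            transforms.Resize((imh, imw)),
            transforms.Pad((pad_left, pad_top, pad_right, pad_bottom),
                           (128, 128, 128)),
            transforms.ToTensor(),
        ])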
    
        # convert image to a Tensor
        image_tensor = img_transforms(img).float()
        if torch.cuda.is_available():
            image_tensor = image_tensor.cuda()
        image_tensor = image_tensor.unsqueeze_(0)
    
        # Use the model to detect objects in the image
        with torch.no_grad():
            detections = self.model(image_tensor)
            # Eliminate duplicates with non-max suppression
            detections = non_max_suppression(detections, 0.8, 0.4)
        return detections[0]

    
    def generate_bb(self, img):
        # PIL reports the size as (width, height)
        img_w, img_h = img.size

        # Undo the letterboxing applied in detect_objects: the image was
        # scaled by `scale` and padded on its shorter side up to img_size
        scale = img_size / max(img_w, img_h)
        pad_x = max(img_h - img_w, 0) * scale  # total horizontal padding
        pad_y = max(img_w - img_h, 0) * scale  # total vertical padding

        detections = self.detect_objects(img)
        if detections is None:
            # Nothing passed the thresholds in this image
            return img, 0

        draw = ImageDraw.Draw(img)
        font = ImageFont.load_default()

        # One color per class present in this image
        unique_labels = detections[:, -1].cpu().unique().tolist()

        # Convert to plain Python floats before drawing
        for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections.tolist():
            # Class label
            predicted_class = self.classes[int(cls_pred)]

            color = self.bbox_colors[unique_labels.index(cls_pred)]
            cur_color = (int(color[0] * 255), int(color[1] * 255), int(color[2] * 255))

            # Text to attach to the box
            label = '{}\n{:.2f}'.format(predicted_class, cls_conf)

            # Map the box from network coordinates to the original image
            box_w = (x2 - x1) / scale
            box_h = (y2 - y1) / scale
            x1 = (x1 - pad_x // 2) / scale
            y1 = (y1 - pad_y // 2) / scale

            draw.rectangle(((x1, y1), (x1 + box_w, y1 + box_h)), fill=None, outline=cur_color)
            # ImageDraw.text takes `fill` for the text color
            draw.text((x1, y1), label, font=font, fill=cur_color)

        return img, len(detections)
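

The coordinate arithmetic in `generate_bb` is easier to check in isolation. The sketch below maps a box from the padded 416x416 network input back to the original image, assuming the image was scaled to fit and centered with equal padding on both sides. The function name `unletterbox_box` is hypothetical, and it uses exact (non-rounded) padding, which may differ from the notebook's result by a pixel.

```python
def unletterbox_box(box, orig_w, orig_h, target=416):
    """Map (x1, y1, x2, y2) from the padded square input to the original image."""
    scale = target / max(orig_w, orig_h)
    pad_x = (target - orig_w * scale) / 2  # horizontal padding on each side
    pad_y = (target - orig_h * scale) / 2  # vertical padding on each side
    x1, y1, x2, y2 = box
    return (
        (x1 - pad_x) / scale,
        (y1 - pad_y) / scale,
        (x2 - pad_x) / scale,
        (y2 - pad_y) / scale,
    )


# A box covering the whole unpadded region of a 1920x1080 frame
print(unletterbox_box((0, 91, 416, 325), 1920, 1080))
```

The example box spans exactly the non-padded area of the network input, so it should map back to (approximately) the full 1920x1080 frame.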

### Use the detector on a video stream
Now we're ready to run the detector on frames read from a video file or a webcam.

In [6]:
from io import BytesIO
from IPython.display import clear_output, display, Image as IPImage
from PIL import Image

def showarray(a, fmt='jpeg'):
    f = BytesIO()
    Image.fromarray(a).save(f, fmt)
    display(IPImage(data=f.getvalue()))

In [13]:
import time

# Use cv2.VideoCapture(0) instead to read from the default webcam
#cam = cv2.VideoCapture('P1033731.mp4')
cam = cv2.VideoCapture('small.mp4')

detector = Detector(model)

try:
    while True:
        t1 = time.time()
        # Capture frame-by-frame
        ret, frame = cam.read()
        if not ret:
            # End of the video, or the frame could not be read
            break
        # OpenCV returns BGR; PIL expects RGB
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        img = Image.fromarray(frame)

        # Draw the bounding boxes on the image (in place)
        img, len_detections = detector.generate_bb(img)
        display(img)

        t2 = time.time()
        # How many objects did we detect?
        print('Found {} objects, {:.1f} FPS'.format(len_detections, 1/(t2-t1)))

        # Keep the frame on screen until the next one is ready
        clear_output(wait=True)
except KeyboardInterrupt:
    print("Stream stopped")
finally:
    # Release the capture device however the loop ends
    cam.release()

Stream stopped
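
The FPS value printed in the loop is computed from a single frame, so it can jump around. One possible refinement is to average over the last few frames. The class name `FpsMeter` and the window size below are assumptions made for this sketch.

```python
from collections import deque


class FpsMeter:
    """Average FPS over the last `window` frame durations."""

    def __init__(self, window=30):
        # Old durations fall out automatically once the window is full
        self.durations = deque(maxlen=window)

    def update(self, seconds):
        """Record one frame duration (in seconds) and return the smoothed FPS."""
        self.durations.append(seconds)
        total = sum(self.durations)
        return len(self.durations) / total if total > 0 else 0.0


# Synthetic durations: one slow frame among fast ones
meter = FpsMeter(window=3)
for duration in (0.1, 0.1, 0.4, 0.1):
    print(round(meter.update(duration), 2))
```

In the loop, `meter.update(t2 - t1)` would replace the `1/(t2-t1)` expression.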
