# Training a detector model with YOLOv7 (PyTorch)

This notebook walks through training a YOLOv7 model in PyTorch and exporting it for inference with Triton Server.

**Inputs**:

- Object detection dataset in YOLOv7 format

**Outputs**:

- Trained YOLOv7 model in PyTorch format (`.pt`).
- Model exported for inference with Triton Server.

**Procedure summary**

1. Download YOLOv7 and install dependencies.
2. Download and prepare the dataset.
3. Training.
4. Evaluation.
5. Inference.
6. Deployment with Triton Server.

**References**

- [Official YOLOv7](https://github.com/WongKinYiu/yolov7)

## 1. Download YOLOv7 and install dependencies

This notebook assumes it runs in an environment where YOLOv7 has not been installed yet.

The repository is cloned into the `yolov7` subdirectory. Most of the following steps run from inside it.

In [None]:
import os
if not os.path.exists('yolov7'):
    !git clone https://github.com/WongKinYiu/yolov7.git

In [None]:
# Make sure we are inside yolov7
cwd=%pwd
if cwd.split('/')[-1] != 'yolov7':
    %cd yolov7
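
The working-directory guard above relies on IPython magics. If these cells are ever moved into a plain script, an equivalent in standard Python might look like the sketch below. `ensure_cwd` is a hypothetical helper name, not something provided by YOLOv7.

```python
import os
from pathlib import Path


def ensure_cwd(dirname: str) -> Path:
    """Change into `dirname` unless the current directory already has that name."""
    cwd = Path.cwd()
    if cwd.name != dirname:
        # Raises FileNotFoundError if the subdirectory does not exist.
        os.chdir(cwd / dirname)
    return Path.cwd()
```

Calling `ensure_cwd("yolov7")` twice in a row is harmless: the second call finds that the current directory is already named `yolov7` and does nothing.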

In [None]:
!pip install -r requirements.txt

## 2. Download and prepare the dataset

### Example dataset: Pothole

Source: [Fine Tuning YOLOv7 on Custom Dataset](https://learnopencv.com/fine-tuning-yolov7-on-custom-dataset/)

In [None]:
# Make sure we are inside yolov7
cwd=%pwd
if cwd.split('/')[-1] != 'yolov7':
    %cd yolov7

In [None]:
import os
if not os.path.exists('pothole_dataset.zip'):
    !wget https://learnopencv.s3.us-west-2.amazonaws.com/pothole_dataset.zip
    !unzip -q pothole_dataset.zip

Directory structure of the input dataset.

In [None]:
!tree -d pothole_dataset | head -n 20

In [None]:
!ls pothole_dataset/images/train | head -n 5

In [None]:
!ls pothole_dataset/labels/train | head -n 5

In [None]:
!cat pothole_dataset/labels/train/G0010033.txt

Label format (one object per line, fields separated by spaces, coordinates normalized to the image size):

~~~
class x_center y_center width height
~~~
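
To make the label format concrete, here is a minimal sketch that parses one label line and converts it to pixel corner coordinates. It assumes the usual YOLO convention of space-separated fields with coordinates normalized to the image size; `YoloLabel`, `parse_label_line`, and `to_pixel_corners` are illustrative names, and the sample line is made up rather than taken from the dataset.

```python
from typing import NamedTuple, Tuple


class YoloLabel(NamedTuple):
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_label_line(line: str) -> YoloLabel:
    """Parse one `class x_center y_center width height` line (space-separated)."""
    cls, xc, yc, w, h = line.split()
    return YoloLabel(int(cls), float(xc), float(yc), float(w), float(h))


def to_pixel_corners(label: YoloLabel, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a normalized center box to (x1, y1, x2, y2) in pixels."""
    x1 = (label.x_center - label.width / 2) * img_w
    y1 = (label.y_center - label.height / 2) * img_h
    x2 = (label.x_center + label.width / 2) * img_w
    y2 = (label.y_center + label.height / 2) * img_h
    return x1, y1, x2, y2


# Made-up label line for illustration (not taken from the dataset).
label = parse_label_line("0 0.5 0.5 0.25 0.5")
print(to_pixel_corners(label, img_w=640, img_h=480))  # (240.0, 120.0, 400.0, 360.0)
```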

### Dataset definition

The dataset definition file goes in `yolov7/data`.

In [None]:
# Make sure we are inside yolov7
cwd=%pwd
if cwd.split('/')[-1] != 'yolov7':
    %cd yolov7

In [None]:
%%writefile data/pothole.yaml
# Paths are relative to the yolov7 directory, where the dataset was unzipped
train: pothole_dataset/images/train
val: pothole_dataset/images/valid
test: pothole_dataset/images/test

# Classes
nc: 1  # number of classes
names: ['pothole']  # class names

In [None]:
!cat data/pothole.yaml
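
Before training, it can be worth checking that the image and label directories referenced by the dataset definition actually line up. The sketch below lists images that have no label file with the same stem. `images_without_labels` is a hypothetical helper, and the set of image suffixes is an assumption.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_without_labels(images_dir: str, labels_dir: str) -> List[str]:
    """Return image file names that have no matching `<stem>.txt` label file."""
    labels = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return sorted(
        p.name
        for p in Path(images_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES and p.stem not in labels
    )
```

For this dataset the call would be `images_without_labels("pothole_dataset/images/train", "pothole_dataset/labels/train")`, and the same check applies to the `valid` and `test` splits.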

### Model configuration

In [None]:
# Make sure we are inside yolov7
cwd=%pwd
if cwd.split('/')[-1] != 'yolov7':
    %cd yolov7

In [None]:
!ls cfg/training

In [None]:
!cat cfg/training/yolov7.yaml

<div class="alert alert-warning">
    <b>Warning</b>: edit the copied file manually. At a minimum, set the number of classes (<code>nc</code>).
</div>

In [None]:
!cp cfg/training/yolov7.yaml cfg/training/yolov7_pothole.yaml
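
The manual edit can also be scripted. The sketch below rewrites the top-level `nc:` entry of a YAML text with a regular expression and keeps any trailing comment. `set_num_classes` is a hypothetical helper, and the sample text is only a stand-in for the beginning of a model config, not the real file.

```python
import re


def set_num_classes(yaml_text: str, nc: int) -> str:
    """Replace the value of the top-level `nc:` entry, keeping any trailing comment."""
    new_text, count = re.subn(r"(?m)^nc:\s*\d+", f"nc: {nc}", yaml_text, count=1)
    if count == 0:
        raise ValueError("no top-level 'nc:' entry found")
    return new_text


# Small stand-in for the beginning of a model config (not the full file).
sample = "# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0\n"
print(set_num_classes(sample, 1))
```

To apply it to the copied config, read the file with `Path.read_text()`, pass the text through `set_num_classes`, and write the result back with `Path.write_text()`.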

## 3. Training

In [None]:
# Make sure we are inside yolov7
cwd=%pwd
if cwd.split('/')[-1] != 'yolov7':
    %cd yolov7

Download the initial weights.

In [None]:
import os
if not os.path.exists('yolov7_training.pt'): 
    !wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt

Check GPU availability.

In [None]:
!nvidia-smi

If resources allow, increase the number of workers and the batch size. On a laptop with an RTX3070MQ, training only worked with `--workers 1` and `--batch-size 4`.

Training can be monitored with TensorBoard.

In a separate terminal:

~~~bash
tensorboard --logdir runs/train
~~~

TensorBoard: http://localhost:6006/

In [None]:
!python train.py --epochs 100 \
                 --workers 1 \
                 --device 0 \
                 --batch-size 4 \
                 --data data/pothole.yaml \
                 --img 640 640 \
                 --cfg cfg/training/yolov7_pothole.yaml \
                 --weights 'yolov7_training.pt' \
                 --name yolov7_pothole \
                 --hyp data/hyp.scratch.custom.yaml

In [None]:
!ls runs/train/yolov7_pothole
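
Later cells read weights from `runs/train/yolov7_pothole8`, which suggests that a numeric suffix was appended because earlier runs with the same `--name` already existed. Rather than hard-coding the suffix, a small helper can locate the most recent run directory. `latest_run` is a hypothetical helper; it assumes the suffix is a plain integer appended to the run name.

```python
import re
from pathlib import Path
from typing import Optional


def latest_run(root: str, name: str) -> Optional[Path]:
    """Return the run directory `name` or `name<N>` with the highest numeric suffix."""
    pattern = re.compile(rf"^{re.escape(name)}(\d*)$")
    best, best_n = None, -1
    for path in Path(root).iterdir():
        match = pattern.match(path.name)
        if path.is_dir() and match:
            n = int(match.group(1) or 0)  # a name without suffix counts as run 0
            if n > best_n:
                best, best_n = path, n
    return best
```

With this, the weights path becomes `latest_run("runs/train", "yolov7_pothole") / "weights" / "best.pt"`.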

## 4. Evaluation

In [None]:
!python test.py --data data/pothole.yaml \
               --img 640 \
               --batch 32 \
               --conf 0.001 \
               --iou 0.65 \
               --device 0 \
               --weights runs/train/yolov7_pothole8/weights/best.pt \
               --name yolov7_640_val

In [None]:
!ls runs/test/yolov7_640_val

In [None]:
from IPython.display import display, Image
display(Image(filename='runs/test/yolov7_640_val/confusion_matrix.png',width=600,height=600))

In [None]:
display(Image(filename='runs/test/yolov7_640_val/test_batch0_labels.jpg',width=1024,height=1024))

In [None]:
display(Image(filename='runs/test/yolov7_640_val/test_batch1_labels.jpg',width=1024,height=1024))

## 5. Inference for rapid prototyping (PyTorch)

In [None]:
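# torch.load() needs the model's modules to be importable, so run from inside yolov7.
# See https://github.com/pytorch/pytorch/issues/18325 (torch.load() requires model module in the same folder #3678)
cwd=%pwd
if cwd.split('/')[-1] != 'yolov7':
    %cd yolov7
%pwd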

In [None]:
!ls runs/train/yolov7_pothole8/weights

In [None]:
import torch
from torchvision import transforms
import numpy as np

# Note: torch.load() requires the model module in the same folder (#3678)
MODEL_WEIGHTS_PATH="runs/train/yolov7_pothole8/weights/best.pt"

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# map_location lets the checkpoint load even if it was saved on a different device
weights = torch.load(MODEL_WEIGHTS_PATH, map_location=device)

In [None]:
model = weights['model']
model = model.half().to(device)
_ = model.eval()

In [None]:
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline 

TEST_IMG_PATH='pothole_dataset/images/test/img-294_jpg.rf.a16953e9091e3eecfc338ed3044ef294.jpg'
img = cv2.imread(TEST_IMG_PATH) 
img =  cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img);

In [None]:
image = img.copy()
image = transforms.ToTensor()(image)
image = torch.tensor(np.array([image.numpy()]))
image = image.to(device)
image = image.half()
with torch.no_grad():
    output, _ = model(image)
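
The raw `output` still holds many overlapping candidates per object, so it is normally filtered with non-maximum suppression (the repository ships `non_max_suppression` in `utils/general.py` for this). The underlying idea can be sketched in plain Python. The boxes and scores below are made up, and this is an illustration of greedy NMS rather than the repository's implementation.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thres: float = 0.65) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (300, 300, 400, 400)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, above the 0.65 threshold, so only the higher-scoring one survives.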

## 6. Deployment with Triton Server

These steps follow the [official GitHub instructions](https://github.com/WongKinYiu/yolov7/tree/main/deploy/triton-inference-server).


**Warning about TensorRT and CUDA compatibility**

Choose the TensorRT version that matches the CUDA version available on the host. Otherwise, errors occur.

0. Determine the CUDA version on the system.

~~~bash
nvidia-smi
~~~

1. Export to ONNX.

Note: install all the dependencies first:

~~~bash
pip install onnx onnx-simplifier onnx-graphsurgeon
~~~

~~~bash
cd yolov7
python export.py --weights runs/train/yolov7_pothole8/weights/best.pt --grid --end2end --dynamic-batch --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640
~~~

2. Start the NVIDIA TensorRT Docker container that will be used for the conversion. Pick the release that matches your CUDA version; for example, CUDA 11.6 corresponds to 22.02.

~~~bash
docker run -it --rm --gpus=all nvcr.io/nvidia/tensorrt:22.02-py3
~~~

3. Copy the ONNX file into the container.

~~~bash
docker ps
CONTAINER ID   IMAGE                               COMMAND                  CREATED         STATUS         PORTS     NAMES
14c431abcf03   nvcr.io/nvidia/tensorrt:22.06-py3   "/opt/nvidia/nvidia_…"   2 minutes ago   Up 2 minutes             dreamy_northcutt
~~~

~~~bash
cd yolov7
docker cp runs/train/yolov7_pothole8/weights/best.onnx dreamy_northcutt:/workspace/
~~~

4. Convert to TensorRT.

~~~bash
mv best.onnx yolov7.onnx
./tensorrt/bin/trtexec --onnx=yolov7.onnx --minShapes=images:1x3x640x640 --optShapes=images:8x3x640x640 --maxShapes=images:8x3x640x640 --fp16 --workspace=4096 --saveEngine=yolov7-fp16-1x8x8.engine --timingCacheFile=timing.cache
~~~

5. Serving with Triton

Create the directory structure:

~~~bash
cd yolo7_custom_dataset
mkdir -pv triton-deploy/models/yolov7/1
mv yolov7/yolov7-fp16-1x8x8.engine triton-deploy/models/yolov7/1/model.plan
touch triton-deploy/models/yolov7/config.pbtxt
~~~

Edit `config.pbtxt`:

~~~
name: "yolov7"
platform: "tensorrt_plan"
max_batch_size: 8
dynamic_batching { }
~~~

**Warning**: choose a Triton Server release that ships the same TensorRT version used to build the engine. See the [compatibility matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html).

~~~bash
docker run --gpus all --rm --ipc=host --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd)/triton-deploy/models:/models nvcr.io/nvidia/tritonserver:22.02-py3 tritonserver --model-repository=/models --strict-model-config=false --log-verbose 1
~~~
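
The model repository that Triton loads follows the layout `<repository>/<model name>/<version>/model.plan`, with `config.pbtxt` next to the version directory. As a rough sketch, the same layout and the `config.pbtxt` shown above can be created from Python. `build_model_repo` is a hypothetical helper, and the engine path is whatever file `trtexec` produced.

```python
import shutil
from pathlib import Path


def build_model_repo(engine_path: str, repo_root: str,
                     model_name: str = "yolov7", version: int = 1) -> Path:
    """Copy a TensorRT engine into <repo_root>/<model_name>/<version>/model.plan."""
    model_dir = Path(repo_root) / model_name
    version_dir = model_dir / str(version)
    # The version directory must exist before the engine is copied into it.
    version_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(engine_path, version_dir / "model.plan")
    config = (
        f'name: "{model_name}"\n'
        'platform: "tensorrt_plan"\n'
        "max_batch_size: 8\n"
        "dynamic_batching { }\n"
    )
    (model_dir / "config.pbtxt").write_text(config)
    return model_dir
```

A call such as `build_model_repo("yolov7/yolov7-fp16-1x8x8.engine", "triton-deploy/models")` would reproduce the directory structure from step 5.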

### 6.1 Inference - gRPC client

In [None]:
#!pip install tritonclient[all] opencv-python

In [1]:
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

TRITON_SERVER_URL='localhost:8001'

# Create server context
try:
    triton_client = grpcclient.InferenceServerClient(
        url=TRITON_SERVER_URL,
        verbose=False,
        ssl=False,
        root_certificates=None,
        private_key=None,
        certificate_chain=None)
except Exception as e:
    print("context creation failed: " + str(e))

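In [None]:
triton_client.is_server_live()

True

In [None]:
triton_client.is_server_ready()

True

In [5]:
TRITON_MODEL_NAME="yolov7"
triton_client.is_model_ready(TRITON_MODEL_NAME)

True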

In [7]:
metadata = triton_client.get_model_metadata(TRITON_MODEL_NAME)
metadata

name: "yolov7"
versions: "1"
platform: "tensorrt_plan"
inputs {
  name: "images"
  datatype: "FP32"
  shape: -1
  shape: 3
  shape: 640
  shape: 640
}
outputs {
  name: "num_dets"
  datatype: "INT32"
  shape: -1
  shape: 1
}
outputs {
  name: "det_boxes"
  datatype: "FP32"
  shape: -1
  shape: 100
  shape: 4
}
outputs {
  name: "det_scores"
  datatype: "FP32"
  shape: -1
  shape: 100
}
outputs {
  name: "det_classes"
  datatype: "INT32"
  shape: -1
  shape: 100
}

In [8]:
config = triton_client.get_model_config(TRITON_MODEL_NAME)
config

config {
  name: "yolov7"
  platform: "tensorrt_plan"
  version_policy {
    latest {
      num_versions: 1
    }
  }
  max_batch_size: 8
  input {
    name: "images"
    data_type: TYPE_FP32
    dims: 3
    dims: 640
    dims: 640
  }
  output {
    name: "num_dets"
    data_type: TYPE_INT32
    dims: 1
  }
  output {
    name: "det_boxes"
    data_type: TYPE_FP32
    dims: 100
    dims: 4
  }
  output {
    name: "det_scores"
    data_type: TYPE_FP32
    dims: 100
  }
  output {
    name: "det_classes"
    data_type: TYPE_INT32
    dims: 100
  }
  instance_group {
    name: "yolov7"
    count: 1
    gpus: 0
    kind: KIND_GPU
  }
  default_model_filename: "model.plan"
  dynamic_batching {
    preferred_batch_size: 8
  }
  optimization {
    input_pinned_memory {
      enable: true
    }
    output_pinned_memory {
      enable: true
    }
  }
  backend: "tensorrt"
}

In [10]:
INPUT_NAMES = ["images"]
OUTPUT_NAMES = ["num_dets", "det_boxes", "det_scores", "det_classes"]

WIDTH=640
HEIGHT=640

inputs = []
outputs = []
inputs.append(grpcclient.InferInput(INPUT_NAMES[0], [1, 3, HEIGHT, WIDTH], "FP32"))  # NCHW layout
outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[0]))
outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[1]))
outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[2]))
outputs.append(grpcclient.InferRequestedOutput(OUTPUT_NAMES[3]))

In [26]:
import cv2 
import numpy as np

INPUT="yolov7/pothole_dataset/images/test/img-238_jpg.rf.f146df7999e374dbeaba65f92c518159.jpg"
input_image = cv2.imread(INPUT)

In [27]:
input_image.shape

(720, 720, 3)

In [29]:
img2 = cv2.resize(input_image, (640,640))
img2.shape

(640, 640, 3)

In [29]:
class BoundingBox:
    def __init__(self, classID, confidence, x1, x2, y1, y2, image_width, image_height):
        self.classID = classID
        self.confidence = confidence
        self.x1 = x1
        self.x2 = x2
        self.y1 = y1
        self.y2 = y2
        self.u1 = x1 / image_width
        self.u2 = x2 / image_width
        self.v1 = y1 / image_height
        self.v2 = y2 / image_height

    def box(self):
        return (self.x1, self.y1, self.x2, self.y2)

    def width(self):
        return self.x2 - self.x1

    def height(self):
        return self.y2 - self.y1

    def center_absolute(self):
        return (0.5 * (self.x1 + self.x2), 0.5 * (self.y1 + self.y2))

    def center_normalized(self):
        return (0.5 * (self.u1 + self.u2), 0.5 * (self.v1 + self.v2))

    def size_absolute(self):
        return (self.x2 - self.x1, self.y2 - self.y1)

    def size_normalized(self):
        return (self.u2 - self.u1, self.v2 - self.v1)

In [17]:
def preprocess(img, input_shape, letter_box=True):
    if letter_box:
        img_h, img_w, _ = img.shape
        new_h, new_w = input_shape[0], input_shape[1]
        offset_h, offset_w = 0, 0
        if (new_w / img_w) <= (new_h / img_h):
            new_h = int(img_h * new_w / img_w)
            offset_h = (input_shape[0] - new_h) // 2
        else:
            new_w = int(img_w * new_h / img_h)
            offset_w = (input_shape[1] - new_w) // 2
        resized = cv2.resize(img, (new_w, new_h))
        img = np.full((input_shape[0], input_shape[1], 3), 127, dtype=np.uint8)
        img[offset_h:(offset_h + new_h), offset_w:(offset_w + new_w), :] = resized
    else:
        img = cv2.resize(img, (input_shape[1], input_shape[0]))

    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.transpose((2, 0, 1)).astype(np.float32)
    img /= 255.0
    return img
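
The letterbox branch of `preprocess` is easier to follow with concrete numbers. The sketch below isolates only the geometry (resized size and padding offsets) using the same arithmetic; `letterbox_geometry` is an illustrative name and the 1280x720 frame is a hypothetical example.

```python
from typing import Tuple


def letterbox_geometry(img_w: int, img_h: int, in_w: int, in_h: int) -> Tuple[int, int, int, int]:
    """Return (new_w, new_h, offset_w, offset_h) for an aspect-preserving resize with padding."""
    new_w, new_h = in_w, in_h
    offset_w, offset_h = 0, 0
    if (in_w / img_w) <= (in_h / img_h):
        # Width is the limiting side: shrink the height and pad top and bottom.
        new_h = int(img_h * in_w / img_w)
        offset_h = (in_h - new_h) // 2
    else:
        # Height is the limiting side: shrink the width and pad left and right.
        new_w = int(img_w * in_h / img_h)
        offset_w = (in_w - new_w) // 2
    return new_w, new_h, offset_w, offset_h


# A hypothetical 1280x720 frame fed to a 640x640 network input.
print(letterbox_geometry(1280, 720, 640, 640))  # (640, 360, 0, 140)
```

For the square 720x720 test image used in this notebook, the same function returns no padding at all, so letterboxing and a plain resize coincide.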

In [27]:
def postprocess(num_dets, det_boxes, det_scores, det_classes, img_w, img_h, input_shape, letter_box=True):
    boxes = det_boxes[0, :num_dets[0][0]] / np.array([input_shape[0], input_shape[1], input_shape[0], input_shape[1]], dtype=np.float32)
    scores = det_scores[0, :num_dets[0][0]]
    classes = det_classes[0, :num_dets[0][0]].astype(int)  # np.int was removed from recent NumPy releases

    old_h, old_w = img_h, img_w
    offset_h, offset_w = 0, 0
    if letter_box:
        if (img_w / input_shape[1]) >= (img_h / input_shape[0]):
            old_h = int(input_shape[0] * img_w / input_shape[1])
            offset_h = (old_h - img_h) // 2
        else:
            old_w = int(input_shape[1] * img_h / input_shape[0])
            offset_w = (old_w - img_w) // 2

    boxes = boxes * np.array([old_w, old_h, old_w, old_h], dtype=np.float32)
    if letter_box:
        boxes -= np.array([offset_w, offset_h, offset_w, offset_h], dtype=np.float32)
    boxes = boxes.astype(int)  # np.int was removed from recent NumPy releases

    detected_objects = []
    for box, score, label in zip(boxes, scores, classes):
        detected_objects.append(BoundingBox(label, score, box[0], box[2], box[1], box[3], img_w, img_h))
    return detected_objects
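
As a worked example of the inverse mapping in `postprocess`, the sketch below maps a single box from letterboxed input pixels back to the original image. It follows the same steps (rescale to the padded original size, then subtract the padding offset) but uses plain Python floats instead of NumPy arrays. `unletterbox_box`, the frame size, and the box are all hypothetical.

```python
from typing import Tuple


def unletterbox_box(box: Tuple[float, float, float, float],
                    img_w: int, img_h: int, in_w: int, in_h: int) -> Tuple[int, int, int, int]:
    """Map (x1, y1, x2, y2) from letterboxed input pixels back to original image pixels."""
    old_w, old_h = img_w, img_h
    offset_w, offset_h = 0, 0
    if (img_w / in_w) >= (img_h / in_h):
        # Padding was added above and below the image.
        old_h = int(in_h * img_w / in_w)
        offset_h = (old_h - img_h) // 2
    else:
        # Padding was added to the left and right of the image.
        old_w = int(in_w * img_h / in_h)
        offset_w = (old_w - img_w) // 2
    x1, y1, x2, y2 = box
    return (
        int(x1 / in_w * old_w - offset_w),
        int(y1 / in_h * old_h - offset_h),
        int(x2 / in_w * old_w - offset_w),
        int(y2 / in_h * old_h - offset_h),
    )


# Made-up detection on a hypothetical 1280x720 frame letterboxed to 640x640.
print(unletterbox_box((160, 240, 320, 400), 1280, 720, 640, 640))  # (320, 200, 640, 520)
```

In this example the image occupies rows 140 to 500 of the network input, so an input row of 240 lies 100 rows into the resized image and, at a scale factor of 2, lands on row 200 of the original frame.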

In [18]:
input_image_buffer = preprocess(input_image, [HEIGHT, WIDTH])  # input_shape is (height, width)
input_image_buffer = np.expand_dims(input_image_buffer, axis=0)

inputs[0].set_data_from_numpy(input_image_buffer)

results = triton_client.infer(model_name=TRITON_MODEL_NAME,
                              inputs=inputs,
                              outputs=outputs,
                              client_timeout=10)

In [23]:
statistics = triton_client.get_inference_statistics(model_name=TRITON_MODEL_NAME)
statistics

model_stats {
  name: "yolov7"
  version: "1"
  last_inference: 1663625727226
  inference_count: 3
  execution_count: 3
  inference_stats {
    success {
      count: 3
      ns: 1724882595
    }
    fail {
    }
    queue {
      count: 3
      ns: 436640
    }
    compute_input {
      count: 3
      ns: 37089336
    }
    compute_infer {
      count: 3
      ns: 1686269196
    }
    compute_output {
      count: 3
      ns: 606645
    }
    cache_hit {
    }
  }
  batch_stats {
    batch_size: 1
    compute_input {
      count: 3
      ns: 37089336
    }
    compute_infer {
      count: 3
      ns: 1686269196
    }
    compute_output {
      count: 3
      ns: 606645
    }
  }
}
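
The counters in the statistics output are cumulative, so per-request averages have to be derived by dividing each `ns` total by its `count`. A minimal sketch using the values printed above (`mean_ms` is an illustrative helper):

```python
# Cumulative counters copied from the statistics output above: (count, total nanoseconds).
stats_ns = {
    "success": (3, 1724882595),
    "queue": (3, 436640),
    "compute_input": (3, 37089336),
    "compute_infer": (3, 1686269196),
    "compute_output": (3, 606645),
}


def mean_ms(count: int, total_ns: int) -> float:
    """Average duration per request in milliseconds."""
    return total_ns / count / 1e6 if count else 0.0


for stage, (count, total_ns) in stats_ns.items():
    print(f"{stage:>14}: {mean_ms(count, total_ns):8.2f} ms")
```

This works out to roughly 575 ms per request, of which about 562 ms is `compute_infer`. With only three requests recorded, the average probably includes first-inference warm-up, so it should not be read as steady-state latency.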

In [20]:
results

<tritonclient.grpc.InferResult at 0x7f80547d5fa0>

In [30]:
num_dets = results.as_numpy(OUTPUT_NAMES[0])
det_boxes = results.as_numpy(OUTPUT_NAMES[1])
det_scores = results.as_numpy(OUTPUT_NAMES[2])
det_classes = results.as_numpy(OUTPUT_NAMES[3])
detected_objects = postprocess(num_dets, det_boxes, det_scores, det_classes, input_image.shape[1], input_image.shape[0], [HEIGHT, WIDTH])



In [32]:
len(detected_objects)

1

In [36]:
detected_objects[0].box()

(198, 313, 485, 433)

### 6.2 Inference - Integration with Videoanalytics

In [37]:
!git clone https://github.com/nhorro/videoanalytics

Cloning into 'videoanalytics'...
remote: Enumerating objects: 824, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 824 (delta 3), reused 19 (delta 2), pack-reused 801
Receiving objects: 100% (824/824), 12.24 MiB | 1.56 MiB/s, done.
Resolving deltas: 100% (407/407), done.

In [3]:
!pip install networkx

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting networkx
  Downloading networkx-2.8.6-py3-none-any.whl (2.0 MB)
Installing collected packages: networkx
Successfully installed networkx-2.8.6


In [4]:
import sys
sys.path.append("videoanalytics/src/")

from videoanalytics.pipeline import Pipeline
from videoanalytics.pipeline.sources import VideoReader
from videoanalytics.pipeline.sinks import VideoWriter

In [86]:
import cv2
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

from videoanalytics.utils.boundingboxes import format_boxes
from videoanalytics.pipeline import Sink
import numpy as np

class YOLOv7DetectorTriton(Sink):
    '''
    YOLOv7 object detector implemented as a Triton gRPC client.
    This component **READS** the following entries in the global context:
    +-------------------+-----------------------------------------------------+
    | Variable name     | Description                                         |
    +===================+=====================================================+
    | FRAME             | Numpy array representing the frame.                 |
    +-------------------+-----------------------------------------------------+
    This component **UPDATES** the following entries in the global context:
    +-------------------+-----------------------------------------------------+
    | Variable name     | Description                                         |
    +===================+=====================================================+
    | DETECTIONS        | List [bboxes, scores, classes, num_objects].        |
    +-------------------+-----------------------------------------------------+
    Args:
        name (str): the component unique name.
        context (dict): the global context.
        model_name (str): name of the model in the Triton model repository.
        url (str): Triton server gRPC endpoint.
        context_name (str): variable name used for storing detections in context.

    Note: the network input size is fixed at 640 pixels and letterboxing is disabled.
    '''
    def __init__(self,name,context, model_name="yolov7", url='localhost:8001', context_name="DETECTIONS"):
        super().__init__(name, context)
        
        self.context_name=context_name
        self.model_name=model_name
        self.yolo_input_size=640
        
        self.letter_box=False # Not supported for now
        
        # Create server context        
        self.triton_client = grpcclient.InferenceServerClient(
            url=url,
            verbose=False,
            ssl=False,
            root_certificates=None,
            private_key=None,
            certificate_chain=None)
        
        # Health check
        assert(self.triton_client.is_server_live())
        assert(self.triton_client.is_server_ready())
        assert(self.triton_client.is_model_ready(self.model_name))
        
        self.INPUT_NAMES = ["images"]
        self.OUTPUT_NAMES = ["num_dets", "det_boxes", "det_scores", "det_classes"]

        self.inputs = []
        self.outputs = []
        self.inputs.append(grpcclient.InferInput(self.INPUT_NAMES[0], [1, 3, self.yolo_input_size, self.yolo_input_size], "FP32"))
        self.outputs.append(grpcclient.InferRequestedOutput(self.OUTPUT_NAMES[0]))
        self.outputs.append(grpcclient.InferRequestedOutput(self.OUTPUT_NAMES[1]))
        self.outputs.append(grpcclient.InferRequestedOutput(self.OUTPUT_NAMES[2]))
        self.outputs.append(grpcclient.InferRequestedOutput(self.OUTPUT_NAMES[3]))
        
    def setup(self):
        pass
    
    def __preprocess(self):
        if self.letter_box:            
            img_h, img_w, _ = self.context["FRAME"].shape
            new_h, new_w =   self.yolo_input_size,  self.yolo_input_size 
            offset_h, offset_w = 0, 0
            if (new_w / img_w) <= (new_h / img_h):
                new_h = int(img_h * new_w / img_w)
                offset_h = (self.yolo_input_size - new_h) // 2
            else:
                new_w = int(img_w * new_h / img_h)
                offset_w = (self.yolo_input_size - new_w) // 2
            resized = cv2.resize(self.context["FRAME"], (new_w, new_h))
            img = np.full((self.yolo_input_size, self.yolo_input_size, 3), 127, dtype=np.uint8)
            img[offset_h:(offset_h + new_h), offset_w:(offset_w + new_w), :] = resized
        else:
            img = cv2.resize(self.context["FRAME"], (self.yolo_input_size, self.yolo_input_size))
        #img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = img.transpose((2, 0, 1)).astype(np.float32)
        img /= 255.0
        return img
            
    def process(self):
        input_image_buffer = self.__preprocess()
        input_image_buffer = np.expand_dims(input_image_buffer, axis=0)
        
        self.inputs[0].set_data_from_numpy(input_image_buffer)
        results = self.triton_client.infer(model_name=self.model_name,
                                           inputs=self.inputs,
                                           outputs=self.outputs)
                
        num_objects = results.as_numpy(self.OUTPUT_NAMES[0])[0][0]
        bboxes = results.as_numpy(self.OUTPUT_NAMES[1])[0]
        scores = np.squeeze(results.as_numpy(self.OUTPUT_NAMES[2]))        
        classes = np.squeeze(results.as_numpy(self.OUTPUT_NAMES[3]))        
        bboxes = bboxes[0:int(num_objects)].copy()       
        scores = scores[0:int(num_objects)].copy()        
        classes = classes[0:int(num_objects)].copy()
        
        
                        
        # 6. Convert boxes from (x1, y1, x2, y2) in model-input pixels to (xmin, ymin, width, height) in original-frame pixels
        original_h, original_w, _ = self.context["FRAME"].shape
        
        for box in bboxes:
            box[2]-=box[0]
            box[3]-=box[1]
            box[0]*=original_w/self.yolo_input_size
            box[1]*=original_h/self.yolo_input_size
            box[2]*=original_w/self.yolo_input_size
            box[3]*=original_h/self.yolo_input_size
        #bboxes = format_boxes(bboxes, original_h, original_w)
        #print(bboxes)
        
        # 7. FIXME: find a better way to represent the detections
        self.context[self.context_name] = [bboxes, scores, classes, num_objects]
        
    def shutdown(self):
        pass  
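
The box conversion inside `process` can be checked in isolation. The sketch below applies the same arithmetic to a single box: corner coordinates in model-input pixels become `(xmin, ymin, width, height)` in frame pixels, assuming a plain resize without letterboxing. `to_frame_xywh`, the frame size, and the box are hypothetical.

```python
from typing import Tuple


def to_frame_xywh(box: Tuple[float, float, float, float],
                  frame_w: int, frame_h: int,
                  input_size: int = 640) -> Tuple[float, float, float, float]:
    """Convert (x1, y1, x2, y2) in model-input pixels to (xmin, ymin, width, height) in frame pixels."""
    x1, y1, x2, y2 = box
    sx, sy = frame_w / input_size, frame_h / input_size
    return (x1 * sx, y1 * sy, (x2 - x1) * sx, (y2 - y1) * sy)


# Made-up detection on a hypothetical 1920x1080 frame (plain resize, no letterbox).
print(to_frame_xywh((160, 320, 480, 480), frame_w=1920, frame_h=1080))
# (480.0, 540.0, 960.0, 270.0)
```

Because the frame is resized independently along each axis, the horizontal and vertical scale factors differ (3.0 and 1.6875 in this example).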

In [91]:
# Specific components for object detection
from videoanalytics.pipeline.sinks.object_detection import DetectionsAnnotator, DetectionsCSVWriter

# Input
INPUT_VIDEO = "test_video/video3.mp4"
OUTPUT_VIDEO = "output3.mp4"
START_FRAME = 0
MAX_FRAMES = None

# Classes names for Detections Annotator
DETECTOR_CLASSES_FILENAME = "classes.txt"

# Output


# 1. Create the context
context = {}

# 2. Create the pipeline
pipeline = Pipeline()

# 3. Add components
pipeline.add_component( VideoReader( "input",context,
                                     video_path=INPUT_VIDEO,
                                     start_frame=START_FRAME,
                                     max_frames=MAX_FRAMES))

# 3.2 Detector
pipeline.add_component( YOLOv7DetectorTriton("detector",context) )

# 3.4 Annotate detections in output video
pipeline.add_component( DetectionsAnnotator("annotator",context,
                                             class_names_filename=DETECTOR_CLASSES_FILENAME,
                                             show_label=True) )

pipeline.add_component(VideoWriter("writer",context,filename=OUTPUT_VIDEO))

# 4. Define connections
pipeline.set_connections([
    ("input", "detector"),
    ("detector", "annotator"),
    ("annotator", "writer")
])
                       
# 5. Execute
pipeline.execute()

# 6. Report (optional)
print(pipeline.get_metrics())

OpenCV: FFMPEG: tag 0x44495658/'XVID' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'


  0%|          | 0/100.0 [00:00<?, ?it/s]

{'input_avg_dt': 0.0004873317557927262, 'detector_avg_dt': 0.014562267780698643, 'annotator_avg_dt': 5.002672315050293e-06, 'writer_avg_dt': 0.0027257958645516243}
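
The reported metrics are average seconds per frame for each component. Assuming the components run one after another for every frame, they can be turned into an approximate throughput figure:

```python
# Per-component average time per frame (seconds), copied from the metrics output above.
metrics = {
    "input_avg_dt": 0.0004873317557927262,
    "detector_avg_dt": 0.014562267780698643,
    "annotator_avg_dt": 5.002672315050293e-06,
    "writer_avg_dt": 0.0027257958645516243,
}

# Assumption: components run sequentially, so per-frame time is the sum of the parts.
total_dt = sum(metrics.values())
print(f"pipeline: {1.0 / total_dt:.1f} FPS")
print(f"detector share: {metrics['detector_avg_dt'] / total_dt:.0%}")
```

Under that assumption the pipeline processes roughly 56 frames per second, with the Triton detector accounting for about 82% of the per-frame time.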
