# Práctica 4: Reconocimiento de Matrículas

Este notebook implementa un prototipo de reconocimiento de matrículas de vehículos en video. Los objetivos de esta práctica incluyen la detección y seguimiento de personas y vehículos, el reconocimiento de matrículas visibles en los vehículos, y la exportación de los resultados en un video y un archivo CSV.

## Objetivos

La práctica se enfoca en desarrollar un sistema de detección y reconocimiento de objetos que cumpla con los siguientes requisitos:

- Detección y seguimiento: Identificación y rastreo de personas y vehículos presentes en el video.
- Reconocimiento de matrículas: Detección de matrículas en los vehículos y reconocimiento del texto usando OCR.
- Conteo total de clases: Recuento acumulativo de cada tipo de objeto detectado.
- Exportación de resultados: Generación de un video que visualice los resultados y exportación de un archivo CSV con el detalle de las detecciones.

## Preparación del entorno

In [1]:
import cv2
import time
import math
import csv
from collections import defaultdict, Counter
from ultralytics import YOLO
import easyocr

In [2]:
def initialize_model(model_path):
    """Initialize the YOLO model for detection."""
    return YOLO(model_path)

def initialize_reader():
    """Initialize the EasyOCR reader."""
    return easyocr.Reader(['en'])  

def initialize_video_writer(cap, output_video_path):
    """Set up the video writer for the processed video."""
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    return cv2.VideoWriter(output_video_path, fourcc, fps, (frame_width, frame_height))

def write_csv_header(csv_file_path):
    """Prepare CSV file for logging."""
    with open(csv_file_path, mode='w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['frame', 'object_type', 'confidence', 'tracking_id', 'x1', 'y1', 'x2', 'y2',
                         'license_plate_confidence', 'mx1', 'my1', 'mx2', 'my2', 'license_plate_text'])

def put_text(frame, text, position, color=(0, 255, 0), font_scale=0.6, thickness=2, bg_color=(0, 0, 0)):
    """Helper function to put text with background on the frame."""
    text_size = cv2.getTextSize(text, cv2.FONT_HERSHEY_SIMPLEX, font_scale, thickness)[0]
    text_x, text_y = position
    box_coords = ((text_x, text_y - text_size[1] - 5), (text_x + text_size[0] + 5, text_y + 5))
    cv2.rectangle(frame, box_coords[0], box_coords[1], bg_color, cv2.FILLED)
    cv2.putText(frame, text, position, cv2.FONT_HERSHEY_SIMPLEX, font_scale, color, thickness)

In [3]:
# Parameters 
video_path = 'C0142.mp4'  # Path to input video
model_path = 'yolo11n.pt'  # Path to YOLO model
license_plate_detector_model_path = 'runs/detect/license_plate_detector/weights/best.pt'  # Path to license plate detector model

output_video_path = 'output_video.mp4'  # Path to save the annotated output video
csv_file_path = 'detection_tracking_log.csv'  # Path to save the CSV log file
show_video = True  # Set to True to display the video while processing
classes_to_detect = [0, 1, 2, 3, 5]  # Class IDs to detect (e.g., [0, 2] for person and car)

model = initialize_model(model_path)
license_plate_detector = YOLO(license_plate_detector_model_path)
reader = initialize_reader()

# Define class names and colors for display
class_names = {
    0: "person",
    1: "bicycle",
    2: "car",
    3: "motorbike",
    5: "bus"
}
class_colors = {
    0: (255, 255, 255),
    1: (0, 255, 0),
    2: (0, 0, 255),
    3: (255, 255, 0),
    5: (0, 255, 255)
}

# Dictionary to store the best plate and its confidence for each track_id
vehicle_plates = {}

# Persistent total count of each class across all frames
total_class_count = Counter()
# Track unique IDs for each class to count only once
seen_ids = defaultdict(set)
frame_number = 0  # Initialize frame counter

blur_enabled = True # Set to True to blur faces
paused = False

In [4]:
# Open the video file and set up output for processed video
cap = cv2.VideoCapture(video_path)
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec
fps = cap.get(cv2.CAP_PROP_FPS)
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = initialize_video_writer(cap, output_video_path)
write_csv_header(csv_file_path)

# Loop through each frame
while cap.isOpened():
    if not paused:
        ret, frame = cap.read()
        if not ret:
            break

        start_time = time.time()
        frame_number += 1

        # Run YOLO detection and tracking
        results = model.track(frame, persist=True, classes=classes_to_detect)
        current_frame_count = Counter()

        # Process detections
        for result in results:
            boxes = result.boxes

            for box in boxes:
                x1, y1, x2, y2 = map(int, box.xyxy[0])
                cls = int(box.cls[0])
                confidence = round(float(box.conf[0]), 2)

                if box.id is not None:
                    track_id = int(box.id[0].tolist())
                    if track_id not in seen_ids[cls]:
                        seen_ids[cls].add(track_id)
                        total_class_count[class_names[cls]] += 1

                    # License plate recognition for cars
                    license_plate_text = ""
                    plate_confidence = None
                    mx1, my1, mx2, my2 = None, None, None, None

                    # Check if the detected object is a car, then detect license plate within its bounding box
                    if class_names[cls] in ["car", "motorbike", "bus"]:
                        vehicle_img = frame[y1:y2, x1:x2]  # Crop the vehicle area to search for license plate
                        
                        # Check if the cropped image is large enough for license plate detection
                        min_plate_size = 80
                        if vehicle_img.shape[0] < min_plate_size or vehicle_img.shape[1] < min_plate_size:
                            continue
                        
                        # Check if the confidence is high enough for license plate detection
                        if confidence < 0.7:
                            continue
                        
                        # Run license plate detector model on the cropped vehicle image
                        plate_results = license_plate_detector.predict(vehicle_img)

                        # Process license plate detection results
                        if plate_results and len(plate_results[0].boxes) > 0:
                            for plate_box in plate_results[0].boxes:
                                # Get bounding box coordinates for the license plate, adjusted to the frame's coordinates
                                px1, py1, px2, py2 = map(int, plate_box.xyxy[0])
                                px1, py1, px2, py2 = px1 + x1, py1 + y1, px2 + x1, py2 + y1  # Adjust to the car's bounding box position
                                                            
                                # Draw bounding box for license plate
                                background_color = (255, 255, 255)  # White background for contrast
                                cv2.rectangle(frame, (px1, py1), (px2, py2), background_color, 2)
                                    
                                # Extract the license plate text using OCR
                                license_plate_roi = frame[py1:py2, px1:px2]
                                
                                # Resize based on the plate size
                                plate_height, plate_width = license_plate_roi.shape[:2]
                                scale_factor = 100.0 / plate_height
                                resized_plate = cv2.resize(
                                    license_plate_roi, None, fx=scale_factor, fy=scale_factor,
                                    interpolation=cv2.INTER_CUBIC)

                                # Convert to grayscale
                                gray_plate = cv2.cvtColor(resized_plate, cv2.COLOR_BGR2GRAY)

                                # Apply CLAHE
                                clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
                                equalized_plate = clahe.apply(gray_plate)

                                # Denoise the image
                                denoised_plate = cv2.fastNlMeansDenoising(equalized_plate, None, 10, 7, 21)

                                # Adaptive thresholding with adjusted parameters
                                thresh_plate = cv2.adaptiveThreshold(
                                    denoised_plate, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                    cv2.THRESH_BINARY_INV, 11, 2)

                                # Morphological operations
                                kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
                                morph_plate = cv2.morphologyEx(thresh_plate, cv2.MORPH_CLOSE, kernel)
                                morph_plate = cv2.morphologyEx(morph_plate, cv2.MORPH_OPEN, kernel)
                                morph_plate = cv2.bitwise_not(morph_plate)

                                plate_ocr_results = reader.readtext(morph_plate, allowlist='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ')
                                
                                if plate_ocr_results:
                                    license_plate_text = plate_ocr_results[0][-2]
                                    plate_confidence = round(plate_ocr_results[0][-1], 2)
                                    
                                    # Check if confidence is above threshold
                                    if plate_confidence >= 0.2:
                                        # Update the vehicle_plates dictionary
                                        if (track_id not in vehicle_plates) or (plate_confidence > vehicle_plates[track_id]['confidence']):
                                            vehicle_plates[track_id] = {
                                                'plate': license_plate_text,
                                                'confidence': plate_confidence
                                            }
                                            # Save the processed license plate image in /plates folder
                                            cv2.imwrite(f'plates/{frame_number}_{track_id}_{license_plate_text}.png', morph_plate)

                                        # Save coordinates for CSV logging
                                        mx1, my1, mx2, my2 = px1, py1, px2, py2
                                    
                                assigned_plate = vehicle_plates.get(track_id, None)
                                if assigned_plate:
                                    # Draw the assigned plate on the frame
                                    background_color = (255, 255, 255)  # White background for contrast
                                    high_contrast_color = (0, 0, 0)  # Black text
                                    put_text(frame, f"Plate: {assigned_plate['plate']}", (x1, y2 + 40), color=high_contrast_color, bg_color=background_color)

                                    # Update license_plate_text and plate_confidence for CSV logging
                                    license_plate_text = assigned_plate['plate']
                                    plate_confidence = assigned_plate['confidence']
                                else:
                                    # If no plate assigned yet, set to empty
                                    license_plate_text = ""
                                    plate_confidence = None
                    
                    # Draw bounding box and label for the detected object
                    color = class_colors.get(cls, (0, 0, 0))
                    cv2.rectangle(frame, (x1, y1), (x2, y2), color, 3)
                    put_text(frame, f"{class_names[cls]} {confidence}", (x1, y1 - 10), color=color)
                    put_text(frame, f"ID: {track_id}", (x1, y2 + 20), color=color)
                                    
                    # Anonimización condicional de personas
                    if class_names[cls] == "person" and blur_enabled:
                        person_roi = frame[y1:y2, x1:x2]
                        blurred_person = cv2.GaussianBlur(person_roi, (51, 51), 30)
                        frame[y1:y2, x1:x2] = blurred_person
                        
                    # Write to CSV
                    with open(csv_file_path, mode='a', newline='') as file:
                        writer = csv.writer(file)
                        writer.writerow([frame_number, class_names[cls], confidence, track_id, x1, y1, x2, y2,
                                        plate_confidence, mx1, my1, mx2, my2, license_plate_text])

                    current_frame_count[class_names[cls]] += 1

        # Display counts and FPS
        y_offset = 30
        for cls, count in total_class_count.items():
            put_text(frame, f"Total {cls}: {count}", (10, y_offset))
            y_offset += 20

        for cls, count in current_frame_count.items():
            put_text(frame, f"Frame {cls}: {count}", (10, y_offset), color=(255, 255, 255))
            y_offset += 20

        fps_calc = 1.0 / (time.time() - start_time)
        put_text(frame, f"FPS: {fps_calc:.2f}", (10, y_offset), color=(255, 255, 255))

        # Write frame to output video
        out.write(frame)

    # Optionally display the frame
    if show_video:
        cv2.imshow('Detection and Tracking', frame)
        key = cv2.waitKey(1 if not paused else 0) & 0xFF
        if key == 27: # Tecla Esc
            break
        elif key == ord(' '):  # Tecla Espacio
            paused = not paused
        elif key == ord('b'):  # Tecla para alternar desenfoque
            blur_enabled = not blur_enabled  # Cambia el estado de desenfoque
            print(f"Desenfoque {'habilitado' if blur_enabled else 'deshabilitado'}")
        
# Release resources
cap.release()
out.release()
cv2.destroyAllWindows()


0: 384x640 1 car, 1 bus, 49.4ms
Speed: 2.5ms preprocess, 49.4ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 544x640 1 Licenseplate, 62.2ms
Speed: 2.0ms preprocess, 62.2ms inference, 0.3ms postprocess per image at shape (1, 3, 544, 640)

0: 384x640 1 car, 1 bus, 36.0ms
Speed: 1.8ms preprocess, 36.0ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 544x640 (no detections), 49.8ms
Speed: 1.5ms preprocess, 49.8ms inference, 0.2ms postprocess per image at shape (1, 3, 544, 640)

0: 384x640 1 car, 1 bus, 63.2ms
Speed: 1.9ms preprocess, 63.2ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 512x640 (no detections), 55.7ms
Speed: 2.0ms preprocess, 55.7ms inference, 0.2ms postprocess per image at shape (1, 3, 512, 640)

0: 384x640 1 car, 1 bus, 44.3ms
Speed: 1.5ms preprocess, 44.3ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 512x640 (no detections), 51.2ms
Speed: 1.8ms preprocess, 51.2ms inference, 0.2

2024-11-06 20:10:06.990 python[66752:63548878] +[IMKClient subclass]: chose IMKClient_Legacy
2024-11-06 20:10:06.990 python[66752:63548878] +[IMKInputSession subclass]: chose IMKInputSession_Legacy


0: 384x640 1 car, 1 bus, 45.7ms
Speed: 2.0ms preprocess, 45.7ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 480x640 (no detections), 90.0ms
Speed: 4.9ms preprocess, 90.0ms inference, 0.2ms postprocess per image at shape (1, 3, 480, 640)

0: 384x640 1 car, 1 bus, 43.6ms
Speed: 2.0ms preprocess, 43.6ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 480x640 (no detections), 138.1ms
Speed: 1.7ms preprocess, 138.1ms inference, 0.2ms postprocess per image at shape (1, 3, 480, 640)

0: 384x640 1 car, 1 bus, 54.3ms
Speed: 2.0ms preprocess, 54.3ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 448x640 (no detections), 71.4ms
Speed: 1.8ms preprocess, 71.4ms inference, 0.2ms postprocess per image at shape (1, 3, 448, 640)

0: 384x640 1 car, 1 bus, 111.0ms
Speed: 13.4ms preprocess, 111.0ms inference, 1.2ms postprocess per image at shape (1, 3, 384, 640)

0: 448x640 1 Licenseplate, 70.2ms
Speed: 1.9ms preprocess, 70.2ms inference,

2024-11-06 20:10:11.382 python[66752:63548878] _TIPropertyValueIsValid called with 16 on nil context!
2024-11-06 20:10:11.382 python[66752:63548878] imkxpc_getApplicationProperty:reply: called with incorrect property value 16, bailing.
2024-11-06 20:10:11.382 python[66752:63548878] Text input context does not respond to _valueForTIProperty:



0: 384x640 1 car, 1 bus, 37.1ms
Speed: 1.5ms preprocess, 37.1ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 448x640 1 Licenseplate, 123.8ms
Speed: 1.4ms preprocess, 123.8ms inference, 0.4ms postprocess per image at shape (1, 3, 448, 640)

0: 384x640 1 car, 1 bus, 55.3ms
Speed: 3.1ms preprocess, 55.3ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 448x640 1 Licenseplate, 58.2ms
Speed: 1.7ms preprocess, 58.2ms inference, 0.4ms postprocess per image at shape (1, 3, 448, 640)

0: 384x640 1 car, 1 bus, 40.9ms
Speed: 1.4ms preprocess, 40.9ms inference, 0.9ms postprocess per image at shape (1, 3, 384, 640)

0: 480x640 1 Licenseplate, 45.5ms
Speed: 1.3ms preprocess, 45.5ms inference, 0.3ms postprocess per image at shape (1, 3, 480, 640)

0: 384x640 1 car, 1 bus, 54.5ms
Speed: 1.7ms preprocess, 54.5ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 480x640 (no detections), 56.7ms
Speed: 1.7ms preprocess, 56.7ms inference, 0.2

2024-11-06 20:12:44.895 python[66752:63548878] _TIPropertyValueIsValid called with 16 on nil context!
2024-11-06 20:12:44.895 python[66752:63548878] imkxpc_getApplicationProperty:reply: called with incorrect property value 16, bailing.
2024-11-06 20:12:44.895 python[66752:63548878] Text input context does not respond to _valueForTIProperty:



0: 448x640 1 Licenseplate, 122.3ms
Speed: 8.5ms preprocess, 122.3ms inference, 0.4ms postprocess per image at shape (1, 3, 448, 640)

0: 384x640 4 cars, 64.9ms
Speed: 1.9ms preprocess, 64.9ms inference, 1.0ms postprocess per image at shape (1, 3, 384, 640)

0: 512x640 1 Licenseplate, 81.9ms
Speed: 2.3ms preprocess, 81.9ms inference, 0.4ms postprocess per image at shape (1, 3, 512, 640)

0: 448x640 (no detections), 65.7ms
Speed: 1.4ms preprocess, 65.7ms inference, 0.2ms postprocess per image at shape (1, 3, 448, 640)

0: 384x640 4 cars, 56.1ms
Speed: 1.9ms preprocess, 56.1ms inference, 1.9ms postprocess per image at shape (1, 3, 384, 640)

0: 544x640 1 Licenseplate, 74.1ms
Speed: 1.9ms preprocess, 74.1ms inference, 0.4ms postprocess per image at shape (1, 3, 544, 640)

0: 448x640 1 Licenseplate, 61.4ms
Speed: 3.4ms preprocess, 61.4ms inference, 0.4ms postprocess per image at shape (1, 3, 448, 640)

0: 384x640 4 cars, 32.7ms
Speed: 1.6ms preprocess, 32.7ms inference, 0.6ms postprocess p

2024-11-06 20:14:24.753 python[66752:63548878] _TIPropertyValueIsValid called with 16 on nil context!
2024-11-06 20:14:24.753 python[66752:63548878] imkxpc_getApplicationProperty:reply: called with incorrect property value 16, bailing.
2024-11-06 20:14:24.753 python[66752:63548878] Text input context does not respond to _valueForTIProperty:



0: 384x640 3 persons, 5 cars, 38.5ms
Speed: 2.4ms preprocess, 38.5ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 5 cars, 39.0ms
Speed: 1.9ms preprocess, 39.0ms inference, 0.6ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 5 cars, 39.5ms
Speed: 1.6ms preprocess, 39.5ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 5 cars, 40.9ms
Speed: 1.7ms preprocess, 40.9ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 5 cars, 36.3ms
Speed: 1.5ms preprocess, 36.3ms inference, 0.7ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 5 cars, 32.4ms
Speed: 1.5ms preprocess, 32.4ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 5 cars, 34.6ms
Speed: 1.4ms preprocess, 34.6ms inference, 0.8ms postprocess per image at shape (1, 3, 384, 640)

0: 384x640 3 persons, 5 cars, 66.0ms
Speed: 15.0ms pre

### Resultados

Esta sección se presentan los resultados obtenidos. Cargaremos el archivo CSV para revisar el recuento total de cada tipo de objeto detectado, así como los detalles de las detecciones de matrículas.

In [5]:
# Cargar el archivo CSV de resultados
import pandas as pd

results_df = pd.read_csv('detection_tracking_log.csv')
print("Resumen de detecciones por clase:")
print(results_df['object_type'].value_counts())

print("\nEjemplo de datos de detección de matrículas:")
display(results_df[results_df['object_type'] == 'car'].head())

Resumen de detecciones por clase:
object_type
person       2929
car          2024
bus           131
bicycle        40
motorbike      19
Name: count, dtype: int64

Ejemplo de datos de detección de matrículas:


Unnamed: 0,frame,object_type,confidence,tracking_id,x1,y1,x2,y2,license_plate_confidence,mx1,my1,mx2,my2,license_plate_text
618,392,car,0.78,13,1809,413,1919,548,,,,,,
622,393,car,0.79,13,1787,405,1919,541,,,,,,
626,394,car,0.79,13,1762,389,1919,541,,,,,,
630,395,car,0.84,13,1737,363,1919,538,,,,,,
634,396,car,0.87,13,1715,349,1919,535,,,,,,


## Conclusión

En esta práctica se ha desarrollado un prototipo funcional que permite:

- Detectar y seguir personas y vehículos en video.
- Detectar y leer matrículas en vehículos mediante un modelo YOLO y OCR.
- Exportar los resultados visuales en un video y los datos de detección en un archivo CSV.

Este prototipo constituye una herramienta útil para el análisis automatizado de video en aplicaciones de monitoreo y seguridad, con posibilidad de mejoras futuras en el rendimiento y precisión del OCR de matrículas.