### How It Works:
#### YOLO Loading:
The **yolov3.weights and yolov3.cfg files** are loaded using OpenCV’s cv2.dnn.readNet() function. The coco.names file is used to map class IDs to class names.

1. Blob Creation:

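Each video frame is converted into a blob, the 4-D input tensor that the network expects. cv2.dnn.blobFromImage() scales the pixel values by 0.00392 (about 1/255) so they fall in the range 0 to 1, resizes the frame to 416x416, and swaps the blue and red channels, because OpenCV frames are stored as BGR.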
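As a rough illustration of what the blob step does to the data, here is a NumPy-only sketch. The helper `make_blob` is hypothetical and is not part of OpenCV; it uses a nearest-neighbor resize, whereas OpenCV interpolates, so the values will not match cv2.dnn.blobFromImage() exactly.

```python
import numpy as np

def make_blob(frame_bgr, size=416, scale=1 / 255.0):
    """Approximate the blob preprocessing with NumPy only (hypothetical helper)."""
    h, w = frame_bgr.shape[:2]
    # Nearest-neighbor resize via index lookup; OpenCV interpolates instead.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = frame_bgr[rows][:, cols]
    rgb = resized[:, :, ::-1]                   # BGR -> RGB
    scaled = rgb.astype(np.float32) * scale     # 0..255 -> 0..1
    # HWC -> CHW, then add a batch axis: shape (1, 3, size, size)
    return np.transpose(scaled, (2, 0, 1))[np.newaxis, ...]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[:, :, 0] = 255            # pure blue, stored in BGR order
blob = make_blob(frame)
print(blob.shape)               # (1, 3, 416, 416)
```

After the channel swap, the blue values end up in the last of the three channel planes, which is the order the network was trained on.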

2. YOLO Forward Pass:

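A forward pass through the network's output layers produces a list of candidate detections. Each detection is a row holding the box center, width, and height (normalized to the frame size), an objectness score, and one score per COCO class.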
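To get a feel for how many candidates a single forward pass returns, the sketch below computes the expected output shapes. It assumes the standard yolov3.cfg (strides 32, 16, and 8, three anchors per grid cell, 80 COCO classes); a modified config would give different numbers.

```python
# Assumptions: 416x416 input, strides 32/16/8, 3 anchors per cell, 80 COCO classes.
INPUT_SIZE = 416
STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3
NUM_CLASSES = 80

def output_shapes(input_size=INPUT_SIZE):
    """Return the (rows, columns) shape of each output layer."""
    shapes = []
    for stride in STRIDES:
        grid = input_size // stride
        rows = grid * grid * ANCHORS_PER_CELL
        # 4 box values + 1 objectness score + one score per class
        shapes.append((rows, 5 + NUM_CLASSES))
    return shapes

shapes = output_shapes()
print(shapes)                           # [(507, 85), (2028, 85), (8112, 85)]
print(sum(rows for rows, _ in shapes))  # 10647
```

Most of these rows have near-zero scores, which is why the filtering and suppression steps are needed.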

3. Filtering Predictions:

Only detections whose best class score exceeds 0.5 are kept, and only when that class is "person" (class_id = 0 in COCO). COCO has no dedicated face class, so the resulting boxes cover whole people rather than faces.
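The filtering rule can be tried on a single synthetic detection row without running the network. This is a minimal sketch: `decode_person` is a hypothetical helper, and the 85-value row layout (4 box values, 1 objectness score, 80 class scores) is assumed from the standard COCO-trained YOLOv3 model.

```python
import numpy as np

PERSON_CLASS_ID = 0
CONF_THRESHOLD = 0.5

def decode_person(detection, frame_w, frame_h):
    """Return ([x, y, w, h], score) for a confident person row, else None."""
    scores = detection[5:]
    class_id = int(np.argmax(scores))
    score = float(scores[class_id])
    if class_id != PERSON_CLASS_ID or score <= CONF_THRESHOLD:
        return None
    # Box values are normalized, so scale them back to pixel units.
    cx, cy = detection[0] * frame_w, detection[1] * frame_h
    w, h = detection[2] * frame_w, detection[3] * frame_h
    # Convert from center format to top-left corner format.
    return [int(cx - w / 2), int(cy - h / 2), int(w), int(h)], score

row = np.zeros(85, dtype=np.float32)
row[:5] = [0.5, 0.5, 0.25, 0.5, 0.9]   # centered box, a quarter of the width
row[5 + PERSON_CLASS_ID] = 0.8
result = decode_person(row, 640, 480)
print(result[0])                        # [240, 120, 160, 240]

other = np.zeros(85, dtype=np.float32)
other[5 + 16] = 0.9                     # a confident non-person class
print(decode_person(other, 640, 480))   # None
```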

4. Non-Maxima Suppression (NMS):

NMS removes redundant boxes: when two boxes overlap by more than the IoU threshold (0.4 in this code), only the box with the higher confidence is kept.
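For intuition, a greedy version of this procedure can be written in a few lines of plain Python. This is a sketch of the general technique, not OpenCV's implementation; `iou` and `greedy_nms` are hypothetical helper names, and boxes use the same [x, y, w, h] layout as the code in this notebook.

```python
def iou(a, b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.4):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [[100, 100, 50, 100], [105, 102, 50, 100], [300, 80, 60, 120]]
scores = [0.9, 0.6, 0.8]
kept = greedy_nms(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first almost completely and has a lower score, so it is dropped; the third box is far away and survives.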

5. Bounding Boxes:

A green bounding box is drawn around each remaining "person" detection in the video feed, labeled with the class name ("person") and the confidence score.
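Boxes near the frame border can extend past the image, and a label placed above such a box may be drawn off-screen. One possible way to handle this is sketched below; `clip_box` and `label_origin` are hypothetical helpers, and the margin values are arbitrary assumptions.

```python
def clip_box(x, y, w, h, frame_w, frame_h):
    """Clip an [x, y, w, h] box to the frame; returns corners (x1, y1, x2, y2)."""
    x1 = max(0, x)
    y1 = max(0, y)
    x2 = min(frame_w - 1, x + w)
    y2 = min(frame_h - 1, y + h)
    return x1, y1, x2, y2

def label_origin(x1, y1, margin=10, min_y=20):
    """Place the label just above the box, but never above the frame."""
    return x1, max(y1 - margin, min_y)

corners = clip_box(-15, 4, 200, 300, 640, 480)
print(corners)                                # (0, 4, 185, 304)
print(label_origin(corners[0], corners[1]))   # (0, 20)
```

The clipped corners could then be passed to cv2.rectangle() and the label origin to cv2.putText().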

## Requirements:
1. OpenCV (pip install opencv-python)
2. OpenCV DNN module support
3. YOLOv3 weights (yolov3.weights), configuration file (yolov3.cfg), and COCO class names (coco.names), as mentioned earlier
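Before loading the model, it can help to check that the required files are actually present. The sketch below assumes the three files sit in the working directory under the names used in this notebook; `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# File names as used in this notebook; adjust if yours differ.
REQUIRED_FILES = ("yolov3.weights", "yolov3.cfg", "coco.names")

def missing_files(directory=".", names=REQUIRED_FILES):
    """Return the required files that are not present in the directory."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]

missing = missing_files()
if missing:
    print("Missing model files:", ", ".join(missing))
```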

In [4]:
import cv2
import numpy as np

# Load YOLO
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
# Names of the unconnected output layers (the YOLO detection heads).
# getUnconnectedOutLayersNames() sidesteps the index-format differences between OpenCV versions.
output_layers = net.getUnconnectedOutLayersNames()

In [6]:
# Load COCO class labels (coco.names holds one class name per line)
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f if line.strip()]

# Load the video from webcam (use 'video.mp4' for video file)
cap = cv2.VideoCapture(0)

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        print("Failed to capture video")
        break

    height, width = frame.shape[:2]

    # Prepare the image for YOLO: scale to 0..1, resize to 416x416, swap BGR -> RGB
    blob = cv2.dnn.blobFromImage(frame, scalefactor=0.00392, size=(416, 416),
                                 mean=(0, 0, 0), swapRB=True, crop=False)
    net.setInput(blob)
    outs = net.forward(output_layers)

    # Information to show on the screen
    class_ids = []
    confidences = []
    boxes = []

    # Loop over each detection from YOLO
    for out in outs:
        for detection in out:
            scores = detection[5:]  # Skip the first 5 values (box coordinates + confidence)
            class_id = np.argmax(scores)
            confidence = scores[class_id]

            # Filter out weak detections by ensuring the confidence is above a threshold
            if confidence > 0.5 and class_id == 0:  # Class '0' is 'person' in COCO dataset
                # Object detected
                center_x = int(detection[0] * width)
                center_y = int(detection[1] * height)
                w = int(detection[2] * width)
                h = int(detection[3] * height)

                # Rectangle coordinates
                x = int(center_x - w / 2)
                y = int(center_y - h / 2)

                boxes.append([x, y, w, h])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Apply non-maxima suppression to eliminate redundant overlapping boxes
    # (score threshold 0.5, IoU overlap threshold 0.4)
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

    # Draw the resulting bounding boxes
    if len(indexes) > 0:
        for i in indexes.flatten():
            x, y, w, h = boxes[i]
            label = str(classes[class_ids[i]])
            confidence = confidences[i]
            color = (0, 255, 0)  # Green bounding box
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
            cv2.putText(frame, f"{label} {confidence:.2f}", (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 1, color, 2)

    # Display the resulting frame
    cv2.imshow('YOLO Face Detection', frame)

    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video capture object and close display windows
cap.release()
cv2.destroyAllWindows()