Keypoint locations are usually represented as a set of 2D [x, y] or 3D [x, y, visible] coordinates.
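
As a minimal sketch of the two layouts, the plain-Python snippet below stores a few keypoints in each form and keeps only those flagged as visible. The coordinate values and the helper name `visible_points` are made up for illustration, and treating any visibility value above 0.5 as "visible" is an assumption, not a fixed convention.

```python
# Hypothetical keypoints for one person (pixel coordinates, made-up values).
keypoints_2d = [[120.0, 80.0], [132.0, 74.0], [0.0, 0.0]]
keypoints_3d = [[120.0, 80.0, 1.0], [132.0, 74.0, 1.0], [0.0, 0.0, 0.0]]


def visible_points(keypoints, threshold=0.5):
    """Return the (x, y) pairs whose visibility value exceeds the threshold."""
    return [(x, y) for x, y, visible in keypoints if visible > threshold]


print(visible_points(keypoints_3d))
```

The 2D form carries no visibility flag, so a filter like this only applies to the 3D form.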

YOLOv8 pose models use the -pose suffix, e.g. yolov8n-pose.pt. These models are trained on the COCO keypoints dataset and are suitable for a variety of pose estimation tasks.
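
For reference, the COCO keypoints layout describes each person with 17 named points. The sketch below lists them in the standard COCO order and builds a weights filename from a size letter. The helper `pose_weights` is a hypothetical convenience for this example, not part of the ultralytics package.

```python
# The 17 COCO keypoints, in the order used by the COCO keypoints annotations.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def pose_weights(size="n"):
    """Build a YOLOv8 pose weights filename such as 'yolov8n-pose.pt' (hypothetical helper)."""
    if size not in ("n", "s", "m", "l", "x"):
        raise ValueError(f"unknown model size: {size!r}")
    return f"yolov8{size}-pose.pt"


print(len(COCO_KEYPOINT_NAMES), pose_weights("n"))
```

With this list, index 0 of a person's keypoint array is the nose and index 16 is the right ankle.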

In [4]:
from ultralytics import YOLO

In [5]:
# Load a model
model = YOLO('yolov8n-pose.pt')  # load a pretrained model (recommended for training)

Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolov8n-pose.pt to 'yolov8n-pose.pt'...


100%|█████████████████████████████████████████████████████████████████████████████| 6.52M/6.52M [00:01<00:00, 5.20MB/s]


In [6]:
# Predict with the model
# result = model('https://ultralytics.com/images/bus.jpg') # predict on an image

source = 'https://ultralytics.com/images/bus.jpg'
model.predict(source, save=True, imgsz=320, conf=0.5)


Downloading https://ultralytics.com/images/bus.jpg to 'bus.jpg'...


100%|████████████████████████████████████████████████████████████████████████████████| 134k/134k [00:00<00:00, 678kB/s]


image 1/1 F:\ai\AI-Development\Yolo v8\3. Keypoint Detection in YOLOv8 - Pose or YOLOv8 pose estimation\bus.jpg: 320x256 2 persons, 103.9ms
Speed: 3.9ms preprocess, 103.9ms inference, 104.2ms postprocess per image at shape (1, 3, 320, 256)
Results saved to runs\pose\predict


[ultralytics.engine.results.Results object with attributes:
 
 boxes: ultralytics.engine.results.Boxes object
 keypoints: ultralytics.engine.results.Keypoints object
 masks: None
 names: {0: 'person'}
 obb: None
 orig_img: array([[[119, 146, 172],
         [121, 148, 174],
         [122, 152, 177],
         ...,
         [161, 171, 188],
         [160, 170, 187],
         [160, 170, 187]],
 
        [[120, 147, 173],
         [122, 149, 175],
         [123, 153, 178],
         ...,
         [161, 171, 188],
         [160, 170, 187],
         [160, 170, 187]],
 
        [[123, 150, 176],
         [124, 151, 177],
         [125, 155, 180],
         ...,
         [161, 171, 188],
         [160, 170, 187],
         [160, 170, 187]],
 
        ...,
 
        [[183, 182, 186],
         [179, 178, 182],
         [180, 179, 183],
         ...,
         [121, 111, 117],
         [113, 103, 109],
         [115, 105, 111]],
 
        [[165, 164, 168],
         [173, 172, 176],
         [187, 186,
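
The printed Results object is hard to read, so it may help to pull out just the keypoints. The sketch below assumes the `keypoints` attribute exposes `xy` (pixel coordinates) and `conf` (per-keypoint confidence) tensors, as in recent ultralytics releases. The helper names `confident_keypoints` and `keypoints_for_image` and the 0.5 keypoint threshold are choices made for this example.

```python
def confident_keypoints(xy, conf, threshold=0.5):
    """Keep (index, x, y) for keypoints whose confidence passes the threshold.

    xy   -- list of [x, y] pairs for one person
    conf -- list of per-keypoint confidences, same length as xy
    """
    return [(i, x, y) for i, ((x, y), c) in enumerate(zip(xy, conf)) if c >= threshold]


def keypoints_for_image(source, weights="yolov8n-pose.pt"):
    """Run pose prediction on one image and return filtered keypoints per person."""
    from ultralytics import YOLO  # imported lazily so the helper above has no dependencies

    result = YOLO(weights).predict(source, imgsz=320, conf=0.5)[0]
    kpts = result.keypoints
    if kpts is None or kpts.conf is None:
        return []
    people_xy = kpts.xy.cpu().tolist()      # (num_people, num_keypoints, 2)
    people_conf = kpts.conf.cpu().tolist()  # (num_people, num_keypoints)
    return [confident_keypoints(xy, c) for xy, c in zip(people_xy, people_conf)]
```

Calling `keypoints_for_image('https://ultralytics.com/images/bus.jpg')` should return one list per detected person, each holding the keypoint index and its pixel position.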

In [7]:
# Predict on a video
source = 'video.mp4'

model.predict(source, save=True, imgsz=320, conf=0.5)



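WARNING ⚠️ inference results will accumulate in RAM unless `stream=True` is passed, causing potential out-of-memory
errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.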

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

video 1/1 (frame 1/1806) F:\ai\AI-Development\Yolo v8\3. Keypoint Detection in YOLOv8 - Pose or YOLOv8 pose estimation\video.mp4: 192x320 2 persons, 125.2ms
video 1/1 (frame 2/1806) F:\ai\AI-Development\Yolo v8\3. Keypoint Detection in YOLOv8 - Pose or YOLOv8 pose estimation\video.mp4: 192x320 2 persons, 38.8ms
video 1/1 (frame 3/1806) F:\ai\AI-Development\Yolo v8\3. Keypoint Detection in YOLOv8 - Pose or YOLOv8 pose estimation\video.mp4: 192x320 2 persons, 26.5ms
video 1/1 (frame 4/1806) F:\ai\AI-Development\Yolo v8\3. Keypoint Detection in YOLOv8 - Pose or YOLOv8 

[ultralytics.engine.results.Results object with attributes:
 
 boxes: ultralytics.engine.results.Boxes object
 keypoints: ultralytics.engine.results.Keypoints object
 masks: None
 names: {0: 'person'}
 obb: None
 orig_img: array([[[ 98,  72, 113],
         [ 98,  72, 113],
         [ 97,  71, 112],
         ...,
         [ 95,  61,  80],
         [ 95,  61,  80],
         [ 95,  61,  80]],
 
        [[ 98,  72, 113],
         [ 98,  72, 113],
         [ 97,  71, 112],
         ...,
         [ 95,  61,  80],
         [ 95,  61,  80],
         [ 95,  61,  80]],
 
        [[ 97,  71, 112],
         [ 97,  71, 112],
         [ 96,  70, 111],
         ...,
         [ 95,  61,  80],
         [ 95,  61,  80],
         [ 95,  61,  80]],
 
        ...,
 
        [[  5, 133, 129],
         [  5, 133, 129],
         [  5, 133, 129],
         ...,
         [166, 164, 142],
         [166, 164, 142],
         [166, 164, 142]],
 
        [[  5, 133, 129],
         [  5, 133, 129],
         [  5, 133,
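
For a video of this length (1806 frames), keeping every Results object in memory is wasteful. The logged example suggests passing `stream=True`, which makes `predict` return a generator. The sketch below follows that suggestion and keeps only a per-frame person count. The helper names `people_per_frame` and `count_people_in_video` are hypothetical.

```python
def people_per_frame(results):
    """Yield the number of detections in each Results-like object, one frame at a time."""
    for r in results:
        yield len(r.boxes)


def count_people_in_video(source, weights="yolov8n-pose.pt"):
    """Stream predictions over a video and return the person count for each frame."""
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(weights)
    # stream=True returns a generator, so frames are processed one by one
    frames = model.predict(source, stream=True, imgsz=320, conf=0.5)
    return list(people_per_frame(frames))
```

Only one small integer per frame is retained here, so memory use stays flat regardless of video length.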

In [8]:
# To use a webcam, set source to 0
source = 0
model.predict(source, save=True, imgsz=320, conf=0.5)


1/1: 0... Success  (inf frames of shape 640x480 at 30.00 FPS)


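WARNING ⚠️ inference results will accumulate in RAM unless `stream=True` is passed, causing potential out-of-memory
errors for large sources or long-running streams and videos. See https://docs.ultralytics.com/modes/predict/ for help.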

Example:
    results = model(source=..., stream=True)  # generator of Results objects
    for r in results:
        boxes = r.boxes  # Boxes object for bbox outputs
        masks = r.masks  # Masks object for segment masks outputs
        probs = r.probs  # Class probabilities for classification outputs

0: 256x320 1 person, 539.7ms
0: 256x320 1 person, 13.0ms
0: 256x320 1 person, 11.8ms
0: 256x320 1 person, 20.3ms
0: 256x320 1 person, 20.0ms
0: 256x320 1 person, 15.9ms
0: 256x320 1 person, 10.8ms
0: 256x320 1 person, 13.3ms
0: 256x320 1 person, 13.9ms
0: 256x320 1 person, 13.0ms
0: 256x320 1 person, 17.2ms
0: 256x320 1 person, 15.0ms
0: 256x320 1 person, 30.0ms
0: 256x320 1 person, 28.2ms
0: 256x320 1 person, 12.1ms
0: 256x320 1 person, 21.0ms
0: 256x320 1 person, 20.8ms
0: 256x320 1 person, 34.0ms
0: 2


KeyboardInterrupt
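
The logs report 320x256 for the bus image and 256x320 for the 640x480 webcam feed, even though imgsz=320 in both calls. A plausible explanation is that the longer side is scaled to imgsz and the shorter side is then padded up to a multiple of the model stride. The sketch below reproduces that arithmetic. It is a simplification and not the library's actual preprocessing code, and it assumes a stride of 32 and that bus.jpg is 810 pixels wide by 1080 tall.

```python
import math


def inference_shape(height, width, imgsz=320, stride=32):
    """Estimate the (height, width) fed to the model for a given input size."""
    scale = imgsz / max(height, width)  # scale the longer side to imgsz
    new_h = round(height * scale)
    new_w = round(width * scale)

    def pad(n):
        # round up to the next multiple of the stride
        return math.ceil(n / stride) * stride

    return pad(new_h), pad(new_w)


print(inference_shape(1080, 810))  # bus.jpg, assumed 810 wide by 1080 tall
print(inference_shape(480, 640))   # webcam frame of 640x480
```

Under these assumptions the two calls give (320, 256) and (256, 320), matching the shapes in the logs.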



In [2]:
# Run pose estimation on a live webcam feed

import cv2
from ultralytics import YOLO

# load the YOLOv8 model
model = YOLO('yolov8n-pose.pt')

# Open the video source: a file path, or 0 for the default webcam
# video_path = 'path/to/video.mp4'

video_path = 0  # for webcam
cap = cv2.VideoCapture(video_path)

# Loop through the video frames
while cap.isOpened():
    # Read a frame from the video
    success, frame = cap.read()

    if success:
        # Run YOLOv8 inference on the frame
        results = model(frame, save=True)

        # Visualize the results on the frame
        annotated_frame = results[0].plot()

        # Display the annotated frame
        cv2.imshow('YOLOv8 Inference', annotated_frame)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    else:
        # Break the loop if no frame could be read (end of video or camera error)
        break

# Release the video capture object and close the display window
cap.release()
cv2.destroyAllWindows()


0: 480x640 1 person, 14.5ms
Speed: 2.7ms preprocess, 14.5ms inference, 3.3ms postprocess per image at shape (1, 3, 480, 640)
Results saved to runs\pose\predict9

0: 480x640 1 person, 13.9ms
Speed: 1.7ms preprocess, 13.9ms inference, 2.3ms postprocess per image at shape (1, 3, 480, 640)
Results saved to runs\pose\predict9

0: 480x640 1 person, 10.2ms
Speed: 1.6ms preprocess, 10.2ms inference, 1.8ms postprocess per image at shape (1, 3, 480, 640)
Results saved to runs\pose\predict9

0: 480x640 1 person, 11.1ms
Speed: 2.7ms preprocess, 11.1ms inference, 2.7ms postprocess per image at shape (1, 3, 480, 640)
Results saved to runs\pose\predict9

0: 480x640 1 person, 12.2ms
Speed: 2.5ms preprocess, 12.2ms inference, 1.7ms postprocess per image at shape (1, 3, 480, 640)
Results saved to runs\pose\predict9

0: 480x640 1 person, 8.3ms
Speed: 1.5ms preprocess, 8.3ms inference, 2.0ms postprocess per image at shape (1, 3, 480, 640)
Results saved to runs\
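
The "Speed:" lines give a rough idea of the achievable frame rate. The sketch below parses the first logged line and sums the three stages. The helper name `parse_speed` is hypothetical, and the estimate ignores time spent reading the camera, saving results, and drawing the window, so the real loop will likely run slower.

```python
import re


def parse_speed(line):
    """Extract per-stage timings in milliseconds from an Ultralytics 'Speed:' log line."""
    pattern = r"([\d.]+)ms (preprocess|inference|postprocess)"
    return {stage: float(ms) for ms, stage in re.findall(pattern, line)}


line = ("Speed: 2.7ms preprocess, 14.5ms inference, 3.3ms postprocess "
        "per image at shape (1, 3, 480, 640)")
stages = parse_speed(line)
total_ms = sum(stages.values())
print(f"{total_ms:.1f} ms per frame, about {1000 / total_ms:.1f} FPS")
```

For the first logged frame this gives 20.5 ms of model-side work per frame, or about 48.8 frames per second as an upper bound.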