[![Labellerr](https://storage.googleapis.com/labellerr-cdn/%200%20Labellerr%20template/notebook.webp)](https://www.labellerr.com)

# **Real-Time Pull-Up Counter using YOLO Pose Estimation**

---

[![labellerr](https://img.shields.io/badge/Labellerr-BLOG-black.svg)](https://www.labellerr.com/blog/<BLOG_NAME>)
[![Youtube](https://img.shields.io/badge/Labellerr-YouTube-b31b1b.svg)](https://www.youtube.com/@Labellerr)
[![Github](https://img.shields.io/badge/Labellerr-GitHub-green.svg)](https://github.com/Labellerr/Hands-On-Learning-in-Computer-Vision)

## Overview

This notebook demonstrates an end-to-end computer vision pipeline for automated fitness tracking with a YOLO-based pose estimation model. The workflow covers dataset preparation (converting JSON annotations to YOLO format), model training, and real-time inference that uses geometric heuristics (elbow angle and nose position relative to the bar) to verify form and count pull-up repetitions.

#### Real-World Applications:
* **Smart Fitness Apps:** AI-powered virtual coaching for automated rep counting and form correction.
* **Gym & Performance Analytics:** Objective tracking of athlete progress and consistency in training facilities.
* **Physical Therapy & Rehab:** Monitoring patient range of motion (ROM) and recovery milestones automatically.
* **Virtual Competitions:** Automated verification of repetitions for remote fitness challenges to prevent cheating.
* **Home Workout Assistants:** Hands-free tracking for users exercising without a personal trainer.

## Import Libraries

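This section imports the libraries used throughout the project: OpenCV for video I/O and drawing, Matplotlib for visualization, NumPy and pandas for numerical and tabular data, and Ultralytics for the YOLO model. It also clones the `yolo_finetune_utils` helper repository used for frame extraction.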


In [11]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from ultralytics import YOLO

In [4]:
!git clone https://github.com/Labellerr/yolo_finetune_utils.git


fatal: destination path 'yolo_finetune_utils' already exists and is not an empty directory.


## Random Frame Extraction from Video

Extracts a fixed number of frames from one or more videos, saved at maximum JPEG quality, to create an image dataset for annotation and training.

### 🔹 Purpose
- Convert the raw pull-up videos into individual image frames  
- Sample frames at random to avoid bias toward any one part of the video  
- Prepare data for annotation and YOLO training  
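
The internals of `extract_random_frames` are not shown in this notebook. As a rough illustration of seeded random sampling, the sketch below picks distinct frames uniformly across several videos. The function name `sample_frame_indices` and the frame counts are hypothetical and are not part of `yolo_finetune_utils`.

```python
import random

def sample_frame_indices(frame_counts, total_images, seed=42):
    """Pick distinct (video_index, frame_index) pairs uniformly at random.

    frame_counts: number of frames in each video (hypothetical input).
    """
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    # Every frame of every video is a candidate.
    candidates = [(v, f) for v, n in enumerate(frame_counts) for f in range(n)]
    k = min(total_images, len(candidates))
    return sorted(rng.sample(candidates, k))

# Two hypothetical videos with 300 and 120 frames.
picks = sample_frame_indices([300, 120], total_images=25, seed=42)
print(len(picks))
```

Fixing the seed makes the sample reproducible, so the same frames are selected if the extraction is re-run.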


In [None]:
from yolo_finetune_utils.frame_extractor import extract_random_frames

extract_random_frames(
    paths=['Updated Pull Ups - Made with Clipchamp.mp4'],
    total_images=25,
    out_dir="frames",
    jpg_quality=100,
    seed=42
)

Found export-#3zxrWUBsgGneicfNw5Yj.zip. Extracting...
✅ Success! Files extracted to: dataset/
You can now proceed to Step 2 (Training).


## Data Preprocessing: JSON to YOLO Conversion

### **Script Purpose**
This script is a critical ETL (Extract, Transform, Load) tool designed to prepare your custom dataset for training a YOLO11 Pose Estimation model. 

The raw data comes in a JSON format (typical of labeling tools like Labellerr or Labelbox), where keypoints are stored as absolute pixel coordinates with string labels (e.g., "left shoulder"). YOLO cannot read this format directly.

**What this script does:**
1.  **Structure Mapping:** It maps specific body part names (Nose, Shoulders, Elbows, Wrists) to the strict numerical index order required by the model.
2.  **Normalization:** It converts absolute pixel coordinates ($x, y$) into normalized values ($0.0 - 1.0$) relative to the image size. This ensures the model works regardless of image resolution.
3.  **Bounding Box Generation:** YOLO Pose requires a bounding box around every person. Since the input data only contains points, this script calculates the smallest box that fits all annotated points, expands it by 20 pixels of padding on each side, and clamps it to the image bounds.
4.  **File Generation:** It splits the single JSON file into individual `.txt` files for every image, formatted exactly as YOLO expects:
    * Format: `<class_id> <box_x> <box_y> <box_w> <box_h> <kpt1_x> <kpt1_y> <vis> ...`
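
As a compact illustration of steps 2 to 4, the sketch below builds one label line from a list of pixel keypoints. The helper name `yolo_pose_line` and the sample coordinates are made up for this example; the padding of 20 pixels and the visibility flags (`2` for visible, `0` for missing) follow the conversion script in this section.

```python
def yolo_pose_line(points, img_w, img_h, pad=20, class_id=0):
    """Build a YOLO pose label line.

    points: list of (x, y) pixel tuples, or None for a missing keypoint.
    Returns None when no keypoint is annotated.
    """
    present = [p for p in points if p is not None]
    if not present:
        return None
    xs = [p[0] for p in present]
    ys = [p[1] for p in present]
    # Tightest box around the points, padded and clamped to the image.
    x0, y0 = max(0, min(xs) - pad), max(0, min(ys) - pad)
    x1, y1 = min(img_w, max(xs) + pad), min(img_h, max(ys) + pad)
    box = [(x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h,
           (x1 - x0) / img_w, (y1 - y0) / img_h]
    fields = [str(class_id)] + [f"{v:.6f}" for v in box]
    for p in points:
        if p is None:
            fields += ["0.000000", "0.000000", "0"]  # missing keypoint
        else:
            fields += [f"{p[0] / img_w:.6f}", f"{p[1] / img_h:.6f}", "2"]
    return " ".join(fields)

# Hypothetical 640x480 image with three annotated points and one missing.
line = yolo_pose_line([(320, 100), (280, 160), (360, 160), None], 640, 480)
print(line)
```

Each line therefore holds 5 box fields followed by 3 fields per keypoint, which is 26 values for the 7 keypoints used in this project.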

In [None]:
import json
import os

input_json_file = 'export-#ckMv3PRDGGdTI19hYySq.json' 
output_folder = 'labels'  
os.makedirs(output_folder, exist_ok=True)

KEYPOINT_ORDER = [
    "nose",
    "left shoulder",
    "right shoulder",
    "left elbow",
    "right elbow",
    "left wrist",
    "right wrist"
]

def convert_to_yolo():
    with open(input_json_file, 'r') as f:
        data = json.load(f)

    print(f"Processing {len(data)} images...")

    for entry in data:
        file_name = entry['file_name']
        
        img_w = entry['file_metadata']['image_width']
        img_h = entry['file_metadata']['image_height']
        
        keypoints_map = {} 

        if 'latest_answer' in entry and entry['latest_answer']:
            annotations = entry['latest_answer']
            
            for ann in annotations:
                if isinstance(ann['answer'], list) and len(ann['answer']) > 0:
                    pt_data = ann['answer'][0]
                    label = pt_data['label']
                    x = pt_data['answer']['x']
                    y = pt_data['answer']['y']
                    
                    keypoints_map[label] = (x, y)
        yolo_kpts = []
        valid_xs = []
        valid_ys = []

        for k_name in KEYPOINT_ORDER:
            if k_name in keypoints_map:
                x, y = keypoints_map[k_name]
                norm_x = x / img_w
                norm_y = y / img_h
                yolo_kpts.extend([f"{norm_x:.6f}", f"{norm_y:.6f}", "2"]) # 2 = visible
                
                valid_xs.append(x)
                valid_ys.append(y)
            else:
                
                yolo_kpts.extend(["0.000000", "0.000000", "0"])

        if valid_xs and valid_ys:
            min_x, max_x = min(valid_xs), max(valid_xs)
            min_y, max_y = min(valid_ys), max(valid_ys)

            pad = 20
            min_x = max(0, min_x - pad)
            min_y = max(0, min_y - pad)
            max_x = min(img_w, max_x + pad)
            max_y = min(img_h, max_y + pad)
            
            bbox_w = (max_x - min_x)
            bbox_h = (max_y - min_y)
            bbox_x_center = min_x + (bbox_w / 2)
            bbox_y_center = min_y + (bbox_h / 2)

            norm_bbox_x = bbox_x_center / img_w
            norm_bbox_y = bbox_y_center / img_h
            norm_bbox_w = bbox_w / img_w
            norm_bbox_h = bbox_h / img_h

            
            class_id = 0 
            
            line_parts = [
                str(class_id),
                f"{norm_bbox_x:.6f}",
                f"{norm_bbox_y:.6f}",
                f"{norm_bbox_w:.6f}",
                f"{norm_bbox_h:.6f}"
            ] + yolo_kpts

            line = " ".join(line_parts)

            txt_filename = os.path.splitext(file_name)[0] + ".txt"
            with open(os.path.join(output_folder, txt_filename), 'w') as out_f:
                out_f.write(line + "\n")

    print(f" Conversion complete. Labels saved in '{output_folder}/'")

if __name__ == "__main__":
    convert_to_yolo()

In [None]:
import os
import shutil
import random
import yaml


source_labels = "labels" 

source_images = "frames"  
dataset_root = "datasets/pullups"
train_ratio = 0.8 

def setup_directories():
    """Creates the YOLO standard directory structure"""
    dirs = [
        f"{dataset_root}/images/train",
        f"{dataset_root}/images/val",
        f"{dataset_root}/labels/train",
        f"{dataset_root}/labels/val"
    ]
    for d in dirs:
        os.makedirs(d, exist_ok=True)
    print(f" Created directories in {dataset_root}")

def create_yaml():
    """Generates the data.yaml file needed for training"""
    yaml_content = {
        'path': os.path.abspath(dataset_root),
        'train': 'images/train',
        'val': 'images/val',
        'names': {
            0: 'person' 
        },
        
        'kpt_shape': [7, 3],
        'flip_idx': [0, 2, 1, 4, 3, 6, 5] 
    }
    
    with open(f"{dataset_root}/data.yaml", 'w') as f:
        yaml.dump(yaml_content, f, sort_keys=False)
    print(f"✅ Created data.yaml at {dataset_root}/data.yaml")

def organize_files():
    setup_directories()
    create_yaml()

    label_files = [f for f in os.listdir(source_labels) if f.endswith('.txt')]
    
    random.shuffle(label_files)
    
    split_index = int(len(label_files) * train_ratio)
    train_files = label_files[:split_index]
    val_files = label_files[split_index:]

    print(f"🔄 Copying files: {len(train_files)} Train, {len(val_files)} Val")

    def move_batch(files, split_name):
        for label_file in files:
            
            src_lbl = os.path.join(source_labels, label_file)
            dst_lbl = os.path.join(dataset_root, "labels", split_name, label_file)
            shutil.copy(src_lbl, dst_lbl) # Using copy to be safe

            image_name = None
            base_name = os.path.splitext(label_file)[0]
            for ext in ['.jpg', '.png', '.jpeg']:
                potential_img = os.path.join(source_images, base_name + ext)
                if os.path.exists(potential_img):
                    image_name = base_name + ext
                    break
            
            if image_name:
                src_img = os.path.join(source_images, image_name)
                dst_img = os.path.join(dataset_root, "images", split_name, image_name)
                shutil.copy(src_img, dst_img)
            else:
                print(f" Warning: Image not found for {label_file}")

    move_batch(train_files, "train")
    move_batch(val_files, "val")
    print(" Dataset organization complete!")

if __name__ == "__main__":
    organize_files()

## Model Training: Fine-Tuning YOLO11 Pose

### **Script Purpose**
This script initiates the **transfer learning** process. It loads a pre-trained **YOLO11n-pose** model (which already understands general human structure) and fine-tunes it specifically on your custom dataset to accurately track the upper-body keypoints needed for pull-up analysis.

### **The Code**

In [None]:
from ultralytics import YOLO

# Load a model
model = YOLO('yolo11n-pose.pt') 

# Train the model
results = model.train(
    data='datasets/pullups/data.yaml', 
    epochs=25,                         
    imgsz=640,                         
    batch=8,
    project='pullup_project',          
    name='pose_train'                  
)

Ultralytics 8.4.8  Python-3.11.9 torch-2.9.1+cpu CPU (12th Gen Intel Core(TM) i5-1235U)
engine\trainer: agnostic_nms=False, amp=True, angle=1.0, augment=False, auto_augment=randaugment, batch=8, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, compile=False, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=datasets/pullups/data.yaml, degrees=0.0, deterministic=True, device=cpu, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, end2end=None, epochs=25, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolo11n-pose.pt, momentum=0.937, mosaic=1.0, multi_scale=0.0, name=pose_train2, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, p

# Step 1: System Calibration (Setting the Bar Height)

## Overview
This script acts as the **calibration phase** of the project. Before the AI can count pull-ups, it needs to know *where* the pull-up bar is located in the video frame.

Instead of hardcoding a pixel coordinate (which would break if the camera moves), this code opens an interactive window that allows the user to **click on the bar**. The Y-coordinate of that click is saved and used as the threshold for counting repetitions.

### **Key Features**
* **Interactive UI:** Uses OpenCV's mouse callback functionality to detect user clicks.
* **Visual Feedback:** Displays a static frame of the video with on-screen instructions.
* **Fail-Safe:** Keeps the window open until the user clicks a point or presses Esc to cancel.

In [2]:
import cv2
import numpy as np

video_source = "Updated Pull Ups - Made with Clipchamp.mp4" 

rod_y = None 

def set_rod_height(event, x, y, flags, param):
    """Mouse callback to set the height of the pull-up bar"""
    global rod_y
    if event == cv2.EVENT_LBUTTONDOWN:
        rod_y = y
        print(f" Rod Height Set at Y={rod_y}")

cap = cv2.VideoCapture(video_source)

if not cap.isOpened():
    print(f" Error: Could not open '{video_source}'")
else:
    success, frame = cap.read()
    if success:
        window_name = 'Set Rod Height'
        cv2.namedWindow(window_name)
        cv2.setMouseCallback(window_name, set_rod_height)
        
        print(f" INSTRUCTION: A window has opened. CLICK the Pull-Up Bar to set the line.")
        
        while rod_y is None:
            
            display_frame = frame.copy()
            cv2.putText(display_frame, "CLICK BAR TO SET HEIGHT", (50, 50), 
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
            cv2.imshow(window_name, display_frame)
            
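            # Esc (key code 27) cancels; mask to 8 bits for consistent key codes
            if (cv2.waitKey(10) & 0xFF) == 27:
                print("Setup cancelled.")
                break

        cv2.destroyWindow(window_name)
        if rod_y is not None:
            print(f" Configuration Complete. Rod Y-Coordinate is: {rod_y}")
    else:
        print(" Error: Could not read the first frame.")
    # Release the capture whether or not a frame was read
    cap.release()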

 INSTRUCTION: A window has opened. CLICK the Pull-Up Bar to set the line.
 Rod Height Set at Y=206
 Configuration Complete. Rod Y-Coordinate is: 206


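# Step 2: Real-Time Inference and Rep Counting

This script runs the fine-tuned pose model on every frame of the video, checks form with simple geometry, and counts repetitions against the bar height set in Step 1.

#### **1. Elbow Angle Calculation**
* **Method:** `calculate_angle` measures the angle at the elbow formed by the shoulder, elbow, and wrist keypoints. The left and right angles are averaged.
* **Purpose:** To verify full range of motion. A pull-up is only valid if the user starts from a "dead hang" (arms straight, angle $\approx 180^\circ$).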
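
To make the dead-hang check concrete, here is a small standard-library sketch of an elbow-angle computation in the same `arctan2` style as the notebook's `calculate_angle`. The function name `elbow_angle` and the sample coordinates are illustrative only.

```python
import math

def elbow_angle(shoulder, elbow, wrist):
    """Angle at the elbow in degrees, in the range 0 to 180."""
    # Direction of each arm segment as seen from the elbow.
    to_shoulder = math.atan2(shoulder[1] - elbow[1], shoulder[0] - elbow[0])
    to_wrist = math.atan2(wrist[1] - elbow[1], wrist[0] - elbow[0])
    angle = abs(math.degrees(to_wrist - to_shoulder))
    # Fold reflex angles back into the 0 to 180 range.
    return 360.0 - angle if angle > 180.0 else angle

# Shoulder, elbow, and wrist on one vertical line: a fully extended arm.
straight = elbow_angle((100, 200), (100, 150), (100, 100))
# Forearm at a right angle to the upper arm.
bent = elbow_angle((100, 200), (100, 150), (150, 150))
print(round(straight), round(bent))
```

A fully extended arm yields a value near 180 degrees, which is why a threshold somewhat below that (160 degrees in this notebook) tolerates keypoint jitter.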



#### **2. The State Machine (Counting Logic)**
Instead of just counting every time the head moves up, the code uses a **State Machine** to prevent false positives (like half-reps or jitter):
* **State: "DOWN"**
    * Triggered when the average arm angle exceeds **160 degrees**.
    * This ensures the user has fully extended their arms before attempting a rep.
* **State: "UP" (The Count)**
    * Triggered *only if* the current state is "DOWN" **AND** the Nose Y-coordinate goes *above* the Bar Y-coordinate (`nose[1] < rod_y`).
    * This increments the counter and locks the state to "UP" until the user extends their arms again.
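
The two states can be isolated from the video code for testing. The sketch below mirrors the counting rules described above in a small class; the class name `PullUpCounter` and the sample frame values are hypothetical. Image Y coordinates grow downward, so "above the bar" means a smaller Y value.

```python
class PullUpCounter:
    """Two-state rep counter: 'down' (arms extended) and 'up' (nose over bar)."""

    def __init__(self, rod_y, down_angle=160):
        self.rod_y = rod_y            # bar height in pixels
        self.down_angle = down_angle  # elbow angle that counts as a dead hang
        self.stage = "down"
        self.count = 0

    def update(self, nose_y, avg_angle):
        # Arms extended: re-arm the counter.
        if avg_angle > self.down_angle:
            self.stage = "down"
        # Nose above the bar while armed: count exactly one rep.
        if nose_y < self.rod_y and self.stage == "down":
            self.stage = "up"
            self.count += 1
        return self.count

counter = PullUpCounter(rod_y=206)
# (nose_y, avg_angle) per frame; the sixth frame is a half rep with no dead hang.
frames = [(300, 170), (250, 90), (200, 40), (198, 35),
          (260, 100), (200, 45), (300, 170), (200, 40)]
for nose_y, angle in frames:
    counter.update(nose_y, angle)
print(counter.count)
```

Because the state only returns to "down" after a full arm extension, bobbing around the bar height does not add extra reps.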



#### **3. Visualization Pipeline**
The script overlays rich visual feedback onto the video for debugging and user experience:
* **Skeleton Tracking:** Draws lines between joints (Shoulder $\to$ Elbow $\to$ Wrist) using `cv2.line`.
* **Joint Markers:** Places yellow circles on key joints using `cv2.circle`.
* **Live Metrics:** Displays the real-time elbow angle and rep count directly on the screen.
* **Dynamic Bar Line:** The horizontal bar line changes color from **Red** (Down/Reset) to **Green** (Up/Success).

#### **4. Output Management**
* **Video Writer:** It initializes `cv2.VideoWriter` to save the processed video with all overlays to `output_inference.mp4`.
* **Resource Management:** A `try...finally` block ensures that the video file is properly saved and closed, even if the script is interrupted or encounters an error.

In [None]:
import cv2
import numpy as np
from ultralytics import YOLO

model_path = 'runs/pose/pullup_project/pose_train2/weights/best.pt'
if rod_y is None:
    print(" Error: Please run the Step 1 calibration cell first to set the rod height!")
else:
    print("Loading YOLO Model...")
    model = YOLO(model_path)
    print(" Model Loaded.")
    output_filename = "output_inference.mp4"
    
    count = 0
    stage = "down"
    def calculate_angle(a, b, c):
        """Calculates angle between three points (a, b, c)"""
        a, b, c = np.array(a), np.array(b), np.array(c)
        radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])
        angle = np.abs(radians * 180.0 / np.pi)
        if angle > 180.0: angle = 360 - angle
        return angle

    cap = cv2.VideoCapture(video_source)
    
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # Keep fractional rates (e.g. 29.97) and fall back to 30 if the file reports 0
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    out = cv2.VideoWriter(output_filename, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    print(f" Starting Inference... Output will be saved to '{output_filename}'")

    try:
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret: break

            # 1. Draw the bar line: green while the stage is "up", red otherwise
            line_color = (0, 255, 0) if stage == "up" else (0, 0, 255)
            cv2.line(frame, (0, rod_y), (width, rod_y), line_color, 3)

            # 2. Run pose inference on the current frame
            results = model(frame, verbose=False)
            
            for r in results:
                if r.keypoints and len(r.keypoints.data) > 0:
                    kp = r.keypoints.data[0].cpu().numpy()
                    
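                    # Indices follow the 7-point KEYPOINT_ORDER used for training:
                    # 0 nose, 1/2 left/right shoulder, 3/4 left/right elbow, 5/6 left/right wrist
                    # (the COCO indices 5-10 would be out of range for this model)
                    if kp[1][2] > 0.5:  # left-shoulder confidence
                        nose = kp[0][:2]
                        l_sh, l_elb, l_wr = kp[1][:2], kp[3][:2], kp[5][:2]
                        r_sh, r_elb, r_wr = kp[2][:2], kp[4][:2], kp[6][:2]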

                        left_angle = calculate_angle(l_sh, l_elb, l_wr)
                        right_angle = calculate_angle(r_sh, r_elb, r_wr)
                        avg_angle = int((left_angle + right_angle) / 2)

                        
                        cv2.line(frame, (int(l_sh[0]), int(l_sh[1])), (int(l_elb[0]), int(l_elb[1])), (255, 0, 255), 3)
                        cv2.line(frame, (int(l_elb[0]), int(l_elb[1])), (int(l_wr[0]), int(l_wr[1])), (255, 0, 255), 3)
                        cv2.line(frame, (int(r_sh[0]), int(r_sh[1])), (int(r_elb[0]), int(r_elb[1])), (255, 255, 0), 3)
                        cv2.line(frame, (int(r_elb[0]), int(r_elb[1])), (int(r_wr[0]), int(r_wr[1])), (255, 255, 0), 3)

                        for j in [l_sh, l_elb, l_wr, r_sh, r_elb, r_wr]:
                            cv2.circle(frame, (int(j[0]), int(j[1])), 6, (0, 255, 255), -1)

                        cv2.putText(frame, f"{avg_angle}", (int(l_elb[0] - 40), int(l_elb[1])), 
                                   cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

                        if avg_angle > 160: stage = "down"
                        
                        if nose[1] < rod_y and stage == "down":
                            stage = "up"
                            count += 1
                            print(f" Rep {count} (Angle: {avg_angle})")

            cv2.rectangle(frame, (0, 0), (220, 90), (245, 117, 16), -1)
            cv2.putText(frame, "REPS", (15, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 0), 1)
            cv2.putText(frame, str(count), (15, 80), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 255, 255), 3)
            
            cv2.putText(frame, "STAGE", (100, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 0), 1)
            cv2.putText(frame, stage, (95, 80), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 255, 255), 3)

            out.write(frame)
            cv2.imshow('Pull-up AI', frame)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

    finally:
        cap.release()
        out.release()
        cv2.destroyAllWindows()
        print(f" Done! Saved as '{output_filename}'")

⏳ Loading YOLO Model...
✅ Model Loaded.
🚀 Starting Inference... Output will be saved to 'output_inference.mp4'
💪 Rep 1 (Angle: 31)
💪 Rep 2 (Angle: 27)
💪 Rep 3 (Angle: 22)
💪 Rep 4 (Angle: 40)
💪 Rep 5 (Angle: 42)
💪 Rep 6 (Angle: 41)
💪 Rep 7 (Angle: 50)
💪 Rep 8 (Angle: 50)
💪 Rep 9 (Angle: 42)
💪 Rep 10 (Angle: 46)
💪 Rep 11 (Angle: 48)
💪 Rep 12 (Angle: 45)
💪 Rep 13 (Angle: 39)
💪 Rep 14 (Angle: 41)
✅ Done! Saved as 'output_inference.mp4'


---

## 👨‍💻 About Labellerr's Hands-On Learning in Computer Vision

Thank you for exploring this **Labellerr Hands-On Computer Vision Cookbook**! We hope this notebook helped you learn, prototype, and accelerate your vision projects.  
Labellerr provides ready-to-run Jupyter/Colab notebooks for the latest models and real-world use cases in computer vision, AI agents, and data annotation.

---
## 🧑‍🔬 Check Our Popular YouTube Videos

Whether you're a beginner or a practitioner, our hands-on training videos are perfect for learning custom model building, computer vision techniques, and applied AI:

- [How to Fine-Tune YOLO on Custom Dataset](https://www.youtube.com/watch?v=pBLWOe01QXU)  
  Step-by-step guide to fine-tuning YOLO for real-world use: environment setup, annotation, training, validation, and inference.
- [Build a Real-Time Intrusion Detection System with YOLO](https://www.youtube.com/watch?v=kwQeokYDVcE)  
  Create an AI-powered system to detect intruders in real time using YOLO and computer vision.
- [Finding Athlete Speed Using YOLO](https://www.youtube.com/watch?v=txW0CQe_pw0)  
  Estimate real-time speed of athletes for sports analytics.
- [Object Counting Using AI](https://www.youtube.com/watch?v=smsjBBQcIUQ)  
  Learn dataset curation, annotation, and training for robust object counting AI applications.
---

## 🎦 Popular Labellerr YouTube Videos

Level up your skills and see video walkthroughs of these tools and notebooks on the  
[Labellerr YouTube Channel](https://www.youtube.com/@Labellerr/videos):

- [How I Fixed My Biggest Annotation Nightmare with Labellerr](https://www.youtube.com/watch?v=hlcFdiuz_HI) – Solving complex annotation for ML engineers.
- [Explore Your Dataset with Labellerr's AI](https://www.youtube.com/watch?v=LdbRXYWVyN0) – Auto-tagging, object counting, image descriptions, and dataset exploration.
- [Boost AI Image Annotation 10X with Labellerr's CLIP Mode](https://www.youtube.com/watch?v=pY_o4EvYMz8) – Refine annotations with precision using CLIP mode.
- [Boost Data Annotation Accuracy and Efficiency with Active Learning](https://www.youtube.com/watch?v=lAYu-ewIhTE) – Speed up your annotation workflow using Active Learning.

> 👉 **Subscribe** for Labellerr's deep learning, annotation, and AI tutorials, or watch videos directly alongside notebooks!

---

## 🤝 Stay Connected

- **Website:** [https://www.labellerr.com/](https://www.labellerr.com/)
- **Blog:** [https://www.labellerr.com/blog/](https://www.labellerr.com/blog/)
- **GitHub:** [Labellerr/Hands-On-Learning-in-Computer-Vision](https://github.com/Labellerr/Hands-On-Learning-in-Computer-Vision)
- **LinkedIn:** [Labellerr](https://in.linkedin.com/company/labellerr)
- **Twitter/X:** [@Labellerr1](https://x.com/Labellerr1)

*Happy learning and building with Labellerr!*
