# DLC to YOLO Pose Estimation Data Conversion

This notebook converts DeepLabCut (DLC) pose estimation labeled data to YOLO format for pose estimation model training.

## Overview
- **Source**: DLC_model/labeled-data/ (subdirectories with images and CSV files)
- **Target**: labeled-data/images/ (uniquely named images) and labeled-data/labels/ (one YOLO label file per image)
- **Key Tasks**: 
  - Rename images for global uniqueness
  - Convert CSV keypoint data to YOLO format
  - Generate individual label files for each image

## 1. Import Required Libraries

In [2]:
import shutil
import warnings
from pathlib import Path

import cv2
import pandas as pd

warnings.filterwarnings("ignore")

print("Libraries imported successfully!")

Libraries imported successfully!


## 2. Define Directory Paths and Configuration

In [3]:
# Define paths
SOURCE_DIR = Path("./DLC_model/labeled-data").resolve()  # Absolute path for source directory
TARGET_DIR = Path("./labeled-data").resolve()  # Absolute path for target directory
TARGET_IMAGES_DIR = TARGET_DIR / "images"
TARGET_LABELS_DIR = TARGET_DIR / "labels"

# Create target directories
TARGET_DIR.mkdir(exist_ok=True)
TARGET_IMAGES_DIR.mkdir(exist_ok=True)
TARGET_LABELS_DIR.mkdir(exist_ok=True)

# Configuration
BODYPARTS = ["L_antenna", "R_antenna", "L_mandible", "R_mandible", "Top_prob", "Tube_prob", "End_prob"]
CLASS_ID = 0  # Single class for pose estimation

print(f"Source directory: {SOURCE_DIR}")
print(f"Target directory: {TARGET_DIR}")
print(f"Body parts to convert: {BODYPARTS}")
print(f"Number of keypoints: {len(BODYPARTS)}")

Source directory: C:\Users\bee-ops\code\Choice-assay\DLC_model\labeled-data
Target directory: C:\Users\bee-ops\code\Choice-assay\labeled-data
Body parts to convert: ['L_antenna', 'R_antenna', 'L_mandible', 'R_mandible', 'Top_prob', 'Tube_prob', 'End_prob']
Number of keypoints: 7


In [4]:
# Clean up existing files before conversion
print("=== Cleaning Target Directories ===")

# Remove old files if they exist
if TARGET_IMAGES_DIR.exists():
    for file in TARGET_IMAGES_DIR.glob("*"):
        if file.is_file():
            file.unlink()
    print(f"Cleaned {TARGET_IMAGES_DIR}")

if TARGET_LABELS_DIR.exists():
    for file in TARGET_LABELS_DIR.glob("*"):
        if file.is_file():
            file.unlink()
    print(f"Cleaned {TARGET_LABELS_DIR}")

print("✅ Directories cleaned and ready for conversion")

=== Cleaning Target Directories ===
Cleaned C:\Users\bee-ops\code\Choice-assay\labeled-data\images
Cleaned C:\Users\bee-ops\code\Choice-assay\labeled-data\labels
✅ Directories cleaned and ready for conversion


## 3. Explore DLC Data Structure

In [5]:
# Scan DLC data structure
subdirs = [d for d in SOURCE_DIR.iterdir() if d.is_dir()]
print(f"Found {len(subdirs)} subdirectories:")

IMAGE_PATTERNS = ("*.png", "*.jpg", "*.jpeg")
PREVIEW_COUNT = 5  # Number of subdirectories to list individually

total_images = 0
total_csv_files = 0

# Count images and CSV files in every subdirectory, printing only the first few
for i, subdir in enumerate(subdirs):
    num_images = sum(len(list(subdir.glob(pattern))) for pattern in IMAGE_PATTERNS)
    num_csv_files = len(list(subdir.glob("*.csv")))
    total_images += num_images
    total_csv_files += num_csv_files

    if i < PREVIEW_COUNT:
        print(f"  {subdir.name}: {num_images} images, {num_csv_files} CSV files")

if len(subdirs) > PREVIEW_COUNT:
    print(f"  ... and {len(subdirs) - PREVIEW_COUNT} more subdirectories")

print(f"\nTotal: {total_images} images, {total_csv_files} CSV files across {len(subdirs)} subdirectories")

Found 196 subdirectories:
  left_10-06-2024_16-40-41: 5 images, 1 CSV files
  left_10-06-2024_16-55-06: 5 images, 1 CSV files
  left_10-06-2024_16-55-21: 5 images, 1 CSV files
  left_10-06-2024_17-01-03: 5 images, 1 CSV files
  left_10-06-2024_17-01-17: 5 images, 1 CSV files
  ... and 191 more subdirectories

Total: 1071 images, 195 CSV files across 196 subdirectories


## 4. Create Image Renaming Functions

In [6]:
def generate_unique_image_name(subdir_name, image_name):
    """Generate a unique image name by combining subdirectory name with image name"""
    # Remove file extension
    name_without_ext = Path(image_name).stem
    extension = Path(image_name).suffix

    # Create unique name: subdir_imagename.ext
    unique_name = f"{subdir_name}_{name_without_ext}{extension}"

    return unique_name


def get_image_dimensions(image_path):
    """Get image width and height using OpenCV"""
    try:
        img = cv2.imread(str(image_path))
        if img is not None:
            height, width = img.shape[:2]
            return width, height  # Return (width, height) to match previous PIL behavior
        else:
            print(f"Error: Could not read image {image_path}")
            return None, None
    except Exception as e:
        print(f"Error reading image {image_path}: {e}")
        return None, None


# Test the function with an example
test_subdir = "left_10-06-2024_16-40-41"
test_image = "img05.png"
test_unique_name = generate_unique_image_name(test_subdir, test_image)
print(f"Original: {test_image}")
print(f"Unique name: {test_unique_name}")

Original: img05.png
Unique name: left_10-06-2024_16-40-41_img05.png
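
The renamed files stay unique only as long as no two (subdirectory, file name) pairs join to the same string. The following is a minimal sketch of a collision check. `unique_name` and `find_collisions` are hypothetical helpers (the first mirrors `generate_unique_image_name`), and the example pairs are invented for illustration.

```python
from collections import Counter
from pathlib import Path


def unique_name(subdir_name: str, image_name: str) -> str:
    """Mirror of generate_unique_image_name: '<subdir>_<stem><ext>'."""
    p = Path(image_name)
    return f"{subdir_name}_{p.stem}{p.suffix}"


def find_collisions(pairs):
    """Return generated names that more than one (subdir, image) pair maps to."""
    counts = Counter(unique_name(subdir, image) for subdir, image in pairs)
    return sorted(name for name, n in counts.items() if n > 1)


# Illustrative pairs only; the last two collide because '_' is also the separator.
pairs = [
    ("left_10-06-2024_16-40-41", "img05.png"),
    ("left_10-06-2024_16-55-06", "img05.png"),
    ("cam_a", "b_img01.png"),
    ("cam_a_b", "img01.png"),
]
collisions = find_collisions(pairs)
print(collisions)
```

With timestamped directory names like the ones in this dataset a collision is unlikely, so a check of this kind is mostly a cheap safeguard before copying files.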


## 5. Extract and Parse CSV Label Data

In [7]:
def parse_dlc_csv(csv_path):
    """Parse DLC CSV file and extract keypoint coordinates (x, y only)."""

    try:
        # Read DLC CSV with multi-row header: scorer/bodyparts/coords
        df = pd.read_csv(csv_path, header=[0, 1, 2])

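        # NOTE: header=[0, 1, 2] already consumes the three header rows, so this slice
        # additionally skips the first three labeled rows of every CSV. Use
        # `data_rows = df` if those rows should be converted as well.
        data_rows = df.iloc[3:]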

        # Build a bodypart -> {x: col_idx, y: col_idx, confidence: col_idx} map
        coord_indices = {}
        confidence_aliases = {"likelihood", "confidence", "conf"}
        for col_idx, col in enumerate(df.columns):
            if not isinstance(col, tuple) or len(col) < 3:
                continue

            bodypart = str(col[1]).strip()
            coord_name = str(col[2]).strip().lower()

            if bodypart in BODYPARTS:
                if coord_name in {"x", "y"}:
                    coord_indices.setdefault(bodypart, {})[coord_name] = col_idx
                elif coord_name in confidence_aliases:
                    coord_indices.setdefault(bodypart, {})["confidence"] = col_idx

        parsed_data = []

        for idx, row in data_rows.iterrows():
            try:
                # Metadata columns in DLC exports are expected at positions 1 and 2
                if len(row) < 3:
                    continue

                subdir_name = row.iloc[1]
                raw_image_name = row.iloc[2]

                if pd.isna(raw_image_name) or str(raw_image_name).strip() == "":
                    continue
                if pd.isna(subdir_name) or str(subdir_name).strip() == "":
                    continue

                image_name = generate_unique_image_name(str(subdir_name), str(raw_image_name))

                # Extract keypoint coordinates in BODYPARTS order
                keypoints = []
                for bodypart in BODYPARTS:
                    part_indices = coord_indices.get(bodypart, {})
                    x_idx = part_indices.get("x")
                    y_idx = part_indices.get("y")
                    conf_idx = part_indices.get("confidence")

                    if x_idx is not None and y_idx is not None and x_idx < len(row) and y_idx < len(row):
                        x = row.iloc[x_idx]
                        y = row.iloc[y_idx]

                        confidence = 1.0  # Default confidence if no column is found
                        if conf_idx is not None and conf_idx < len(row):
                            confidence = row.iloc[conf_idx]

                        # Handle missing values and apply confidence threshold
                        if pd.notna(x) and pd.notna(y) and x != "" and y != "":
                            if confidence is not None and pd.notna(confidence) and confidence != "":
                                if float(confidence) > 0.5:
                                    keypoints.append((float(x), float(y), 2))  # 2 = visible
                                else:
                                    keypoints.append((0, 0, 0))  # Below confidence threshold
                            else:
                                # No confidence column/value available, keep existing x/y behavior
                                keypoints.append((float(x), float(y), 2))
                        else:
                            keypoints.append((0, 0, 0))  # 0 = not visible
                    else:
                        keypoints.append((0, 0, 0))  # Missing x/y columns for this bodypart

                parsed_data.append({"image_name": image_name, "keypoints": keypoints})

            except Exception as e:
                print(f"Error parsing row {idx}: {e}")
                continue

        return parsed_data

    except Exception as e:
        print(f"Error reading CSV file {csv_path}: {e}")
        return []


# Test with the first CSV file
test_csv = SOURCE_DIR / "left_10-06-2024_16-40-41" / "CollectedData_Anna.csv"
if test_csv.exists():
    test_data = parse_dlc_csv(test_csv)
    print(f"Parsed {len(test_data)} entries from test CSV")
    if test_data:
        print(f"First entry: {test_data[0]['image_name']}")
        print(f"Keypoints: {len(test_data[0]['keypoints'])}")
        print(f"Sample keypoint: {test_data[0]['keypoints'][0]}")
else:
    print("Test CSV file not found")

Parsed 1 entries from test CSV
First entry: left_10-06-2024_16-40-41_img30.png
Keypoints: 7
Sample keypoint: (39.03741747241937, 104.16974629530438, 2)
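
For readers unfamiliar with the DLC CSV layout, here is a standard-library sketch of the same column mapping that `parse_dlc_csv` builds. It assumes the single-animal layout with three header rows (scorer, bodyparts, coords) and the path parts in the first three columns. The sample content and the `keypoint` helper are invented for illustration and are not part of the notebook.

```python
import csv
import io

# Invented sample in the DLC single-animal layout (values are illustrative).
SAMPLE = """scorer,,,Anna,Anna,Anna,Anna
bodyparts,,,L_antenna,L_antenna,R_antenna,R_antenna
coords,,,x,y,x,y
labeled-data,left_demo,img05.png,39.0,104.2,,
labeled-data,left_demo,img12.png,41.5,99.8,60.1,101.3
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, data_rows = rows[:3], rows[3:]
bodyparts_row, coords_row = header[1], header[2]

# Map bodypart -> {"x": column index, "y": column index}
column_map = {}
for idx, (part, coord) in enumerate(zip(bodyparts_row, coords_row)):
    if part and coord in {"x", "y"}:
        column_map.setdefault(part, {})[coord] = idx


def keypoint(row, part):
    """Return (x, y, visibility); unlabeled points become (0, 0, 0)."""
    cols = column_map[part]
    x, y = row[cols["x"]], row[cols["y"]]
    if x == "" or y == "":
        return (0.0, 0.0, 0)
    return (float(x), float(y), 2)


for row in data_rows:
    print(row[2], [keypoint(row, part) for part in column_map])
```

Once the three header rows are split off, the first remaining row is already the first labeled image. `pd.read_csv(..., header=[0, 1, 2])` consumes the header rows in the same way, so `df.iloc[0]` is the first labeled image there too.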


## 6. Convert DLC Labels to YOLO Format

In [8]:
def dlc_to_yolo_format(keypoints, image_width, image_height):
    """Convert DLC keypoints to YOLO pose format (Ultralytics)"""

    if image_width <= 0 or image_height <= 0:
        return None

    # Calculate bounding box from visible keypoints
    visible_keypoints = [(x, y) for x, y, v in keypoints if v > 0]

    if not visible_keypoints:
        # No visible keypoints - create a minimal bounding box
        bbox_cx, bbox_cy, bbox_w, bbox_h = 0.5, 0.5, 0.0, 0.0
    else:
        # Calculate bounding box from visible keypoints
        x_coords = [x / image_width for x, y in visible_keypoints]
        y_coords = [y / image_height for x, y in visible_keypoints]

        min_x, max_x = min(x_coords), max(x_coords)
        min_y, max_y = min(y_coords), max(y_coords)

        # Add small margin to bounding box
        margin = 0.02  # 2% margin
        min_x = max(0, min_x - margin)
        min_y = max(0, min_y - margin)
        max_x = min(1, max_x + margin)
        max_y = min(1, max_y + margin)

        # Calculate center and size
        bbox_cx = (min_x + max_x) / 2
        bbox_cy = (min_y + max_y) / 2
        bbox_w = max_x - min_x
        bbox_h = max_y - min_y

    # Start with class ID and bounding box
    yolo_line_parts = [str(CLASS_ID), f"{bbox_cx:.6f}", f"{bbox_cy:.6f}", f"{bbox_w:.6f}", f"{bbox_h:.6f}"]

    # Add keypoints
    for x, y, visibility in keypoints:
        # Normalize coordinates to 0-1 range
        norm_x = x / image_width
        norm_y = y / image_height

        # Clamp to [0, 1] range
        norm_x = max(0, min(1, norm_x))
        norm_y = max(0, min(1, norm_y))

        # Add to YOLO format: x, y, visibility
        yolo_line_parts.extend([f"{norm_x:.6f}", f"{norm_y:.6f}", str(int(visibility))])

    return " ".join(yolo_line_parts) + "\n"


def create_yolo_label_content(parsed_data, image_name, image_width, image_height):
    """Create YOLO label content for a specific image"""

    # Find the data for this image
    for entry in parsed_data:
        if entry["image_name"] == image_name:
            return dlc_to_yolo_format(entry["keypoints"], image_width, image_height)

    return None


# Test the conversion
if test_data and len(test_data) > 0:
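    # Get test image dimensions.
    # NOTE: "image_name" is the renamed "<subdir>_<file>" name, which does not exist in the
    # source subdirectory, so this lookup falls through to "Test image file not found".
    # Strip the "<subdir>_" prefix to test against the original file.
    test_image_path = SOURCE_DIR / "left_10-06-2024_16-40-41" / test_data[0]["image_name"]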
    if test_image_path.exists():
        width, height = get_image_dimensions(test_image_path)
        if width and height:
            yolo_content = dlc_to_yolo_format(test_data[0]["keypoints"], width, height)
            print(f"Test image: {test_data[0]['image_name']}")
            print(f"Dimensions: {width} x {height}")
            print("YOLO Ultralytics Pose format:")
            print(yolo_content)

            # Validate format
            parts = yolo_content.strip().split()
            if len(parts) >= 5:
                print("Format check:")
                print(f"  Class ID: {parts[0]}")
                print(f"  Bounding box: cx={parts[1]}, cy={parts[2]}, w={parts[3]}, h={parts[4]}")
                keypoint_data = parts[5:]
                print(f"  Keypoints: {len(keypoint_data) // 3} points ({len(keypoint_data)} values)")
                print(f"  Expected: {len(BODYPARTS)} points ({len(BODYPARTS) * 3} values)")
        else:
            print("Could not get image dimensions")
    else:
        print("Test image file not found")
else:
    print("No test data available")

Test image file not found
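
Because the test cell above did not reach the conversion, the bounding-box arithmetic can be checked by hand instead. The sketch below re-implements only the box step of `dlc_to_yolo_format` on already-normalized keypoints. `bbox_from_keypoints` is a hypothetical helper, and the keypoint values are copied from one label file written by this notebook.

```python
def bbox_from_keypoints(norm_keypoints, margin=0.02):
    """Return (cx, cy, w, h) around keypoints with visibility > 0.

    The box is padded by `margin` on each side and clipped to [0, 1].
    """
    visible = [(x, y) for x, y, v in norm_keypoints if v > 0]
    if not visible:
        return 0.5, 0.5, 0.0, 0.0
    xs, ys = zip(*visible)
    min_x, max_x = max(0.0, min(xs) - margin), min(1.0, max(xs) + margin)
    min_y, max_y = max(0.0, min(ys) - margin), min(1.0, max(ys) + margin)
    return (min_x + max_x) / 2, (min_y + max_y) / 2, max_x - min_x, max_y - min_y


# Normalized (x, y, visibility) triplets taken from a label produced by this notebook.
keypoints = [
    (0.089535, 0.238921, 2),
    (0.697039, 0.306643, 2),
    (0.645252, 0.037748, 2),
    (0.0, 0.0, 0),
    (0.0, 0.0, 0),
    (0.0, 0.0, 0),
    (0.852400, 0.312619, 2),
]
cx, cy, w, h = bbox_from_keypoints(keypoints)
print(f"{cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
```

The result agrees with the box stored in that label (0.470968 0.175183 0.802865 0.314870) to within the rounding of the six-decimal inputs. The three invisible keypoints at (0, 0) do not pull the box toward the origin.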


## 7. Main Conversion Process

In [9]:
def convert_dlc_to_yolo():
    """Main function to convert all DLC data to YOLO format"""

    total_processed = 0
    total_successful = 0
    total_errors = 0

    print("Starting DLC to YOLO conversion...")

    # Process each subdirectory
    for subdir in subdirs:
        print(f"\nProcessing {subdir.name}...")

        # Find CSV file
        csv_files = list(subdir.glob("*.csv"))
        if not csv_files:
            print(f"  No CSV file found in {subdir.name}")
            continue

        csv_file = csv_files[0]  # Take the first CSV file
        print(f"  Reading labels from {csv_file.name}")

        # Parse CSV data
        parsed_data = parse_dlc_csv(csv_file)
        if not parsed_data:
            print("  No data found in CSV file")
            continue

        # Find all images in subdirectory
        image_files = list(subdir.glob("*.png")) + list(subdir.glob("*.jpg")) + list(subdir.glob("*.jpeg"))

        print(f"  Found {len(image_files)} images")

        subdir_processed = 0
        subdir_successful = 0

        for image_file in image_files:
            try:
                total_processed += 1
                subdir_processed += 1

                new_name = generate_unique_image_name(subdir.name, image_file.name)
                new_path = TARGET_IMAGES_DIR / new_name

                # Copy image with new name
                shutil.copy2(image_file, new_path)

                # Get image dimensions
                width, height = get_image_dimensions(new_path)
                if not width or not height:
                    print(f"    Warning: Could not get dimensions for {new_path.name}")
                    continue

                # Create YOLO label
                yolo_content = create_yolo_label_content(parsed_data, new_path.name, width, height)
                if yolo_content:
                    # Write the label file under the same stem as the renamed image
                    label_name = f"{new_path.stem}.txt"
                    label_path = TARGET_LABELS_DIR / label_name

                    with open(label_path, "w") as f:
                        f.write(yolo_content)

                    total_successful += 1
                    subdir_successful += 1

                else:
                    print(f"    Warning: No labels found for {new_path.name}")

            except Exception as e:
                # Report image_file here: new_path may not be assigned if renaming failed
                print(f"    Error processing {image_file.name}: {e}")
                total_errors += 1

        print(f"  Processed {subdir_processed} images, {subdir_successful} successful")

    print("\n=== Conversion Complete ===")
    print(f"Total images processed: {total_processed}")
    print(f"Successfully converted: {total_successful}")
    print(f"Errors encountered: {total_errors}")
    print(f"Output directory: {TARGET_DIR}")

    return total_processed, total_successful, total_errors


# Run the conversion
processed, successful, errors = convert_dlc_to_yolo()

Starting DLC to YOLO conversion...

Processing left_10-06-2024_16-40-41...
  Reading labels from CollectedData_Anna.csv
  Found 5 images
  Processed 5 images, 1 successful

Processing left_10-06-2024_16-55-06...
  Reading labels from CollectedData_Anna.csv
  Found 5 images
  Processed 5 images, 1 successful

Processing left_10-06-2024_16-55-21...
  Reading labels from CollectedData_Anna.csv
  No data found in CSV file

Processing left_10-06-2024_17-01-03...
  Reading labels from CollectedData_Anna.csv
  Found 5 images
  Processed 5 images, 1 successful

Processing left_10-06-2024_17-01-17...
  Reading labels from CollectedData_Anna.csv
  Found 5 images
  Processed 5 images, 2 successful

Processing left_10-06-2024_19-06-19...
  Reading labels from CollectedData_Anna.csv
  Found 5 images
  Processed 5 images, 2 successful

Processing left_10-06-2024_19-14-45...
  Reading labels from CollectedData_Anna.csv
  Found 5 images
  Processed 5 images, 1 successful

Processing left_10-06-2024_19

## 8. Validate Output Data

In [10]:
# Validate the conversion results
print("=== Validation Results ===")

# Count output files
output_images = list(TARGET_IMAGES_DIR.glob("*"))
output_labels = list(TARGET_LABELS_DIR.glob("*.txt"))

print(f"Output images: {len(output_images)}")
print(f"Output labels: {len(output_labels)}")

# Check for matching pairs
images_without_labels = []
labels_without_images = []

image_stems = {Path(img.name).stem for img in output_images}
label_stems = {Path(lbl.name).stem for lbl in output_labels}

images_without_labels = image_stems - label_stems
labels_without_images = label_stems - image_stems

print(f"Images without labels: {len(images_without_labels)}")
print(f"Labels without images: {len(labels_without_images)}")

if images_without_labels:
    print("First 5 images without labels:", list(images_without_labels)[:5])
if labels_without_images:
    print("First 5 labels without images:", list(labels_without_images)[:5])


# Remove images that have no corresponding label file
def cleanup_orphaned_images():
    """Remove images that don't have corresponding label files"""

    if not images_without_labels:
        print("\n=== Image Cleanup ===")
        print("✓ No orphaned images found. Dataset is clean!")
        return 0

    print("\n=== Image Cleanup ===")
    print(f"Found {len(images_without_labels)} images without labels")

    # Find the actual image files to delete
    images_to_delete = []
    for img_file in output_images:
        if Path(img_file.name).stem in images_without_labels:
            images_to_delete.append(img_file)

    if not images_to_delete:
        print("No matching image files found to delete.")
        return 0

    print(f"Will delete {len(images_to_delete)} orphaned image files:")
    for img in images_to_delete[:5]:  # Show first 5 as example
        print(f"  - {img.name}")
    if len(images_to_delete) > 5:
        print(f"  ... and {len(images_to_delete) - 5} more")

    # No interactive prompt: set proceed to False to list the orphaned images without deleting them
    proceed = True

    if proceed:
        deleted_count = 0
        for img_file in images_to_delete:
            try:
                img_file.unlink()  # Delete the file
                deleted_count += 1
            except Exception as e:
                print(f"Error deleting {img_file.name}: {e}")

        print(f"✓ Successfully deleted {deleted_count} orphaned images")
        return deleted_count
    else:
        print("Cleanup cancelled.")
        return 0


# Run the cleanup
deleted_images = cleanup_orphaned_images()

# Update counts after cleanup
if deleted_images > 0:
    output_images = list(TARGET_IMAGES_DIR.glob("*"))
    print(f"Updated image count: {len(output_images)}")

# Sample a few label files to verify format
print("\n=== Sample Label Files ===")
sample_labels = output_labels[:3]
for label_file in sample_labels:
    print(f"\nFile: {label_file.name}")
    with open(label_file, "r") as f:
        content = f.read().strip()
        print(f"Content: {content}")

        # Validate format
        parts = content.split()
        if len(parts) >= 1:
            class_id = parts[0]
            values_after_class = parts[1:]  # bbox (cx, cy, w, h) followed by keypoint triplets
            expected_value_count = 4 + len(BODYPARTS) * 3  # bbox + (x, y, visibility) per keypoint
            print(f"Class ID: {class_id}")
            print(f"Keypoint data parts: {len(values_after_class)} (expected: {expected_value_count})")

# Cross-validate keypoint counts between CSV and YOLO files
print("\n=== Keypoint Count Validation ===")


def count_csv_keypoints():
    """Count total visible keypoints in all original CSV files"""
    total_csv_keypoints = 0
    csv_files_processed = 0

    for subdir in subdirs:
        csv_files = list(subdir.glob("*.csv"))
        if csv_files:
            csv_file = csv_files[0]
            csv_files_processed += 1

            try:
                parsed_data = parse_dlc_csv(csv_file)
                for entry in parsed_data:
                    for keypoint in entry["keypoints"]:
                        x, y, visibility = keypoint
                        if visibility == 2:  # 2 = visible
                            total_csv_keypoints += 1
            except Exception as e:
                print(f"Error processing CSV {csv_file}: {e}")

    return total_csv_keypoints, csv_files_processed


def count_yolo_keypoints():
    """Count total visible keypoints in all YOLO label files"""
    total_yolo_keypoints = 0
    yolo_files_processed = 0

    for label_file in output_labels:
        yolo_files_processed += 1
        try:
            with open(label_file, "r") as f:
                content = f.read().strip()
                parts = content.split()

                if (
                    len(parts) >= 8
                ):  # Need class_id + bbox (4 values) + at least 1 keypoint (3 values) = 8 minimum
                    # Skip class ID and bounding box (5 values total), process keypoint data in groups of 3 (x, y, visibility)
                    keypoint_data = parts[5:]  # Skip class_id, cx, cy, w, h

                    for i in range(0, len(keypoint_data), 3):
                        if i + 2 < len(keypoint_data):
                            try:
                                visibility = int(keypoint_data[i + 2])
                                if visibility == 2:  # 2 = visible
                                    total_yolo_keypoints += 1
                            except ValueError:
                                continue
        except Exception as e:
            print(f"Error processing YOLO file {label_file}: {e}")

    return total_yolo_keypoints, yolo_files_processed


# Count keypoints in both formats
csv_keypoints, csv_files_count = count_csv_keypoints()
yolo_keypoints, yolo_files_count = count_yolo_keypoints()

print(f"CSV files processed: {csv_files_count}")
print(f"YOLO files processed: {yolo_files_count}")
print(f"Total visible keypoints in CSV files: {csv_keypoints}")
print(f"Total visible keypoints in YOLO files: {yolo_keypoints}")

# Validation check
if csv_keypoints == yolo_keypoints:
    print("✓ PASS: Keypoint counts match between CSV and YOLO files")
else:
    print("✗ FAIL: Keypoint count mismatch!")
    print(f"  Difference: {abs(csv_keypoints - yolo_keypoints)} keypoints")
    if csv_keypoints > yolo_keypoints:
        print(f"  {csv_keypoints - yolo_keypoints} keypoints lost during conversion")
    else:
        print(f"  {yolo_keypoints - csv_keypoints} extra keypoints in YOLO files")

print("\n=== Conversion Summary ===")
print(f"✓ Created {len(output_images)} images in {TARGET_IMAGES_DIR}")
print(f"✓ Created {len(output_labels)} label files in {TARGET_LABELS_DIR}")
if deleted_images > 0:
    print(f"✓ Cleaned up {deleted_images} orphaned images")
print("✓ Ready for YOLO pose estimation training!")
print("\nNext steps:")
print("1. Create a YOLO dataset configuration file")
print("2. Split data into train/val sets")
print("3. Train your YOLO pose estimation model")

=== Validation Results ===
Output images: 1036
Output labels: 386
Images without labels: 650
Labels without images: 0
First 5 images without labels: ['right_18-06-2024_13-43-10_img010', 'left_18-06-2024_13-12-55_img28', 'right_13-06-2024_15-41-59_img224', 'left_12-06-2024_17-42-46_img00', 'left_18-06-2024_13-53-51_img35']

=== Image Cleanup ===
Found 650 images without labels
Will delete 650 orphaned image files:
  - left_10-06-2024_16-40-41_img05.png
  - left_10-06-2024_16-40-41_img12.png
  - left_10-06-2024_16-40-41_img26.png
  - left_10-06-2024_16-40-41_img80.png
  - left_10-06-2024_16-55-06_img012.png
  ... and 645 more
✓ Successfully deleted 650 orphaned images
Updated image count: 386

=== Sample Label Files ===

File: left_10-06-2024_16-40-41_img30.txt
Content: 0 0.470968 0.175183 0.802865 0.314870 0.089535 0.238921 2 0.697039 0.306643 2 0.645252 0.037748 2 0.000000 0.000000 0 0.000000 0.000000 0 0.000000 0.000000 0 0.852400 0.312619 2
Class ID: 0
Keypoint data parts: 25 (expect
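
Each label line holds one class ID, four bounding-box values, and three values per keypoint, so a seven-keypoint label has 26 fields (25 after the class ID). Below is a minimal sketch of a stricter per-line check. `validate_pose_line` is a hypothetical helper, not part of the notebook, and the sample line is the one printed above.

```python
def validate_pose_line(line, num_keypoints):
    """Return a list of problems found in one YOLO pose label line (empty if none)."""
    parts = line.split()
    expected = 5 + 3 * num_keypoints  # class + bbox (4) + (x, y, visibility) per keypoint
    if len(parts) != expected:
        return [f"expected {expected} values, found {len(parts)}"]
    problems = []
    if not parts[0].isdigit():
        problems.append(f"class id is not a non-negative integer: {parts[0]}")
    try:
        floats = [float(p) for p in parts[1:]]
    except ValueError:
        return problems + ["non-numeric value in label"]
    bbox, kpts = floats[:4], floats[4:]
    if any(not 0.0 <= v <= 1.0 for v in bbox):
        problems.append("bounding box value outside [0, 1]")
    for i in range(num_keypoints):
        x, y, vis = kpts[3 * i : 3 * i + 3]
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            problems.append(f"keypoint {i} outside [0, 1]")
        if vis not in (0.0, 1.0, 2.0):
            problems.append(f"keypoint {i} has invalid visibility {vis}")
    return problems


sample = (
    "0 0.470968 0.175183 0.802865 0.314870 "
    "0.089535 0.238921 2 0.697039 0.306643 2 0.645252 0.037748 2 "
    "0.000000 0.000000 0 0.000000 0.000000 0 0.000000 0.000000 0 "
    "0.852400 0.312619 2"
)
print(validate_pose_line(sample, num_keypoints=7))
```

Running a check like this over every file in the labels directory would catch truncated lines or out-of-range values before training starts.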

## 9. Package Dataset for CVAT Import

In [11]:
# Package the dataset as a ZIP archive for import into CVAT (Computer Vision Annotation Tool)
import zipfile
from datetime import datetime


def create_cvat_package():
    """Create a ZIP for CVAT import containing images/train/, labels/train/, train.txt, and data.yaml"""

    print("=== Creating CVAT Traditional YOLO Package ===")

    # Create a timestamp for the ZIP file name
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    zip_filename = f"bee_pose_yolo_{timestamp}.zip"
    zip_path = TARGET_DIR / zip_filename

    # Verify we have data to package
    if len(output_images) == 0 or len(output_labels) == 0:
        print("❌ Error: No images or labels found to package!")
        return None

    print(f"Creating Traditional YOLO ZIP package: {zip_filename}")
    print(f"📦 Packaging {len(output_images)} images and {len(output_labels)} labels")

    try:
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
            # Add each labeled image to images/train/ and its label file to labels/train/
            print("  Adding images and labels to obj_train_data/...")
            train_image_paths = []

            for img_file in output_images:
                # Find corresponding label file
                label_file = None
                img_stem = img_file.stem
                for lbl_file in output_labels:
                    if lbl_file.stem == img_stem:
                        label_file = lbl_file
                        break

                if label_file:
                    # Add image to images/train/
                    img_path_in_zip = f"images/train/{img_file.name}"
                    zipf.write(img_file, img_path_in_zip)

                    # Add label file to the parallel labels/train/ directory
                    label_path_in_zip = f"labels/train/{img_file.stem}.txt"
                    zipf.write(label_file, label_path_in_zip)

                    # Track the archive-relative image path for train.txt
                    train_image_paths.append(img_path_in_zip)
                else:
                    print(f"    Warning: No label found for {img_file.name}")

            # Create train.txt (one archive-relative image path per line)
            print("  Creating train.txt...")
            train_txt_content = "\n".join(train_image_paths) + "\n"
            zipf.writestr("train.txt", train_txt_content)

            # Include the data.yaml
            print("Adding data.yaml")
            yamlfile = Path("./labeled-data/data.yaml").resolve()
            assert yamlfile.exists()
            zipf.write(yamlfile, "data.yaml")

        print(f"✅ Successfully created Traditional YOLO package: {zip_path}")
        print(f"📊 Package size: {zip_path.stat().st_size / (1024 * 1024):.1f} MB")
        print(f"📁 Structure: obj_train_data/ with {len(train_image_paths)} image-label pairs")
        print(f"🔢 Files numbered from 1.jpg to {len(output_images)}.jpg")

        return zip_path

    except Exception as e:
        print(f"❌ Error creating ZIP package: {e}")
        return None


# Create the CVAT Traditional YOLO package
cvat_package_path = create_cvat_package()

if cvat_package_path:
    print("\n🎉 CVAT Traditional YOLO package ready for upload!")
    print(f"📁 Location: {cvat_package_path}")
    print("\n📋 Import Instructions:")
    print("1. Go to https://app.cvat.ai")
    print("2. Create new project → Pose estimation")
    print("3. Click 'Upload annotation' → Select 'YOLO 1.1' format")
    print("4. Upload the ZIP file created above")
    print(
        "5. Configure skeleton with keypoint order: L_antenna, R_antenna, L_mandible, R_mandible, Top_prob, Tube_prob, End_prob"
    )
    print("\n💡 Key Changes for CVAT Compatibility:")
    print("   ✅ Numeric file naming (1.jpg, 2.jpg, 3.jpg, ...)")
    print("   ✅ Traditional YOLO structure (obj_train_data/ directory)")
    print("   ✅ Correct train.txt format with data/ prefix")
    print("   ✅ Pose format: bounding box + keypoints")
    print("   ✅ No README.md to avoid import issues")
else:
    print("❌ Failed to create CVAT package. Check the errors above.")

=== Creating CVAT Traditional YOLO Package ===
Creating Traditional YOLO ZIP package: bee_pose_yolo_20260223_135735.zip
📦 Packaging 386 images and 386 labels
  Adding images and labels to obj_train_data/...
  Creating train.txt...
Adding data.yaml
✅ Successfully created Traditional YOLO package: C:\Users\bee-ops\code\Choice-assay\labeled-data\bee_pose_yolo_20260223_135735.zip
📊 Package size: 22.6 MB
📁 Structure: obj_train_data/ with 386 image-label pairs
🔢 Files numbered from 1.jpg to 386.jpg

🎉 CVAT Traditional YOLO package ready for upload!
📁 Location: C:\Users\bee-ops\code\Choice-assay\labeled-data\bee_pose_yolo_20260223_135735.zip

📋 Import Instructions:
1. Go to https://app.cvat.ai
2. Create new project → Pose estimation
3. Click 'Upload annotation' → Select 'YOLO 1.1' format
4. Upload the ZIP file created above
5. Configure skeleton with keypoint order: L_antenna, R_antenna, L_mandible, R_mandible, Top_prob, Tube_prob, End_prob

💡 Key Changes for CVAT Compatibility:
   ✅ Numeric 
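
The packaging cell asserts that `labeled-data/data.yaml` already exists, but the notebook does not show its contents. The following is a minimal sketch of what such a file might contain, assuming the Ultralytics pose dataset keys (`path`, `train`, `val`, `kpt_shape`, `flip_idx`, `names`). The class name `bee`, the `val` placeholder, and the left/right `flip_idx` pairing are assumptions, and `render_pose_data_yaml` is a hypothetical helper.

```python
def render_pose_data_yaml(bodyparts, class_name="bee", flip_idx=None):
    """Render a minimal Ultralytics-style pose dataset config as YAML text."""
    if flip_idx is None:
        flip_idx = list(range(len(bodyparts)))  # identity: no left/right swap
    lines = [
        "path: .  # dataset root (assumed)",
        "train: images/train",
        "val: images/train  # placeholder until a validation split exists",
        f"kpt_shape: [{len(bodyparts)}, 3]  # number of keypoints, values per keypoint",
        f"flip_idx: [{', '.join(str(i) for i in flip_idx)}]",
        "names:",
        f"  0: {class_name}",
    ]
    return "\n".join(lines) + "\n"


BODYPARTS = ["L_antenna", "R_antenna", "L_mandible", "R_mandible", "Top_prob", "Tube_prob", "End_prob"]

# Assumed symmetry: a horizontal flip swaps the L/R antennae and the L/R mandibles.
yaml_text = render_pose_data_yaml(BODYPARTS, flip_idx=[1, 0, 3, 2, 4, 5, 6])
print(yaml_text)
```

The keypoint order in `kpt_shape` and `flip_idx` has to match the `BODYPARTS` order used when the labels were written. The exact keys CVAT expects on import may differ from this sketch.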

In [12]:
# Refresh the output file lists from disk
print("=== Refreshing File Lists ===")
output_images = list(TARGET_IMAGES_DIR.glob("*"))
output_labels = list(TARGET_LABELS_DIR.glob("*.txt"))

# Remove any orphaned images without labels
orphaned_images = []
for img_file in output_images[:]:
    label_file = TARGET_LABELS_DIR / f"{img_file.stem}.txt"
    if not label_file.exists():
        orphaned_images.append(img_file)
        output_images.remove(img_file)
        img_file.unlink()  # Delete the orphaned image

if orphaned_images:
    print(f"🧹 Removed {len(orphaned_images)} orphaned images")

print(f"✅ Final counts: {len(output_images)} images, {len(output_labels)} labels")
print(f"📁 Files: {output_images[0].name} to {output_images[-1].name}")

=== Refreshing File Lists ===
✅ Final counts: 386 images, 386 labels
📁 Files: left_10-06-2024_16-40-41_img30.png to right_24-08-2024_14-55-15_img050.png
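
The summary above lists a train/val split as a next step. Frames from the same recording are likely to look alike, so one option is to split by recording rather than by image. Here is a minimal sketch, assuming the recording name is everything before the final `_imgNN` part of the renamed file. `split_by_recording` is a hypothetical helper, and some of the example names are invented.

```python
import random
from collections import defaultdict


def split_by_recording(image_names, val_fraction=0.2, seed=0):
    """Split file names into (train, val) so frames from one recording stay together."""
    groups = defaultdict(list)
    for name in image_names:
        recording = name.rsplit("_", 1)[0]  # strip the trailing '_imgNN.ext'
        groups[recording].append(name)

    recordings = sorted(groups)
    random.Random(seed).shuffle(recordings)  # deterministic for a given seed

    n_val = max(1, round(len(recordings) * val_fraction)) if len(recordings) > 1 else 0
    val_recordings = set(recordings[:n_val])

    train = sorted(n for r in recordings if r not in val_recordings for n in groups[r])
    val = sorted(n for r in val_recordings for n in groups[r])
    return train, val


# Example names in the renamed '<subdir>_<image>' form (partly invented).
names = [
    "left_10-06-2024_16-40-41_img30.png",
    "left_10-06-2024_17-01-17_img02.png",
    "left_10-06-2024_17-01-17_img44.png",
    "right_18-06-2024_13-43-10_img010.png",
    "right_24-08-2024_14-55-15_img050.png",
]
train_names, val_names = split_by_recording(names, val_fraction=0.25)
print(len(train_names), "train /", len(val_names), "val")
```

The `rsplit` rule breaks if an original image file name itself contains an underscore. In that case the grouping key would need to come from the source subdirectory name instead.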
