# Fish Analytic Notebook
This notebook walks through creating a training dataset and using it to train a site-specific model.

## Workflow
- FFMPEG extracts frames from the video and saves them to a folder
- Florence-2 locates fish in the images and creates a set of bounding boxes
- Bounding boxes are used to assist SAM-2 in creating segmentation masks
- Segmentation masks are converted to COCO format polygons and exported as a .JSON file
- Optional step:
  - Import the images and .JSON file into CVAT to add species labels to the annotations
  - Export as a COCO JSON file
- Convert the JSON file to YOLO bounding box format
- Detect small annotations (defined as <1% of the frame area) and assign them to the discard class
- Detect overlapping boxes and assign them to the discard class
- Required if the optional step above was skipped:
  - Import the images and YOLO files into CVAT to add species labels to the annotations
- Explode out the remaining images and store them as separate image files with new YOLO annotations
- Select images for training/validation/testing
- Initiate YOLO training from the created dataset
- Run inference using the model trained in the previous step
- Extract MaxN and create species accumulation data
- Some magic
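MaxN is the largest number of individuals of a species seen together in any single frame, and a species accumulation curve tracks how many distinct species have been seen up to each point in the footage. The sketch below computes both from per-frame detections. It assumes the detections have already been reduced to a dict mapping frame index to a list of species labels; that input format and the function names are hypothetical, not something produced by the cells in this notebook.

```python
from collections import Counter

def maxn_per_species(frame_detections):
    """frame_detections: {frame index: [species label, ...]} (hypothetical format).
    Returns {species: MaxN}, the most individuals of that species in one frame."""
    maxn = {}
    for labels in frame_detections.values():
        for species, count in Counter(labels).items():
            maxn[species] = max(maxn.get(species, 0), count)
    return maxn

def species_accumulation(frame_detections):
    """Return [(frame, cumulative number of distinct species seen so far), ...]."""
    seen = set()
    curve = []
    for frame in sorted(frame_detections):
        seen.update(frame_detections[frame])
        curve.append((frame, len(seen)))
    return curve

frames = {0: ["a", "a", "b"], 1: ["a"], 2: ["c", "a", "a", "a"]}
print(maxn_per_species(frames))      # {'a': 3, 'b': 1, 'c': 1}
print(species_accumulation(frames))  # [(0, 2), (1, 2), (2, 3)]
```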

In [16]:
!nvidia-smi

Mon Oct 28 08:53:58 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.61                 Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 3060 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8              9W /   75W |       0MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

## Setup working directories
This notebook should be started from a root folder that contains the following two folders as subdirectories. Other layouts will probably work, but I haven't tested them.
- AIDIR - defines where the AI scripts reside (YOLO, SAM-2 & Florence-2)
- ANNODIR - defines where the annotation files and images reside. In my setup, these are organised into further subdirectories by site and annotation method. This folder also contains the post-processing scripts.

In [2]:
import os
WORKING_DIR = os.getcwd()
AIDIR = 'C:\\Users\\neez\\OneDrive\\Uni\\VLS301\\detect_scripts'
ANNODIR = 'C:\\Users\\neez\\OneDrive\\Uni\\VLS301\\Completed_Annotations'
print("WORKING_DIR:", WORKING_DIR)

WORKING_DIR: C:\Users\neez\OneDrive\Uni\VLS301


## Extract video frames

Variables to extract frames from initial video
* INITVID_SOURCE_LOCATION = Location of video file to train from
* INITVID_EXTRACT_LOCATION_ANNO = Location to store image files from video to use in annotation
* INITVID_EXTRACT_LOCATION_DETECT = Location to store image files from video to use in detection (inference)
* INITVID_START_TIME = Time in the video to start extracting frames from (hh:mm:ss)
* INITVID_EXTRACT_LENGTH = Length of time in the video to extract (from start time) (hh:mm:ss)
* INITVID_EXTRACT_FPS = Frames per second to extract for the annotation dataset (5 = 5 fps, 1/30 = 1 frame every 30 s)

YOLO parameters
* YOLO_TRN_EPOCHS = YOLO training epochs (100 is a good place to start)
* YOLO_TRN_BATCH_SIZE = YOLO training batch size (16 is what I generally use)

In [7]:
PROJECT_NAME = "Waychinicup_1fp15s"

INITVID_SOURCE_LOCATION = "C:\\Users\\neez\\OneDrive\\Uni\\VLS301\\Waychinicup_full.mp4"
INITVID_EXTRACT_LOCATION_ANNO = os.path.join(WORKING_DIR, PROJECT_NAME, "Annotate")
INITVID_EXTRACT_LOCATION_DETECT = os.path.join(WORKING_DIR, PROJECT_NAME, "Detect")
INITVID_START_TIME = "0:02:09" # Time BRUV has steadied on substrate
INITVID_EXTRACT_LENGTH = "1:00:00" # 1 hour of footage
INITVID_EXTRACT_FPS = "1/15"

YOLO_TRN_EPOCHS = 100
YOLO_TRN_BATCH_SIZE = 16
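As a rough check before extracting, the number of frames written is about the clip length multiplied by the extraction rate. With the settings above, one hour at 1/15 fps gives about 240 annotation frames, and the detection extract at fps=2 gives about 7,200. The helpers below are hypothetical, assume whole-second `h:mm:ss` strings, and ffmpeg's `fps` filter rounding can make the real count differ by a frame.

```python
from fractions import Fraction

def hms_to_seconds(hms):
    """Convert an 'h:mm:ss' string (whole seconds assumed) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def estimate_frame_count(length_hms, fps):
    """Approximate frames written for a clip; fps may be a ratio string like '1/15'."""
    return int(hms_to_seconds(length_hms) * Fraction(fps))

print(estimate_frame_count("1:00:00", "1/15"))  # 240
print(estimate_frame_count("1:00:00", "2"))     # 7200
```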

### Extracts files for annotation

In [None]:
from pathlib import Path

# Create output directory if it doesn't exist
os.makedirs(INITVID_EXTRACT_LOCATION_ANNO, exist_ok=True)

# Build ffmpeg command
ffmpeg_cmd = f'ffmpeg -i "{INITVID_SOURCE_LOCATION}" ' \
            f'-ss {INITVID_START_TIME} ' \
            f'-t {INITVID_EXTRACT_LENGTH} ' \
            f'-vf "fps={INITVID_EXTRACT_FPS}" ' \
            f'-q:v 2 "{INITVID_EXTRACT_LOCATION_ANNO}/frame_%04d.jpg"'

# Execute ffmpeg command
print("Extracting files for automated annotation:")
print(ffmpeg_cmd)
!{ffmpeg_cmd}

# Verify output
extracted_frames = list(Path(INITVID_EXTRACT_LOCATION_ANNO).glob('*.jpg'))
print(f"\nExtracted {len(extracted_frames)} frames to {INITVID_EXTRACT_LOCATION_ANNO}")

### Extracts files for detection

In [None]:
import os
from pathlib import Path

# Create output directory if it doesn't exist
os.makedirs(INITVID_EXTRACT_LOCATION_DETECT, exist_ok=True)

# Build ffmpeg command
ffmpeg_cmd = f'ffmpeg -i "{INITVID_SOURCE_LOCATION}" ' \
            f'-ss {INITVID_START_TIME} ' \
            f'-t {INITVID_EXTRACT_LENGTH} ' \
            f'-vf "fps=2" ' \
            f'-q:v 2 "{INITVID_EXTRACT_LOCATION_DETECT}/frame_%04d.jpg"'

# Execute ffmpeg command
print("Extracting files for detection:")
print(ffmpeg_cmd)
!{ffmpeg_cmd}

# Verify output
extracted_frames = list(Path(INITVID_EXTRACT_LOCATION_DETECT).glob('*.jpg'))
print(f"\nExtracted {len(extracted_frames)} frames to {INITVID_EXTRACT_LOCATION_DETECT}")
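The `!{ffmpeg_cmd}` cells rely on the shell to split one command string, and the notebook carries on even if ffmpeg fails. A minimal alternative sketch, assuming ffmpeg is on the PATH, passes the same arguments as a list to Python's `subprocess.run`, so paths with spaces need no extra quoting and a non-zero exit raises an error. `build_ffmpeg_args` and `extract_frames` are hypothetical helper names.

```python
import os
import subprocess

def build_ffmpeg_args(src, out_dir, start, length, fps, quality=2):
    """Build the same ffmpeg arguments as the extraction cells, as a list."""
    return [
        "ffmpeg",
        "-i", src,
        "-ss", start,           # start time (h:mm:ss)
        "-t", length,           # duration to extract (h:mm:ss)
        "-vf", f"fps={fps}",    # output frame rate, e.g. 2 or 1/15
        "-q:v", str(quality),   # JPEG quality (lower is better)
        os.path.join(out_dir, "frame_%04d.jpg"),
    ]

def extract_frames(src, out_dir, start, length, fps):
    """Create out_dir and run ffmpeg; raises CalledProcessError if ffmpeg fails."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(build_ffmpeg_args(src, out_dir, start, length, fps), check=True)
```

For example, `extract_frames(INITVID_SOURCE_LOCATION, INITVID_EXTRACT_LOCATION_ANNO, INITVID_START_TIME, INITVID_EXTRACT_LENGTH, INITVID_EXTRACT_FPS)`. Placing `-ss` after `-i`, as here and in the cells above, makes ffmpeg decode and discard everything before the start time; moving it before `-i` seeks the input instead, which is usually much faster on long videos.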

## Download and set up SAM-2
These cells only need to be run once, when first setting up the system.

In [None]:
os.chdir(AIDIR)

In [None]:
!pip install -q flash_attn timm
!pip install -q accelerate
!pip install -q einops
!pip install -q supervision
!pip install -q scikit-image

In [None]:
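# Create the model folders; curl below writes into my_models\sam2, which must already exist
os.makedirs(os.path.join(AIDIR, "my_models", "Florence_2"), exist_ok=True)
os.makedirs(os.path.join(AIDIR, "my_models", "sam2"), exist_ok=True)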

!curl -o {AIDIR}\my_models\sam2\sam2_hiera_tiny.pt https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt
!curl -o {AIDIR}\my_models\sam2\sam2_hiera_small.pt https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt
!curl -o {AIDIR}\my_models\sam2\sam2_hiera_base_plus.pt https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt
!curl -o {AIDIR}\my_models\sam2\sam2_hiera_large.pt https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt

In [None]:
!git clone https://github.com/facebookresearch/segment-anything-2.git
%cd segment-anything-2
!pip install -e . -q

## Download and set up YOLO
These cells only need to be run once, when first setting up the system.

### Install YOLOv8

In [3]:
# Pip install method (recommended)

!pip install ultralytics==8.2.103 -q

from IPython import display
display.clear_output()

import ultralytics
ultralytics.checks()

Ultralytics YOLOv8.2.103  Python-3.12.5 torch-2.5.0+cpu CPU (12th Gen Intel Core(TM) i7-1280P)
Setup complete  (20 CPUs, 15.7 GB RAM, 426.6/930.4 GB disk)


## Set up Florence-2 discovery
This section does need to be run each session, as it loads the Florence-2 model used to perform the initial inference.

In [6]:
os.chdir(AIDIR)
CURRDIR = os.getcwd()
print("CURRDIR:", CURRDIR)

CURRDIR: C:\Users\neez\OneDrive\Uni\VLS301\detect_scripts


In [7]:
from transformers import AutoModelForCausalLM, AutoProcessor

model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large-ft",
                                             cache_dir=f"{AIDIR}/my_models/Florence_2",
                                             device_map="cuda",
                                             trust_remote_code=True)

processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large-ft",
                                             cache_dir=f"{AIDIR}/my_models/Florence_2",
                                             trust_remote_code=True)



In [15]:
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from pathlib import Path
from PIL import Image
import cv2
import torch
import base64
import json
import os
import glob

import numpy as np
import supervision as sv

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator
from datetime import datetime
from skimage import measure

## Fish Detection

In [5]:
def find_all_fish(image):
    PROMPT = "<OD>"
    task_type = "<OD>"

    inputs = processor(text=PROMPT, images=image, return_tensors="pt").to("cuda")

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=2048,
        do_sample=False,
    )
    text_generations = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

    results = processor.post_process_generation(text_generations,
    task=task_type, image_size=(image.width, image.height))

    raw_lists = []

    for bbox, label in zip(results[task_type]['bboxes'], results[task_type]['labels']):
        if label == "fish":
            raw_lists.append(bbox)

    # Note: all labels are returned (not only 'fish'), so len(labels) can exceed len(boxes)
    return raw_lists, results[task_type]['labels']

def convert_mask_to_polygon(mask):
    """Convert binary mask to COCO polygon format."""
    print(f"Input mask shape: {mask.shape}")
    print(f"Input mask dtype: {mask.dtype}")
    print(f"Input mask values range: [{mask.min()}, {mask.max()}]")
    
    # Ensure mask is 2D and binary
    if len(mask.shape) > 2:
        mask = np.squeeze(mask)
    binary_mask = (mask > 0.5).astype(np.uint8)
    
    try:
        # Find contours
        contours = measure.find_contours(binary_mask, 0.5)
        print(f"Found {len(contours)} contours")
        
        # Convert contours to COCO format
        polygons = []
        for contour in contours:
            if len(contour) < 3:
                continue
            polygon = []
            for point in contour:
                polygon.extend([float(point[1]), float(point[0])])
            polygons.append(polygon)
        
        if not polygons:
            print("No valid polygons found, creating bounding box")
            h, w = binary_mask.shape
            box_polygon = [0, 0, w, 0, w, h, 0, h]
            polygons.append(box_polygon)
            
        return polygons
        
    except Exception as e:
        print(f"Error in convert_mask_to_polygon: {str(e)}")
        print(f"Mask shape after processing: {binary_mask.shape}")
        raise

def visualize_florence_and_sam2(image, boxes, masks):
    """Visualization function"""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
    
    # Plot Florence-2 detections
    ax1.imshow(image)
    for box in boxes:
        x1, y1, x2, y2 = box
        width = x2 - x1
        height = y2 - y1
        rect = patches.Rectangle(
            (x1, y1), width, height,
            linewidth=2,
            edgecolor='red',
            facecolor='none'
        )
        ax1.add_patch(rect)
    ax1.set_title("Florence-2 Detections")
    ax1.axis('off')
    
    # Plot SAM2 segmentations
    ax2.imshow(image)
    colors = ['red', 'blue', 'green', 'yellow', 'purple']
    # RGB value for each edge colour, so every mask matches the colour of its box
    rgb_values = {'red': [255, 0, 0], 'blue': [0, 0, 255], 'green': [0, 255, 0],
                  'yellow': [255, 255, 0], 'purple': [128, 0, 128]}
    
    for i, mask in enumerate(masks):
        mask = np.squeeze(mask)  # SAM2 can return (1, H, W) masks
        color = colors[i % len(colors)]
        colored_mask = np.zeros((*mask.shape, 3), dtype=np.uint8)
        colored_mask[mask > 0.5] = rgb_values[color]
            
        ax2.imshow(colored_mask, alpha=0.3)
        
        box = boxes[i]
        x1, y1, x2, y2 = box
        width = x2 - x1
        height = y2 - y1
        rect = patches.Rectangle(
            (x1, y1), width, height,
            linewidth=2,
            edgecolor=color,
            facecolor='none'
        )
        ax2.add_patch(rect)
    ax2.set_title("SAM2 Segmentations")
    ax2.axis('off')
    
    plt.tight_layout()
    plt.show()

def process_directory_to_coco(input_dir, output_json, visualize=False):
    """
    Process all images in a directory and create COCO format annotations.
    
    Args:
        input_dir: Directory containing images
        output_json: Path to save the COCO json file
        visualize: Whether to show visualizations for each image
    """
    # Initialize COCO dataset structure
    coco_dataset = {
        "info": {
            "year": datetime.now().year,
            "version": "1.0",
            "description": "Fish Dataset",
            "contributor": "Your Name",
            "date_created": datetime.now().strftime("%Y-%m-%d")
        },
        "licenses": [],
        "categories": [
            {
                "id": 1,
                "name": "fish",
                "supercategory": "animal"
            }
        ],
        "images": [],
        "annotations": []
    }

    # Get list of images
    image_paths = sorted(glob.glob(os.path.join(input_dir, "*.jpg")))
    print(f"Found {len(image_paths)} images to process")

    # Setup SAM2 model (do this once outside the loop)
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    CHECKPOINT = f"{AIDIR}/my_models/sam2/sam2_hiera_small.pt"
    CONFIG = f"{AIDIR}/segment-anything-2/sam2/configs/sam2/sam2_hiera_s.yaml"
    sam2_model = build_sam2(CONFIG, CHECKPOINT, device=DEVICE, apply_postprocessing=False)
    
    annotation_id = 1  # Counter for annotation IDs

    # Process each image
    for image_id, image_path in enumerate(image_paths):
        print(f"\nProcessing image {image_id + 1}/{len(image_paths)}: {os.path.basename(image_path)}")
        
        try:
            # Load image
            image = Image.open(image_path).convert("RGB")
            
            # Add image info to COCO dataset
            image_info = {
                "id": image_id,
                "file_name": os.path.basename(image_path),
                "width": image.width,
                "height": image.height,
                "date_captured": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            }
            coco_dataset["images"].append(image_info)

            # Get Florence-2 detections
            boxes, labels = find_all_fish(image)
            print(f"Florence-2 detection results - Boxes: {len(boxes)}, Labels: {len(labels)}")

            masks = []  # Reset so the visualisation never reuses the previous image's masks
            if len(boxes) > 0:
                # Setup SAM2 predictor for this image
                predictor = SAM2ImagePredictor(sam2_model)
                predictor.set_image(image)

                # Get SAM2 masks
                masks, scores, logits = predictor.predict(
                    box=boxes,
                    multimask_output=False
                )
                print(f"SAM2 prediction results - Masks shape: {masks.shape}")

                # Ensure masks is 3D (N, H, W)
                if len(masks.shape) == 2:
                    masks = masks[np.newaxis, :, :]
                elif len(masks.shape) == 4:
                    masks = np.squeeze(masks, axis=1)

                # Create COCO annotations
                for i, (mask, box) in enumerate(zip(masks, boxes)):
                    try:
                        # Ensure mask is 2D binary
                        binary_mask = mask.astype(np.uint8)
                        
                        # Find contours using cv2 instead of skimage
                        contours, _ = cv2.findContours(
                            binary_mask, 
                            cv2.RETR_EXTERNAL, 
                            cv2.CHAIN_APPROX_SIMPLE
                        )
                        
                        # Convert contours to COCO polygon format
                        polygons = []
                        for contour in contours:
                            # Flatten the contour and convert to list
                            contour = contour.flatten().tolist()
                            # Convert to x,y pairs
                            polygon = []
                            for j in range(0, len(contour), 2):
                                polygon.extend([float(contour[j]), float(contour[j+1])])
                            if len(polygon) >= 6:  # Must have at least 3 points
                                polygons.append(polygon)
                        
                        # If no valid polygons found, use bounding box
                        if not polygons:
                            x1, y1, x2, y2 = map(float, box)
                            polygons = [[x1, y1, x2, y1, x2, y2, x1, y2]]
                        
                        # Calculate area
                        area = float(binary_mask.sum())
                        
                        # Create annotation
                        annotation = {
                            "id": annotation_id,
                            "image_id": image_id,
                            "category_id": 1,  # fish
                            "segmentation": polygons,
                            "area": area,
                            "bbox": [float(box[0]), float(box[1]), 
                                   float(box[2] - box[0]), float(box[3] - box[1])],
                            "iscrowd": 0
                        }
                        
                        coco_dataset["annotations"].append(annotation)
                        annotation_id += 1
                        
                    except Exception as e:
                        print(f"Error processing mask {i} for image {image_path}: {str(e)}")
                        continue

            if visualize:
                # Create visualization
                fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
                
                # Show original image with boxes
                ax1.imshow(image)
                for box in boxes:
                    x1, y1, x2, y2 = box
                    rect = patches.Rectangle((x1, y1), x2-x1, y2-y1, 
                                          linewidth=2, edgecolor='r', facecolor='none')
                    ax1.add_patch(rect)
                ax1.set_title("Detections")
                
                # Show masks
                ax2.imshow(image)
                for mask in masks:
                    mask_img = np.zeros((*mask.shape, 3), dtype=np.uint8)
                    mask_img[mask > 0] = [255, 0, 0]
                    ax2.imshow(mask_img, alpha=0.3)
                ax2.set_title("Segmentations")
                
                plt.show()

        except Exception as e:
            print(f"Detailed error processing image {image_path}:")
            import traceback
            traceback.print_exc()
            print(f"Skipping this image and continuing with the next one...")
            continue

    # Save COCO dataset to JSON file
    with open(output_json, 'w') as f:
        json.dump(coco_dataset, f)
    
    print(f"\nProcessing complete. COCO annotations saved to {output_json}")
    print(f"Total images processed: {len(coco_dataset['images'])}")
    print(f"Total annotations created: {len(coco_dataset['annotations'])}")
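In the log output below, Florence-2 often reports more labels than boxes (for example `Boxes: 3, Labels: 4`), because `find_all_fish` filters the boxes to "fish" but returns every label. A minimal sketch of keeping the two lists aligned, assuming the post-processed `<OD>` result has the `{'<OD>': {'bboxes': [...], 'labels': [...]}}` shape that `find_all_fish` already indexes; `filter_detections` is a hypothetical helper, not part of the Florence-2 API.

```python
def filter_detections(results, task_type="<OD>", keep_label="fish"):
    """Return (boxes, labels) containing only detections labelled keep_label.

    results is the dict from processor.post_process_generation, e.g.
    {'<OD>': {'bboxes': [[x1, y1, x2, y2], ...], 'labels': ['fish', ...]}}.
    """
    detections = results[task_type]
    pairs = [
        (bbox, label)
        for bbox, label in zip(detections["bboxes"], detections["labels"])
        if label == keep_label
    ]
    boxes = [bbox for bbox, _ in pairs]
    labels = [label for _, label in pairs]
    return boxes, labels

example = {"<OD>": {"bboxes": [[10, 20, 110, 80], [0, 0, 50, 50], [200, 40, 260, 90]],
                    "labels": ["fish", "rock", "fish"]}}
boxes, labels = filter_detections(example)
print(len(boxes), len(labels))  # 2 2
```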

In [None]:
print(INITVID_EXTRACT_LOCATION_ANNO)

In [6]:
# Usage
input_directory = f"{AIDIR}/data/Waychinicup_train"  # Your input directory
output_json_file = f"{AIDIR}/Waychinicup_train_fish_dataset_coco.json"  # Where to save the COCO json

# Run the processing
process_directory_to_coco(input_directory, output_json_file, visualize=False)

Found 180 images to process

Processing image 1/180: frame_0001.jpg




Florence-2 detection results - Boxes: 3, Labels: 4


Falling back to all available kernels for scaled_dot_product_attention (which may have a slower speed).


SAM2 prediction results - Masks shape: (3, 1, 1440, 1920)

Processing image 2/180: frame_0002.jpg
Florence-2 detection results - Boxes: 1, Labels: 2
SAM2 prediction results - Masks shape: (1, 1440, 1920)

Processing image 3/180: frame_0003.jpg
Florence-2 detection results - Boxes: 2, Labels: 3
SAM2 prediction results - Masks shape: (2, 1, 1440, 1920)

Processing image 4/180: frame_0004.jpg
Florence-2 detection results - Boxes: 2, Labels: 2
SAM2 prediction results - Masks shape: (2, 1, 1440, 1920)

Processing image 5/180: frame_0005.jpg
Florence-2 detection results - Boxes: 2, Labels: 3
SAM2 prediction results - Masks shape: (2, 1, 1440, 1920)

Processing image 6/180: frame_0006.jpg
Florence-2 detection results - Boxes: 5, Labels: 6
SAM2 prediction results - Masks shape: (5, 1, 1440, 1920)

Processing image 7/180: frame_0007.jpg
Florence-2 detection results - Boxes: 6, Labels: 6
SAM2 prediction results - Masks shape: (6, 1, 1440, 1920)

Processing image 8/180: frame_0008.jpg
Florence-2 

## Convert Segmentation Masks to YOLO Bounding Boxes

Once annotations are complete in CVAT, export them in YOLO format. This works properly for bounding box annotations, but when segmentation is used the exported annotation files are empty. In that case, re-export from CVAT in COCO or CVAT format and copy the annotation JSON or XML file to the same folder as the YOLO files. The following scripts convert from CVAT/COCO into YOLO; only the one matching your export format needs to be run.

Set the annotation file to convert from and the folder to write to. The YOLO output folder is the site folder with obj_train_data appended.
- TRAINING_SITE_TYPE = Essentially, the location where the annotation files are saved to. All other variables are updated using this for consistency.
- COCO_ANNO_FOLDER = Folder containing the COCO or CVAT annotation file. The script appends the relevant file name automatically.
- YOLO_ANNO_FOLDER = Folder where YOLO files will be extracted.
- IOU_OUTPUT_CSV = File to write YOLO data for each unique fish, in each frame, from YOLO annotation folder.
- CLASS_COUNT_CSV = CSV file that contains the total count for each class
- DETAILED_ANNO_CSV = CSV file that contains the dimensions of each fish (may be redundant)
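The conversion scripts themselves are not shown here. As a rough sketch of what a COCO-to-YOLO bounding box conversion involves: COCO stores `bbox` as `[x_min, y_min, width, height]` in pixels, while YOLO expects `class x_center y_center width height` normalised by the image size, with 0-based class ids. COCO category ids (such as the `46` in the output below) therefore need remapping; the sorted-id mapping used here is an assumption, and all function names are hypothetical.

```python
def coco_bbox_to_yolo(bbox, img_width, img_height):
    """Convert a COCO [x_min, y_min, width, height] pixel box to normalised YOLO values."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_width, (y + h / 2) / img_height,
            w / img_width, h / img_height)

def category_map(coco):
    """Map COCO category ids to contiguous 0-based YOLO class ids, sorted by id."""
    ids = sorted(cat["id"] for cat in coco["categories"])
    return {cat_id: index for index, cat_id in enumerate(ids)}

def coco_to_yolo_lines(coco):
    """Return {image file name: [YOLO label lines]} for every image in a COCO dict."""
    classes = category_map(coco)
    images = {img["id"]: img for img in coco["images"]}
    lines = {img["file_name"]: [] for img in coco["images"]}
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        xc, yc, w, h = coco_bbox_to_yolo(ann["bbox"], img["width"], img["height"])
        lines[img["file_name"]].append(
            f"{classes[ann['category_id']]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each image's lines would then be written to a `.txt` file with the same stem in the YOLO folder; images without annotations get an empty list, which YOLO treats as background.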

In [3]:
os.chdir(ANNODIR)
CURRDIR = os.getcwd()
#os.path.join(os.getcwd(), WORKING_DIR)
#HOME = 'C:\\Users\\neez\\OneDrive\\Uni\\VLS301\\detect_scripts'
#os.chdir(HOME)
print("CURRDIR:", CURRDIR)

CURRDIR: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations


In [4]:
TRAINING_SITE_TYPE = "Reef_BB"

COCO_ANNO_FOLDER = os.path.join(ANNODIR, TRAINING_SITE_TYPE)
YOLO_ANNO_FOLDER = os.path.join(ANNODIR, TRAINING_SITE_TYPE, "obj_train_data")
IOU_OUTPUT_CSV = os.path.join(ANNODIR, f"{TRAINING_SITE_TYPE}_IoU.csv")
CLASS_COUNT_CSV = os.path.join(ANNODIR, f"{TRAINING_SITE_TYPE}_class_counts.csv")
DETAILED_ANNO_CSV = os.path.join(ANNODIR, f"{TRAINING_SITE_TYPE}_detailed_annotations.csv")

print("ANNODIR:", ANNODIR)
print("CURRDIR:", CURRDIR)
print("COCO_ANNO_FOLDER:", COCO_ANNO_FOLDER)
print("YOLO_ANNO_FOLDER:", YOLO_ANNO_FOLDER)

ANNODIR: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations
CURRDIR: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations
COCO_ANNO_FOLDER: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB
YOLO_ANNO_FOLDER: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data


In [32]:
!python cocomask-to-yolo-conversion.py {COCO_ANNO_FOLDER} {YOLO_ANNO_FOLDER}

Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
Processed annotation for category 46
P

In [17]:
!python coco-to-yolo-conversion.py {COCO_ANNO_FOLDER} {YOLO_ANNO_FOLDER}

In [None]:
!python cvat-xml-to-yolo-converter.py {COCO_ANNO_FOLDER} {YOLO_ANNO_FOLDER}

## Extract Annotation Data

Point the following cell at the folder containing the images and YOLO annotations. It creates two CSV files:
- detailed_annotations = the size of each annotation, all in one file (frame #, class id, width, height, area)
- class_counts = the count of each class across the whole set of images (class id, count)

Usage
    yolo-annotation-processor.py <dir_name> --class_counts=<class_count.csv> --detailed_annotations=<detailed_annotations.csv>
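The processor script is not reproduced here. A minimal sketch of the same kind of summary, assuming standard five-column YOLO label files and reporting width, height and area as fractions of the frame (the script itself may report pixels); `summarise_yolo_labels` and `write_summary_csvs` are hypothetical names.

```python
import csv
import os
from collections import Counter

def summarise_yolo_labels(label_files):
    """label_files: {frame name: [YOLO label lines]}.
    Returns (Counter of class ids, rows of [frame, class_id, width, height, area])."""
    counts = Counter()
    rows = []
    for frame, lines in sorted(label_files.items()):
        for line in lines:
            parts = line.split()
            if len(parts) != 5:
                continue  # skip blank or malformed lines
            class_id = int(parts[0])
            width, height = float(parts[3]), float(parts[4])
            counts[class_id] += 1
            rows.append([frame, class_id, width, height, width * height])
    return counts, rows

def write_summary_csvs(label_dir, class_counts_csv, detailed_csv):
    """Read every .txt label file in label_dir and write both summary CSVs."""
    label_files = {}
    for name in os.listdir(label_dir):
        if name.endswith(".txt"):
            with open(os.path.join(label_dir, name)) as f:
                label_files[os.path.splitext(name)[0]] = f.readlines()
    counts, rows = summarise_yolo_labels(label_files)
    with open(class_counts_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["class_id", "count"])
        writer.writerows(sorted(counts.items()))
    with open(detailed_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "class_id", "width", "height", "area"])
        writer.writerows(rows)
    return counts
```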

In [5]:
!python yolo-annotation-processor.py {YOLO_ANNO_FOLDER} --class_counts={CLASS_COUNT_CSV} --detailed_annotations={DETAILED_ANNO_CSV}

Processing: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data\frame_0001.txt
Processing: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data\frame_0002.txt
Processing: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data\frame_0003.txt
Processing: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data\frame_0004.txt
Processing: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data\frame_0005.txt
Processing: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data\frame_0006.txt
Processing: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data\frame_0007.txt
Processing: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data\frame_0008.txt
Processing: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Reef_BB\obj_train_data\frame_0009.txt
Processing: C:\Users\neez\OneDrive\Uni\VLS301\

Extracts IoU data and writes it to a CSV file. This is used for comparing methods. Uses YOLO_ANNO_FOLDER from the previous step.

In [7]:
!python iou_data_extractor.py {YOLO_ANNO_FOLDER} {IOU_OUTPUT_CSV}

CSV file 'C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Artreef_FloSam_IoU.csv' has been created successfully.


## Post-processing Annotations

Find small annotations and overlapping bounding boxes, and reassign them to the discard class (written as class 1 by the code below).

In [11]:
# First cell - Import required modules and define constants
import os
from itertools import combinations

# Constants (you can adjust these if needed)
IOU_THRESHOLD = 0.0  # Boxes whose IoU exceeds this are both discarded; 0.0 means any overlap
MIN_BOX_AREA_PERCENT = 0.5  # Minimum box area as a percentage of image area
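`MIN_BOX_AREA_PERCENT` is compared against the normalised YOLO width × height, so it does not depend on resolution. To get a feel for it in pixels, the sketch below converts it for the 1920 × 1440 frames seen in the SAM2 mask shapes earlier; `min_box_pixel_area` is a hypothetical helper. At 0.5% the cut-off is 13,824 px², roughly a 118 px square; the 1% mentioned in the workflow overview would double that to 27,648 px².

```python
import math

def min_box_pixel_area(img_width, img_height, min_area_percent):
    """Pixel area below which a box counts as too small for a given frame size."""
    return img_width * img_height * min_area_percent / 100

area = min_box_pixel_area(1920, 1440, 0.5)
print(area)                       # 13824.0
print(round(math.sqrt(area), 1))  # 117.6, side of an equivalent square in px
```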

In [12]:
# Second cell - Define helper functions
def parse_yolo_annotation(line):
    class_id, x_center, y_center, width, height = map(float, line.strip().split())
    return int(class_id), x_center, y_center, width, height

def calculate_iou(box1, box2):
    # Convert YOLO format to (x1, y1, x2, y2)
    def yolo_to_corners(x_center, y_center, width, height):
        x1 = x_center - width / 2
        y1 = y_center - height / 2
        x2 = x_center + width / 2
        y2 = y_center + height / 2
        return x1, y1, x2, y2

    box1_corners = yolo_to_corners(*box1[1:])
    box2_corners = yolo_to_corners(*box2[1:])

    # Calculate intersection area
    x_left = max(box1_corners[0], box2_corners[0])
    y_top = max(box1_corners[1], box2_corners[1])
    x_right = min(box1_corners[2], box2_corners[2])
    y_bottom = min(box1_corners[3], box2_corners[3])

    if x_right < x_left or y_bottom < y_top:
        return 0.0

    intersection_area = (x_right - x_left) * (y_bottom - y_top)

    # Calculate union area
    box1_area = (box1_corners[2] - box1_corners[0]) * (box1_corners[3] - box1_corners[1])
    box2_area = (box2_corners[2] - box2_corners[0]) * (box2_corners[3] - box2_corners[1])
    union_area = box1_area + box2_area - intersection_area

    # Calculate IoU
    iou = intersection_area / union_area if union_area > 0 else 0.0
    return iou

def is_box_too_small(box):
    _, _, _, width, height = box
    box_area = width * height
    return box_area < (MIN_BOX_AREA_PERCENT / 100)

def process_yolo_file(input_file, output_file):
    with open(input_file, 'r') as f:
        annotations = [parse_yolo_annotation(line) for line in f if line.strip()]  # skip blank lines

    boxes_to_discard = set()
    
    # Check for intersections
    for box1, box2 in combinations(annotations, 2):
        if calculate_iou(box1, box2) > IOU_THRESHOLD:
            boxes_to_discard.add(box1)
            boxes_to_discard.add(box2)
    
    # Check for small boxes
    for box in annotations:
        if is_box_too_small(box):
            boxes_to_discard.add(box)

    with open(output_file, 'w') as f:
        for box in annotations:
            if box in boxes_to_discard:
                f.write(f"1 {' '.join(map(str, box[1:]))}\n")
            else:
                f.write(f"{box[0]} {' '.join(map(str, box[1:]))}\n")

In [13]:
# Third cell - Process every annotation file in a directory
def process_directory(input_dir, output_dir):
    """
    Process all YOLO annotation files in the input directory and save to output directory.
    
    Args:
        input_dir: Input directory containing YOLO annotation files
        output_dir: Output directory for processed files
    """
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        print(f"Created output directory: {output_dir}")

    # Get list of txt files
    txt_files = [f for f in os.listdir(input_dir) if f.endswith('.txt')]
    print(f"Found {len(txt_files)} text files to process")

    # Process each file
    for filename in txt_files:
        input_file = os.path.join(input_dir, filename)
        output_file = os.path.join(output_dir, filename)
        process_yolo_file(input_file, output_file)
        print(f"Processed {filename}")

    print("\nProcessing complete!")
    print(f"Input directory: {input_dir}")
    print(f"Output directory: {output_dir}")

In [29]:
# Fourth cell - Execute the processing
# Define your input and output directories
INPUT_DIR = YOLO_ANNO_FOLDER
UPDATED_CLASS_DIR = os.path.join(COCO_ANNO_FOLDER, "updated_classes")
OUTPUT_DIR = UPDATED_CLASS_DIR

# Run the processing
process_directory(INPUT_DIR, OUTPUT_DIR)

Created output directory: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Waych1fp20s\updated_classes
Found 180 images to process
Processing frame_0001.jpg...
Processing frame_0002.jpg...
Processing frame_0003.jpg...
Processing frame_0004.jpg...
Processing frame_0005.jpg...
Processing frame_0006.jpg...
Processing frame_0007.jpg...
Processing frame_0008.jpg...
Processing frame_0009.jpg...
Processing frame_0010.jpg...
Processing frame_0011.jpg...
Processing frame_0012.jpg...
Processing frame_0013.jpg...
Processing frame_0014.jpg...
Processing frame_0015.jpg...
Processing frame_0016.jpg...
Processing frame_0017.jpg...
Processing frame_0018.jpg...
Processing frame_0019.jpg...
Processing frame_0020.jpg...
Processing frame_0021.jpg...
Processing frame_0022.jpg...
Processing frame_0023.jpg...
Processing frame_0024.jpg...
No annotation file found for frame_0025.jpg
Processing frame_0026.jpg...
Processing frame_0027.jpg...
Processing frame_0028.jpg...
Processing frame_0029.jpg...
Proces

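### Explode out images
Crop each remaining annotation out of its frame and save it as a separate image file with a new YOLO annotation.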

In [16]:
# First cell - Imports and basic functions
import os
import cv2
import numpy as np

def parse_yolo_annotation(line):
    class_id, x_center, y_center, width, height = map(float, line.strip().split())
    return int(class_id), x_center, y_center, width, height

def yolo_to_pixel_coords(box, img_width, img_height):
    class_id, x_center, y_center, width, height = box
    x1 = int((x_center - width/2) * img_width)
    y1 = int((y_center - height/2) * img_height)
    x2 = int((x_center + width/2) * img_width)
    y2 = int((y_center + height/2) * img_height)
    return class_id, x1, y1, x2, y2

def pixel_to_yolo_coords(box, img_width, img_height):
    class_id, x1, y1, x2, y2 = box
    x_center = (x1 + x2) / (2 * img_width)
    y_center = (y1 + y2) / (2 * img_height)
    width = (x2 - x1) / img_width
    height = (y2 - y1) / img_height
    return class_id, x_center, y_center, width, height
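yolo_to_pixel_coords() uses int(), which truncates toward zero. A box that spills past the frame edge (which can happen with detector boxes) also produces pixel corners outside the image. The sketch below shows one variant that rounds to the nearest pixel and clamps to the frame, with a worked round trip. yolo_to_pixel_rounded is a hypothetical name, not part of the notebook.

```python
def yolo_to_pixel_rounded(box, img_width, img_height):
    """Normalised YOLO box -> pixel corners, rounded to the nearest pixel and clamped to the frame."""
    class_id, xc, yc, w, h = box
    x1 = max(0, round((xc - w / 2) * img_width))
    y1 = max(0, round((yc - h / 2) * img_height))
    x2 = min(img_width, round((xc + w / 2) * img_width))
    y2 = min(img_height, round((yc + h / 2) * img_height))
    return class_id, x1, y1, x2, y2


def pixel_to_yolo(box, img_width, img_height):
    """Pixel corners -> normalised YOLO box (same maths as pixel_to_yolo_coords)."""
    class_id, x1, y1, x2, y2 = box
    return (class_id,
            (x1 + x2) / (2 * img_width), (y1 + y2) / (2 * img_height),
            (x2 - x1) / img_width, (y2 - y1) / img_height)


# A box centred in a 1920x1080 frame, a quarter of the width and half the height
pixels = yolo_to_pixel_rounded((3, 0.5, 0.5, 0.25, 0.5), 1920, 1080)
print(pixels)                             # (3, 720, 270, 1200, 810)
print(pixel_to_yolo(pixels, 1920, 1080))  # (3, 0.5, 0.5, 0.25, 0.5)
```

The clamping only matters when a label extends beyond the image. For labels that stay inside the frame, the two versions differ by at most one pixel.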

In [17]:
# Second cell - Define expansion orders and intersection check
def intersects(x1, y1, x2, y2, ox1, oy1, ox2, oy2):
    """True if the two boxes overlap; boxes that only touch at an edge also count."""
    return not (x2 < ox1 or ox2 < x1 or y2 < oy1 or oy2 < y1)

# Define all possible expansion orders
expansion_orders = [
    ['left', 'right', 'up', 'down'],
    ['left', 'right', 'down', 'up'],
    ['left', 'down', 'right', 'up'],
    ['left', 'down', 'up', 'right'],
    ['left', 'up', 'down', 'right'],
    ['left', 'up', 'right', 'down'],
    ['right', 'left', 'up', 'down'],
    ['right', 'left', 'down', 'up'],
    ['right', 'down', 'left', 'up'],
    ['right', 'down', 'up', 'left'],
    ['right', 'up', 'down', 'left'],
    ['right', 'up', 'left', 'down'],
    ['up', 'right', 'left', 'down'],
    ['up', 'right', 'down', 'left'],
    ['up', 'down', 'right', 'left'],
    ['up', 'down', 'left', 'right'],
    ['up', 'left', 'down', 'right'],
    ['up', 'left', 'right', 'down'],
    ['down', 'left', 'up', 'right'],
    ['down', 'left', 'right', 'up'],
    ['down', 'right', 'left', 'up'],
    ['down', 'right', 'up', 'left'],
    ['down', 'up', 'right', 'left'],
    ['down', 'up', 'left', 'right']
]
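The hand-written table above is every ordering of the four directions (4! = 24). As a sketch, itertools.permutations generates the same set without the risk of a missing or duplicated row. The variable name expansion_orders_generated is only for illustration.

```python
from itertools import permutations

DIRECTIONS = ('left', 'right', 'up', 'down')

# All 4! = 24 orderings of the four expansion directions
expansion_orders_generated = [list(order) for order in permutations(DIRECTIONS)]

print(len(expansion_orders_generated))  # 24
```

The list order differs from the hand-written one after the first two entries. That only matters when two orders give exactly the same crop area, because crop_image() keeps the first order that reaches the largest area.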

In [18]:
# Third cell - Cropping function
def crop_image(image, target_box, other_boxes, margin_factor=0.2):
    img_height, img_width = image.shape[:2]
    class_id, tx1, ty1, tx2, ty2 = target_box

    # Calculate margins
    margin_x = int((tx2 - tx1) * margin_factor)
    margin_y = int((ty2 - ty1) * margin_factor)

    # Initialize crop to target box with margins
    cx1 = max(0, tx1 - margin_x)
    cy1 = max(0, ty1 - margin_y)
    cx2 = min(img_width, tx2 + margin_x)
    cy2 = min(img_height, ty2 + margin_y)

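    # Grow the crop one pixel at a time in `direction` until it would touch a box of
    # another class or reach the image edge (same-class boxes do not block it)
    def expand_direction(x1, y1, x2, y2, direction):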
        while True:
            if direction == 'left' and x1 > 0:
                new_x1 = x1 - 1
                if any(intersects(new_x1, y1, x2, y2, *box[1:]) for box in other_boxes if box[0] != class_id):
                    break
                x1 = new_x1
            elif direction == 'right' and x2 < img_width:
                new_x2 = x2 + 1
                if any(intersects(x1, y1, new_x2, y2, *box[1:]) for box in other_boxes if box[0] != class_id):
                    break
                x2 = new_x2
            elif direction == 'up' and y1 > 0:
                new_y1 = y1 - 1
                if any(intersects(x1, new_y1, x2, y2, *box[1:]) for box in other_boxes if box[0] != class_id):
                    break
                y1 = new_y1
            elif direction == 'down' and y2 < img_height:
                new_y2 = y2 + 1
                if any(intersects(x1, y1, x2, new_y2, *box[1:]) for box in other_boxes if box[0] != class_id):
                    break
                y2 = new_y2
            else:
                break
        return x1, y1, x2, y2

    # Start from the margin-padded crop so a zero-area target box still yields a crop
    best_crop = (cx1, cy1, cx2, cy2)
    max_area = -1

    for order in expansion_orders:
        temp_cx1, temp_cy1, temp_cx2, temp_cy2 = cx1, cy1, cx2, cy2
        for direction in order:
            temp_cx1, temp_cy1, temp_cx2, temp_cy2 = expand_direction(temp_cx1, temp_cy1, temp_cx2, temp_cy2, direction)
        
        area = (temp_cx2 - temp_cx1) * (temp_cy2 - temp_cy1)
        if area > max_area:
            max_area = area
            best_crop = (temp_cx1, temp_cy1, temp_cx2, temp_cy2)

    cx1, cy1, cx2, cy2 = best_crop

    # Crop the image
    cropped_image = image[int(cy1):int(cy2), int(cx1):int(cx2)]

    # Adjust bounding box coordinates for the cropped image
    new_box = (class_id, tx1 - cx1, ty1 - cy1, tx2 - cx1, ty2 - cy1)

    return cropped_image, new_box
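expand_direction() tests one pixel at a time, so each crop costs up to 24 orders × 4 directions × (image width or height) intersection checks. Because the crop only grows along one axis per direction, the stopping point can instead be computed directly. The nearest blocker that shares rows (for left/right) or columns (for up/down) sets the limit.

The sketch below assumes blockers are plain (x1, y1, x2, y2) pixel tuples that have already been filtered to the boxes of other classes, as crop_image() does. expand_fast is a hypothetical helper, not part of the notebook. It is intended to reproduce expand_direction() for valid boxes (x1 <= x2, y1 <= y2).

```python
def intersects(x1, y1, x2, y2, ox1, oy1, ox2, oy2):
    # Same test as the notebook: edges that touch count as overlapping
    return not (x2 < ox1 or ox2 < x1 or y2 < oy1 or oy2 < y1)


def expand_fast(rect, direction, blockers, img_width, img_height):
    """Grow rect in one direction until the next pixel would touch a blocker or leave the image.

    rect and blockers are (x1, y1, x2, y2) pixel tuples.
    """
    x1, y1, x2, y2 = rect
    # The pixel loop stops on its first step if the rect already overlaps a blocker
    if any(intersects(x1, y1, x2, y2, *b) for b in blockers):
        return rect
    # Only blockers sharing rows can stop left/right growth; only those sharing columns can stop up/down
    same_rows = [b for b in blockers if not (y2 < b[1] or b[3] < y1)]
    same_cols = [b for b in blockers if not (x2 < b[0] or b[2] < x1)]
    if direction == 'left':
        return (max([0] + [b[2] + 1 for b in same_rows if b[2] < x1]), y1, x2, y2)
    if direction == 'right':
        return (x1, y1, min([img_width] + [b[0] - 1 for b in same_rows if b[0] > x2]), y2)
    if direction == 'up':
        return (x1, max([0] + [b[3] + 1 for b in same_cols if b[3] < y1]), x2, y2)
    if direction == 'down':
        return (x1, y1, x2, min([img_height] + [b[1] - 1 for b in same_cols if b[1] > y2]))
    raise ValueError(f"unknown direction: {direction}")


blockers = [(10, 45, 20, 55), (70, 0, 80, 10)]
print(expand_fast((40, 40, 60, 60), 'left', blockers, 100, 100))   # (21, 40, 60, 60)
print(expand_fast((40, 0, 60, 60), 'right', blockers, 100, 100))   # (40, 0, 69, 60)
```

The second call shows why the order matters. Once the crop has grown up to the top of the frame, it shares rows with the box at (70, 0, 80, 10), so growing right stops at x = 69 instead of reaching the image edge.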

In [19]:
# Fourth cell - File processing function
def process_file(image_path, annotation_path, output_dir):
    # Read image and annotations
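    image = cv2.imread(image_path)
    if image is None:
        print(f"Could not read {image_path}, skipping")
        return
    img_height, img_width = image.shape[:2]

    with open(annotation_path, 'r') as f:
        # Skip empty lines so a stray blank line doesn't break parsing
        annotations = [parse_yolo_annotation(line) for line in f if line.strip()]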

    # Convert YOLO coords to pixel coords
    pixel_annotations = [yolo_to_pixel_coords(box, img_width, img_height) for box in annotations]

    # Process each object
    for i, target_box in enumerate(pixel_annotations):
        # Exclude the target by position rather than by value
        other_boxes = [box for j, box in enumerate(pixel_annotations) if j != i]
        
        cropped_image, new_box = crop_image(image, target_box, other_boxes)

        # Generate new filenames
        base_name = os.path.splitext(os.path.basename(image_path))[0]
        class_id = target_box[0]
        new_image_name = f"{class_id}_{base_name}_{i+1:02d}.jpg"
        new_anno_name = f"{class_id}_{base_name}_{i+1:02d}.txt"

        # Create class subdirectory if it doesn't exist
        class_dir = os.path.join(output_dir, str(class_id))
        os.makedirs(class_dir, exist_ok=True)

        # Save cropped image
        cv2.imwrite(os.path.join(class_dir, new_image_name), cropped_image)

        # Save new annotation
        crop_height, crop_width = cropped_image.shape[:2]
        new_yolo_box = pixel_to_yolo_coords(new_box, crop_width, crop_height)
        with open(os.path.join(class_dir, new_anno_name), 'w') as f:
            f.write(f"{new_yolo_box[0]} {new_yolo_box[1]} {new_yolo_box[2]} {new_yolo_box[3]} {new_yolo_box[4]}\n")
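crop_image() only lets boxes of *other* classes stop the expansion. A crop can therefore contain a second fish of the same class, while the .txt file written here labels only the target. If that matters for training, one option is to label every box that lies fully inside the crop.

The sketch below assumes the crop rectangle (cx1, cy1, cx2, cy2) is available. crop_image() computes it but currently returns only the cropped image and the target box. boxes_in_crop is a hypothetical helper. Fish that are only partly inside the crop would still be unlabelled.

```python
def boxes_in_crop(crop, boxes):
    """Return YOLO boxes (class, xc, yc, w, h) for every pixel box fully inside crop.

    crop is (cx1, cy1, cx2, cy2); boxes are (class_id, x1, y1, x2, y2), all in pixels.
    Boxes are shifted by the crop origin (as crop_image() does for the target box)
    and normalised by the crop size.
    """
    cx1, cy1, cx2, cy2 = crop
    crop_w, crop_h = cx2 - cx1, cy2 - cy1
    out = []
    for class_id, x1, y1, x2, y2 in boxes:
        if x1 >= cx1 and y1 >= cy1 and x2 <= cx2 and y2 <= cy2:
            sx1, sy1, sx2, sy2 = x1 - cx1, y1 - cy1, x2 - cx1, y2 - cy1  # crop coordinates
            out.append((class_id,
                        (sx1 + sx2) / (2 * crop_w), (sy1 + sy2) / (2 * crop_h),
                        (sx2 - sx1) / crop_w, (sy2 - sy1) / crop_h))
    return out


# The second box pokes out of the right-hand side of the crop, so it is left out
print(boxes_in_crop((100, 100, 300, 200), [(1, 150, 120, 250, 180), (2, 280, 150, 320, 190)]))
# [(1, 0.5, 0.5, 0.5, 0.6)]
```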

In [22]:
# Fifth cell - Process all files in a directory
def process_directory(input_dir, output_dir):
    """Process all images and annotations in the input directory."""
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
        print(f"Created output directory: {output_dir}")

    # Get list of image files
    image_files = [f for f in os.listdir(input_dir) if f.lower().endswith(('.jpg', '.png'))]
    print(f"Found {len(image_files)} images to process")

    # Process each file
    for filename in image_files:
        image_path = os.path.join(input_dir, filename)
        annotation_path = os.path.join(input_dir, os.path.splitext(filename)[0] + '.txt')
        
        if os.path.exists(annotation_path):
            print(f"Processing {filename}...")
            process_file(image_path, annotation_path, output_dir)
        else:
            print(f"No annotation file found for {filename}")

    print("\nProcessing complete!")
    print(f"Input directory: {input_dir}")
    print(f"Output directory: {output_dir}")

# Read from the updated-class folder and write the crops to a "cropped" subfolder
INPUT_DIR = UPDATED_CLASS_DIR
OUTPUT_DIR = os.path.join(UPDATED_CLASS_DIR, "cropped")

# Run the processing
process_directory(INPUT_DIR, OUTPUT_DIR)

Created output directory: C:\Users\neez\OneDrive\Uni\VLS301\Completed_Annotations\Waych1fp20s\updated_classes\cropped
Found 180 images to process
Processing frame_0001.jpg...
Processing frame_0002.jpg...
Processing frame_0003.jpg...
Processing frame_0004.jpg...
Processing frame_0005.jpg...
Processing frame_0006.jpg...
Processing frame_0007.jpg...
Processing frame_0008.jpg...
Processing frame_0009.jpg...
Processing frame_0010.jpg...
Processing frame_0011.jpg...
Processing frame_0012.jpg...
Processing frame_0013.jpg...
Processing frame_0014.jpg...
Processing frame_0015.jpg...
Processing frame_0016.jpg...
Processing frame_0017.jpg...
Processing frame_0018.jpg...
Processing frame_0019.jpg...
Processing frame_0020.jpg...
Processing frame_0021.jpg...
Processing frame_0022.jpg...
Processing frame_0023.jpg...
Processing frame_0024.jpg...
No annotation file found for frame_0025.jpg
Processing frame_0026.jpg...
Processing frame_0027.jpg...
Processing frame_0028.jpg...
Processing frame_0029.jpg..
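process_directory() pairs each image with a .txt file of the same name and prints a message when the label is missing (frame_0025.jpg above). The sketch below does the same pairing step as a pure function that returns the unlabelled frames as a list, so they can be reviewed or re-annotated. It matches extensions case-insensitively and also accepts .jpeg, which the notebook does not. pair_images_and_labels and IMAGE_EXTS are hypothetical names.

```python
from pathlib import Path

IMAGE_EXTS = {'.jpg', '.jpeg', '.png'}


def pair_images_and_labels(filenames):
    """Split a directory listing into (image, label) pairs and images without a label."""
    names = set(filenames)
    pairs, missing = [], []
    for name in sorted(filenames):
        p = Path(name)
        if p.suffix.lower() not in IMAGE_EXTS:
            continue  # not an image (labels, notes, subfolders, ...)
        label = p.stem + '.txt'
        if label in names:
            pairs.append((name, label))
        else:
            missing.append(name)
    return pairs, missing


pairs, missing = pair_images_and_labels(['frame_0001.jpg', 'frame_0001.txt', 'frame_0025.JPG', 'notes.md'])
print(pairs)    # [('frame_0001.jpg', 'frame_0001.txt')]
print(missing)  # ['frame_0025.JPG']
```

In the notebook this would be called with os.listdir(input_dir).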

## YOLO Training

In [11]:
CWD = os.getcwd()
print(CWD)

C:\Users\neez\OneDrive\Uni\VLS301\detect_scripts


In [5]:
from ultralytics import YOLO

from IPython.display import display, Image

In [10]:
model = YOLO(f'{CWD}/yolov8s.pt')
results = model.predict(source='https://media.roboflow.com/notebooks/examples/dog.jpeg', conf=0.25)

Downloading https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s.pt to 'C:\Users\neez\OneDrive\Uni\VLS301\detect_scripts\yolov8s.pt'...


100%|█████████████████████████████████████████████████████████████████████████████| 21.5M/21.5M [00:03<00:00, 6.01MB/s]



Found https://media.roboflow.com/notebooks/examples/dog.jpeg locally at dog.jpeg
image 1/1 C:\Users\neez\OneDrive\Uni\VLS301\detect_scripts\dog.jpeg: 640x384 1 person, 1 car, 1 dog, 1 handbag, 186.7ms
Speed: 1.0ms preprocess, 186.7ms inference, 2.0ms postprocess per image at shape (1, 3, 640, 384)
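The cells above only run the pretrained weights on a test image. The workflow's "select images for training/validation/testing" and training steps are not shown here.

The sketch below shows a reproducible split plus a data.yaml. It assumes the Ultralytics dataset layout: images/train, images/val, images/test, with matching labels/ folders. The helper names, the 70/20/10 ratios, the dataset path and the class names are all placeholders. Consecutive frames taken 0.5 s apart can look almost identical, so splitting crop by crop may put near-duplicates in both train and val. Grouping by frame or by video segment may give a more honest validation score.

```python
import random


def split_dataset(stems, train=0.7, val=0.2, seed=0):
    """Reproducibly shuffle image stems and split them into train/val/test lists."""
    stems = sorted(stems)  # sort first so the split doesn't depend on listdir() order
    random.Random(seed).shuffle(stems)
    n_train = round(len(stems) * train)
    n_val = round(len(stems) * val)
    return {
        "train": stems[:n_train],
        "val": stems[n_train:n_train + n_val],
        "test": stems[n_train + n_val:],  # whatever is left over
    }


def data_yaml(dataset_root, class_names):
    """Return the text of an Ultralytics-style data.yaml (built by hand, no PyYAML needed)."""
    lines = [
        f"path: {dataset_root}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Placeholder frame and class names
splits = split_dataset([f"frame_{i:04d}" for i in range(1, 11)])
print({k: len(v) for k, v in splits.items()})  # {'train': 7, 'val': 2, 'test': 1}
print(data_yaml("datasets/site_a", ["species_a", "species_b"]))
```

Once the images and labels are copied into those folders, the YAML file can be passed to the Ultralytics training call, for example model.train(data='path/to/data.yaml', epochs=100, imgsz=640). The epochs and imgsz values there are placeholders.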


## Detection/Inference

In [None]:
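# Path to the weights. For real inference use the weights trained above
# (Ultralytics saves them as runs/detect/train/weights/best.pt by default)
MODEL_PT = f'{CWD}/yolov8s.pt'
DET_CONF = 0.7
DET_IOU = 0.7
DET_PROJECT = TRAINING_SITE_TYPE
DET_SOURCE = ''  # set to the image folder or video file to run inference on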

In [None]:
!yolo task=detect mode=predict model="{MODEL_PT}" source="{DET_SOURCE}" project="{DET_PROJECT}" name=detection conf={DET_CONF} iou={DET_IOU} imgsz=1280 save=True save_txt=True exist_ok=True

## Data Extraction
The script below has only been used as a standalone script and hasn't been tested in this notebook; it needs modification to work here.
It reads the label files written by YOLO inference (save_txt=True) and collects the data needed for species accumulation curves and MaxN tables.
Because frames were extracted at 2 fps, first-discovery times and MaxN should be resolvable to the nearest 0.5 s.
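MaxN for a species is the largest number of individuals counted together in any single frame. A species accumulation curve records how many distinct species have been seen by each point in time.

The sketch below computes both in pure Python. It assumes the per-frame counts are already in a dict keyed by frame number; the script below builds the equivalent from the label files with pandas. maxn_and_accumulation is a hypothetical name. Like the script, it keeps the earliest frame when two frames tie for MaxN.

```python
def maxn_and_accumulation(frame_counts):
    """frame_counts maps frame number -> {class_id: count}.

    Returns (maxn, accumulation):
      maxn: class_id -> (frame, count) for the frame with the highest count (earliest on ties)
      accumulation: (frame, species seen so far) at each frame where a new species appears
    """
    maxn, seen, accumulation = {}, set(), []
    for frame in sorted(frame_counts):
        counts = frame_counts[frame]
        for cls, count in counts.items():
            if cls not in maxn or count > maxn[cls][1]:
                maxn[cls] = (frame, count)
        new_species = {cls for cls, count in counts.items() if count > 0} - seen
        if new_species:
            seen |= new_species
            accumulation.append((frame, len(seen)))
    return maxn, accumulation


maxn, accumulation = maxn_and_accumulation({1: {0: 2}, 2: {0: 3, 1: 1}, 3: {0: 3, 2: 1}})
print(maxn)          # {0: (2, 3), 1: (2, 1), 2: (3, 1)}
print(accumulation)  # [(1, 1), (2, 2), (3, 3)]
```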

In [None]:
import cv2
import os
import pandas as pd
import yaml
import argparse

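# Create an argument parser (the script is normally run from the command line)
parser = argparse.ArgumentParser(description='Extract MaxN and first-discovery times from YOLO label files.')
parser.add_argument('--folder', type=str, required=True, help='Folder containing the YOLO label (.txt) files')
parser.add_argument('--data', type=str, required=True, help='Path to the data.yaml file with the class names')
parser.add_argument('--fps', type=float, required=True, help='Frame rate the frames were extracted at')

# parse_known_args() ignores the extra arguments Jupyter passes to its kernel.
# Inside the notebook, pass the values explicitly instead, e.g.
# args = parser.parse_args(['--folder', 'path/to/labels', '--data', 'path/to/data.yaml', '--fps', '2'])
args, _ = parser.parse_known_args()

# Read the arguments (the commented-out video export below would also need a --video argument)
str_folder = args.folder
data_file = args.data
fps = args.fps
#video_file = args.video

# Open the video file
#cap = cv2.VideoCapture(video_file)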

# Create an empty dictionary to store the class counts for each file
class_counts = {}
# Create an empty dictionary to store the first discovery time for each class
first_discovery = {}

# Loop through the label (.txt) files in the folder
files = [f for f in os.listdir(str_folder) if f.endswith('.txt')]
# Sort files by the frame number at the end of each filename (e.g. frame_0012.txt)
files.sort(key=lambda x: int(x.split('_')[-1].split('.')[0]))
for file in files:
    # Read the label file as a pandas dataframe
    df = pd.read_csv(os.path.join(str_folder, file), sep=" ", header=None)
    # Name the columns according to the YOLO label format (no confidence column)
    df.columns = ["class", "xcenter", "ycenter", "width", "height"]
    # Group the dataframe by the class column and count the number of rows
    counts = df.groupby("class").size()
    # Convert the counts to a dictionary and store it in the class_counts dictionary with the file name as the key
    class_counts[file] = counts.to_dict()

    # Update the first_discovery dictionary
    for cls in df["class"].unique():
        if cls not in first_discovery:
            frame_number = int(file.split('_')[-1].split('.')[0])
            minutes, seconds = divmod(frame_number / fps, 60)
            timestamp = f"{int(minutes)}:{int(seconds):02d}"
            first_discovery[cls] = timestamp


# Create an empty dictionary to store the file name with the highest count for each class
max_counts = {}

# Loop through the files in the class_counts dictionary
for fname, counts in class_counts.items():
    # Loop through the classes and counts in the counts dictionary
    for cls, count in counts.items():
        # If the class is not in the max_counts dictionary or the count is higher than the current maximum
        if cls not in max_counts or count > max_counts[cls][1]:
            # Update the max_counts dictionary with the file name and the count as a tuple
            max_counts[cls] = (fname, count)

# Load the data.yaml file and get the class names
with open(data_file, "r") as f:
    data = yaml.safe_load(f)
class_names = data["names"]

# Print the results and write them to a file
with open(os.path.join(os.path.dirname(str_folder), 'MaxN results.txt'), 'w') as f, \
     open(os.path.join(os.path.dirname(str_folder), 'Discovery times.txt'), 'w') as g, \
     open(os.path.join(os.path.dirname(str_folder), 'Frame_and_class_ID.txt'), 'w') as h:
    print("The file with the highest count for each class is:", file=f)
    print("The first discovery time for each class is:", file=g)
#    print("The frame number and class ID for each class is:", file=h)
    for cls, (file, count) in max_counts.items():
        # Extract the frame number from the filename
        frame_number = int(file.split('_')[-1].split('.')[0])
        # Convert the frame number to a timestamp
        minutes, seconds = divmod(frame_number / fps, 60)
        timestamp = f"{int(minutes)}:{int(seconds):02d}"
        # Use the class names instead of the class numbers
        result = f"Class {class_names[cls]}: {timestamp} with {count} objects"
        print(result, file=f)
        print(f"{cls} {frame_number} {class_names[cls]}", file=h)

        # Print the first discovery time for each class
        if cls in first_discovery:
            print(f"Class {class_names[cls]} was first discovered at {first_discovery[cls]}", file=g)

        # Set the current position of the video file to the specified frame
        #cap.set(cv2.CAP_PROP_POS_FRAMES, frame_number)

        # Read the frame from the video file
        #ret, frame = cap.read()

        # If the frame was read successfully, save it as an image
        #if ret:
        #    cv2.imwrite(os.path.join(os.path.dirname(str_folder), f'{class_names[cls]}_frame.jpg'), frame)
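Two details in the script above work against the 0.5 s resolution mentioned earlier:

- It formats timestamps with int(seconds), which drops the half second that 2 fps frames can resolve.
- Its sort key raises ValueError for any .txt file whose name doesn't end in _<number>.

The sketch below shows both helpers. It assumes label names end in _<frame>.txt, which holds for frame_0001.txt-style names and for video sources, where Ultralytics appends the frame number. frame_number_from_name, frame_to_timestamp and the first_frame parameter are hypothetical. The script computes frame_number / fps, which treats frame 0 as t = 0. ffmpeg's image2 output numbering starts at 1 by default, so first_frame=1 may give the more accurate time for extracted frames.

```python
import re

FRAME_RE = re.compile(r'_(\d+)\.txt$')


def frame_number_from_name(name):
    """Return the trailing frame number of a label file, or None if there isn't one."""
    m = FRAME_RE.search(name)
    return int(m.group(1)) if m else None


def frame_to_timestamp(frame_number, fps, first_frame=1):
    """Convert a frame number to m:ss.s, keeping one decimal place for half-second steps."""
    total_seconds = (frame_number - first_frame) / fps
    minutes, seconds = divmod(total_seconds, 60)
    return f"{int(minutes)}:{seconds:04.1f}"


print(frame_number_from_name("frame_0007.txt"))     # 7
print(frame_number_from_name("classes.txt"))        # None
print(frame_to_timestamp(2, 2.0))                   # 0:00.5
print(frame_to_timestamp(121, 2.0, first_frame=0))  # 1:00.5
```

Files for which frame_number_from_name() returns None can simply be skipped before sorting.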

## Grand Plan
The plan is to turn MaxN into a tight histogram, or possibly a formula, that can be overlaid on the seek bar of a video player. Each species would get its own curve, similar to YouTube's "Most replayed" graph, giving a visual indication of where MaxN occurs. This would allow rapid navigation to those points in the video for manual verification.
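One possible starting point for the histogram is sketched below. It assumes per-frame counts for a single species, with frames numbered from 1, and works in three steps:

1. Bin the counts into n_bins slices of the video, keeping the largest count in each bin so that each bar is a local MaxN.
2. Smooth with a moving average.
3. Scale so that the peak is 1.

seekbar_curve, n_bins and smooth are hypothetical names, and max-per-bin plus moving-average smoothing is just one choice among many.

```python
def seekbar_curve(counts_by_frame, total_frames, n_bins=100, smooth=2):
    """Turn one species' per-frame counts into n_bins heights between 0 and 1.

    counts_by_frame maps frame number (1-based) -> individuals counted in that frame.
    Each bin keeps its largest count (a local MaxN); a moving average over
    +/- `smooth` bins softens the curve before it is scaled so the peak is 1.
    """
    bins = [0] * n_bins
    for frame, count in counts_by_frame.items():
        b = min(n_bins - 1, (frame - 1) * n_bins // total_frames)
        bins[b] = max(bins[b], count)

    smoothed = []
    for i in range(n_bins):
        window = bins[max(0, i - smooth): i + smooth + 1]  # window shrinks at the ends
        smoothed.append(sum(window) / len(window))

    peak = max(smoothed)
    return [v / peak for v in smoothed] if peak > 0 else smoothed


# A species seen once at the start and three at once around the middle
curve = seekbar_curve({1: 1, 50: 3}, total_frames=100, n_bins=10, smooth=0)
print([round(v, 2) for v in curve])
# [0.33, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Each value could then be drawn as a bar or path segment above the matching slice of the seek bar, one curve per species.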

In [None]:
# TODO: script to turn MaxN into an image or formula for the video interface