This notebook contains a portion of the main pipeline and is meant to randomly select a user defined number of days and generate annotated raw images to later be analized for the calculation of AMT performance metrics.

##### Implemented:
- Looks at the brightness of each image. If the image is to dark, it will not be analyzed and moved to the `dark_frames` folder.
- Identifies insects in the image with GroundinDino, defines a bounding-box and saves a cropped image of the insect in the `cropped` folder.
- Checks whether the detection has already appeared on one of the last images to prevent saving a cropped image of the same individuum over and over. The tracker from AMT (Automatic Moth Trap: https://stangeia.hobern.net/autonomous-moth-trap-image-pipeline/) was implemented for this task.
- AMT also includes funtionality that attempts to optimize the cropped image of an individuum by saving a new version of the insect if it is sharper than the old one.
- The code quickly checks the size of each detection. If it is unrealistically big, the crop is saved in the `potentially_faulty` folder.
- Saves the raw images with the tracking visualized on them in another subfolder called `detection_drawings` - if activated.

In [21]:
from transformers import AutoModelForMaskGeneration, AutoProcessor, pipeline
from ipywidgets import interact, interactive, fixed, interact_manual, Layout
from PIL import Image, ImageDraw, ImageFont, PngImagePlugin
from typing import Any, List, Dict, Optional, Union, Tuple
from scipy.optimize import linear_sum_assignment
#from bioclip import TreeOfLifeClassifier, Rank
from IPython.display import clear_output
import matplotlib.patches as patches
import plotly.graph_objects as go
from dataclasses import dataclass
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
from collections import deque
import ipywidgets as widgets
import plotly.express as px
from pathlib import Path
import pandas as pd
import numpy as np
import subprocess
import warnings
import requests
import random
import torch
import glob
import json
import math
import time
import csv
import cv2
import os
import re

### Selfmade functions
from FunktionenZumImportieren.helper_funktionen_clean import *
from FunktionenZumImportieren.settings_widgets import *
from AMT_functions.colors import reportcolors
from AMT_functions.amt_tracker import *

warnings.simplefilter(action='always', category=UserWarning)
warnings.formatwarning = custom_warning_format
warnings.filterwarnings('ignore')

envs_path = get_local_env_path()

The envs directory is: /Users/Nils/miniforge3/envs


#### Define settings variables
##### Variables include:
- labels: prompt that specifies what GroundingDino searches in the images. Leave as `insect`.
- threshold: a threshold that tells GroundingDino whether or not to reject a detection or to accept it. Lower values means more detections but also more faulty detections.
- buffer: specifies the amount of border space in the cropped images. `15` is a good value usually.
- pixel_scale: how many pixels are in a cm of image on the current camera. Necessary for outputing mm readings of the insect length. Input `10` if you want to output the length in pixel.
- start_image: The name of a picture can be specified here (with extention). The code will then skip all the files before. Leave empty if you want to analyze all the images in the folder. Please leave this empty in the batch analysis pipeline. Might cause issues. If you want to start at the folder of a specific day, then put the folder number at the first spot in the brackets ([___:]) where the big arrow is in the main cell (<---). If you want to start from the 5th folder for some reason, then put in a 4.
- save_visualisation: activate if you want a visualisation of the detections and tracks saved in "detection_drawings"
- rank: Defines the taxonomic rank to which insects should be classified if BioClip is used. Can be set to None. This will make the algorithm classify up to the taxonomic rank that first satisfies the requirement set by certainty_threshold (see the function BioClip_inference in helper_funktionen). Setting rank to None will increase compute time.
- DIOPSIS_folder: diopsis camera folder containing the raw images to be analyzed. This structure is expected: DIOPSIS_folder/photos/folders_of_days

    WORD OF CAUTION: The 'score' value (in the results) of the BioClip inference will only reflect the certainty for the taxonomic rank up to which the classification went. That means that if you classified up to species level for example, and later in the analysis go up to family level, the score will no longer reflect how certain BioClip is for the family rank. The score always reflects the certainty that is achieved for the taxonomic rank that is stated in the 'highest_classification_rank' column of the results csv. If you first classified to species, but then later decide to switch your analysis to family, you need to run the inference again (ideally with the 'Only_BioClip_inference' notebook in 'utils') in order to obtain the right score values for that taxonomic rank. Otherwise, the 'score' value might appear either way to high or way to low.

In [2]:
get_values = create_interactive_widgets()

Dropdown(description='Annotation algorithm:', layout=Layout(height='30px', width='50%'), options=(('BioClip', …

Checkbox(value=True, description='Perform image classification with BioClip', indent=False)

Dropdown(description='Taxonomic rank:', index=4, layout=Layout(height='30px', width='50%'), options=(('kingdom…

Checkbox(value=False, description='Perform image classification with ApolloNet', indent=False)

Checkbox(value=False, description='Perform image classification with InsectDetect', indent=False)

Checkbox(value=False, description='Save visualisations', indent=False)

Text(value='', description='If you want to start at a specific image:', layout=Layout(height='30px', width='50…

In [3]:
labels = ["insect"]
threshold = 0.25
buffer = 15
#pixel_scale = ask_for_pixel_scale()

In [None]:
def find_day_folders(input_folder, n, seed=42):
    random.seed(seed)
    day_folders = []
    
    # Iterate over all subdirectories in the input folder
    for subfolder in os.listdir(input_folder):
        subfolder_path = os.path.join(input_folder, subfolder)
        
        if not os.path.isdir(subfolder_path):
            continue  # Skip if not a directory
        
        # Look for "_analyzed" subfolders
        analyzed_folders = [f for f in os.listdir(subfolder_path) if f.endswith("_analyzed")]
        
        for analyzed in analyzed_folders:
            analyzed_path = os.path.join(subfolder_path, analyzed)
            photos_path = os.path.join(analyzed_path, "photos")
            
            if os.path.exists(photos_path) and os.path.isdir(photos_path):
                # Collect all day folders in "photos"
                day_folders.extend([os.path.join(photos_path, day) for day in os.listdir(photos_path) if os.path.isdir(os.path.join(photos_path, day))])
    
    # Randomly select n folders
    if len(day_folders) < n:
        print(f"Warning: Only found {len(day_folders)} folders, selecting all.")
        selected_folders = day_folders
    else:
        selected_folders = random.sample(day_folders, n)
    
    return selected_folders

In [28]:
# Example usage
input_folder = "/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed"
n = 20  # Change this to your desired number of folders
random_days = find_day_folders(input_folder, n)

# Print or use the selected paths
for path in random_days:
    print(path)

/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/384_Weiss_mid/DIOPSIS-384_analyzed/photos/20240813
/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/383_Weiss_low/DIOPSIS-383_analyzed/photos/20240714
/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/387_Jatz_mid/DIOPSIS-387_analyzed/photos/20240909
/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/387_Jatz_mid/DIOPSIS-387_analyzed/photos/20240809
/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/387_Jatz_mid/DIOPSIS-387_analyzed/photos/20240718
/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/384_Weiss_mid/DIOPSIS-384_analyzed/photos/20240910
/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/384_Weiss_mid/DIOPSIS-384_analyzed/photos/20240803
/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/421_Jatz_low/Jatzhorn_lower_analyzed/photos/20240920
/Volumes/T7_Shiel

In [29]:
def plot_tracks_and_detections1(image_array, new_tracks, image):
    if len(new_tracks)>0:
        image_folder = Path(new_tracks[0]["image_folder"])
        day = image_folder.name
        site = image_folder.parts[-4]
        annotation_out_folder = os.path.join("/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Annotated_selection", site, day)
        os.makedirs(annotation_out_folder, exist_ok=True)
        annotation_out_path = os.path.join(annotation_out_folder, image)
        
        # Convert image array to BGR format if it's in RGB
        image_bgr = cv2.cvtColor(image_array, cv2.COLOR_RGB2BGR)
        
        for track in new_tracks:
            x_center = track['xcenter']
            y_center = track['ycenter']
            width = track['crop'].shape[1]
            height = track['crop'].shape[0]
            #print(f"Bin drin. Track: {track}")
            if 'saved' in track and track['saved'] == False:
                color = (0, 0, 255)  # Red
            elif 'sharper' in track and track["sharper"] == True:
                #color = (255, 255, 255)  # White
                color = (0, 0, 255)  # Red -- Für bessere Sichtbarkeit beim AMT check
            else:
                color = (0, 255, 0)  # Green
            
            # Draw rectangle
            top_left = (x_center - width // 2, y_center - height // 2)
            bottom_right = (x_center + width // 2, y_center + height // 2)
            cv2.rectangle(image_bgr, top_left, bottom_right, color, 2)
            
            # Add text
            cv2.putText(image_bgr, str(track['trackid']), (int(track['xcenter'] - 15), int(track['ycenter'] - height/2 - 20)), 
                        cv2.FONT_HERSHEY_SIMPLEX, 1.5, color, 2, cv2.LINE_AA)
            if 'cost' in track and track['cost'] is not None:
                cv2.putText(image_bgr, f"{track['cost']:.2f}", (int(track['xcenter'] - 50), int(track['ycenter'] + height/2 + 50)), 
                            cv2.FONT_HERSHEY_SIMPLEX, 1.5, color, 2, cv2.LINE_AA)
            if track['age'] != 0:
                cv2.putText(image_bgr, str(track['age']), (int(track['xcenter'] - width/2 - 40), int(track['ycenter'] + 20)), 
                            cv2.FONT_HERSHEY_SIMPLEX, 1.5, color, 2, cv2.LINE_AA)
        
        # Save the annotated image
        cv2.imwrite(annotation_out_path, image_bgr)

Main pipeline cell. All functionallity is included here\
Carefully monitor the printed details since some errors are not cought, but rather only printed. An error summary will appear at the end, telling you whether an error occured during the whole inference or not.

In [30]:
settings = get_values()
global_start_time = time.time()

global_error = False
#classifier = TreeOfLifeClassifier()
tracker = AMTTracker(config)
object_detector = load_grounding_dino_model()

#image_folders = [folder for folder in os.listdir(os.path.join(DIOPSIS_folder, "photos")) if not folder.startswith('.')]
#image_folders = sorted(image_folders)

for path in random_days[:]: #  <------------------------------HERE----------------------------------------HERE-------------------------
    #image_folder = os.path.join(DIOPSIS_folder, "photos", img_folder)
    start_time = time.time()
    
    start_processing = False
    blobs = []
    trails = {}
    first_pass = True
    images = os.listdir(path)
    images = sorted(images)
    
    for image in tqdm(images):
    #for image in images:
        image_arg = os.path.join(path, image)
        skip, start_processing = should_image_be_skipped(settings.start_image, start_processing, image_arg, image, path)
        if skip:
            continue
        detections = detect(
            object_detector,
            image=image_arg,
            labels=labels,
            threshold=threshold
        )
        #print("Nr. of detections:", len(detections))
        image_array = Image.open(image_arg)
        image_array = np.array(image_array)
        blobs = convert_bounding_boxes(detections, image, image_array, path, buffer)
        if first_pass:
            tracker.savedois = blobs
        new_tracks, _ = tracker.managetracks(tracker.savedois, blobs, first_pass)
        tracker.savedois = new_tracks
        first_pass = False
    
        if settings.save_visualisation:
            plot_tracks_and_detections1(image_array, new_tracks, image)
    
    #print("--------")
    
    ## Classifying the cropped images
    print("Done detecting all the insects and saving cropped versions.")
    
    end_time = time.time()
    elapsed_time = end_time - start_time
    
    print(f"Done with day! No length Errors occured :)\nElapsed time: {elapsed_time/60:.2f} minutes \nTime per Image: {elapsed_time/len(images):.2f} seconds")

global_end_time = time.time()
global_elapsed_time = global_end_time - global_start_time

print(f"Time elapsed in total: {(global_elapsed_time/60)/60:.2f} hours")
print(f"Pipeline took {round((global_elapsed_time/len(image_folders))/60, 2)} minutes per day on average")

---------- GroundingDino runs on cpu ----------


Device set to use cpu


  0%|          | 0/531 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 39.94 minutes 
Time per Image: 4.51 seconds


  0%|          | 0/206 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 12.39 minutes 
Time per Image: 3.61 seconds


  0%|          | 0/236 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 15.47 minutes 
Time per Image: 3.93 seconds


  0%|          | 0/219 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 15.01 minutes 
Time per Image: 4.11 seconds


  0%|          | 0/388 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 27.46 minutes 
Time per Image: 4.25 seconds


  0%|          | 0/215 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 13.46 minutes 
Time per Image: 3.76 seconds


  0%|          | 0/644 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 48.78 minutes 
Time per Image: 4.54 seconds


  0%|          | 0/112 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 5.75 minutes 
Time per Image: 3.08 seconds


  0%|          | 0/78 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 1.82 minutes 
Time per Image: 1.40 seconds


  0%|          | 0/136 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 9.81 minutes 
Time per Image: 4.33 seconds


  0%|          | 0/179 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 9.93 minutes 
Time per Image: 3.33 seconds


  0%|          | 0/248 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 15.66 minutes 
Time per Image: 3.79 seconds


  0%|          | 0/282 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 17.42 minutes 
Time per Image: 3.71 seconds


  0%|          | 0/490 [00:00<?, ?it/s]

Image is truncated or corrupted: /Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/387_Jatz_mid/DIOPSIS-387_analyzed/photos/20240713/20240713164809.jpg - image file is truncated (11 bytes not processed)
Skipping this image.
Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 30.73 minutes 
Time per Image: 3.76 seconds


  0%|          | 0/414 [00:00<?, ?it/s]

Image is truncated or corrupted: /Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/387_Jatz_mid/DIOPSIS-387_analyzed/photos/20240728/20240728100832.jpg - cannot identify image file '/Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/387_Jatz_mid/DIOPSIS-387_analyzed/photos/20240728/20240728100832.jpg'
Skipping this image.
Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 25.67 minutes 
Time per Image: 3.72 seconds


  0%|          | 0/366 [00:00<?, ?it/s]

Image is truncated or corrupted: /Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/421_Jatz_low/Jatzhorn_lower_analyzed/photos/20240727/20240727100513.jpg - image file is truncated (19 bytes not processed)
Skipping this image.
Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 21.94 minutes 
Time per Image: 3.60 seconds


  0%|          | 0/353 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 21.19 minutes 
Time per Image: 3.60 seconds


  0%|          | 0/460 [00:00<?, ?it/s]

Image is truncated or corrupted: /Volumes/T7_Shield/Diopsis_Cameras/RESULTS_2024/Images_RAW_and_analyzed/385_Weiss_up/DIOPSIS-385_analyzed/photos/20240901/20240901100653.jpg - image file is truncated (6 bytes not processed)
Skipping this image.
Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 28.58 minutes 
Time per Image: 3.73 seconds


  0%|          | 0/485 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 29.31 minutes 
Time per Image: 3.63 seconds


  0%|          | 0/357 [00:00<?, ?it/s]

Done detecting all the insects and saving cropped versions.
Done with day! No length Errors occured :)
Elapsed time: 21.66 minutes 
Time per Image: 3.64 seconds
Time elapsed in total: 6.87 hours


NameError: name 'image_folders' is not defined