This notebook contains the pipeline for analyzing insect images from a single image folder.

##### Implemented:
- Checks the brightness of each image. If the image is too dark, it is not analyzed and is moved to the `dark_frames` folder.
- Identifies insects in the image with GroundingDino, defines a bounding box and saves a cropped image of each insect in the `cropped` folder.
- Checks whether a detection already appeared in one of the preceding images, so that the same individual is not cropped and saved over and over. The tracker from AMT (Autonomous Moth Trap: https://stangeia.hobern.net/autonomous-moth-trap-image-pipeline/) is used for this task.
- AMT also includes functionality that tries to improve the cropped image of an individual by saving a new version of the insect whenever it is sharper than the stored one.
- Checks the size of each detection. If it is unrealistically big, the crop is saved in the `potentially_faulty` folder.
- If activated, saves the raw images with the tracking visualized on them in another subfolder called `detection_drawings`.
- Goes through all the cropped images in `cropped` again and sorts out any that are not recognized as an insect, moving them to the `potentially_faulty` folder. A custom-trained classifier (Apollo Environment, `inference_dirt_classifier.py`) is responsible for this. Better training data would likely improve this step.
- Classifies the cropped images using the InsectDetect classifier (https://github.com/maxsitt/insect-detect/tree/main?tab=readme-ov-file), ApolloNet (a self-trained neural network), or BioClip (https://imageomics.github.io/bioclip/) (https://github.com/Imageomics/pybioclip).
- Measures the length of the insects in the cropped images using a custom-trained version of the SLEAP pipeline (https://sleap.ai/tutorials/initial-training.html).
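
The brightness and size checks from the list above can be sketched in a few lines. This is only an illustration, not the notebook's own implementation: the function names, the threshold values and the use of plain nested lists as a stand-in for a grayscale image are all assumptions.

```python
from statistics import mean

# Hypothetical thresholds; the values used by the notebook's helper module may differ.
DARK_THRESHOLD = 40        # mean grayscale value (0-255) below which a frame counts as dark
MAX_AREA_FRACTION = 0.25   # a box covering more than this share of the frame is suspicious


def is_too_dark(gray_rows, threshold=DARK_THRESHOLD):
    """Return True if the mean grayscale value of the frame is below the threshold."""
    pixels = [value for row in gray_rows for value in row]
    return mean(pixels) < threshold


def is_unrealistically_big(box, frame_width, frame_height, max_fraction=MAX_AREA_FRACTION):
    """Return True if the (xmin, ymin, xmax, ymax) box covers too much of the frame."""
    xmin, ymin, xmax, ymax = box
    box_area = max(0, xmax - xmin) * max(0, ymax - ymin)
    return box_area / (frame_width * frame_height) > max_fraction


# Tiny stand-ins for grayscale images (rows of pixel values)
night_frame = [[5, 8, 3], [10, 4, 6]]
day_frame = [[120, 200, 180], [90, 150, 210]]

print(is_too_dark(night_frame), is_too_dark(day_frame))
print(is_unrealistically_big((0, 0, 900, 700), 1000, 1000))
```

A frame that fails the first check would be moved to `dark_frames`, and a crop that fails the second one would go to `potentially_faulty`.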

In [1]:
from transformers import AutoModelForMaskGeneration, AutoProcessor, pipeline
from ipywidgets import interact, interactive, fixed, interact_manual, Layout
from PIL import Image, ImageDraw, ImageFont, PngImagePlugin
from typing import Any, List, Dict, Optional, Union, Tuple
from scipy.optimize import linear_sum_assignment
from bioclip import TreeOfLifeClassifier, Rank
from IPython.display import clear_output
from ipyfilechooser import FileChooser
import matplotlib.patches as patches
import plotly.graph_objects as go
from dataclasses import dataclass
import matplotlib.pyplot as plt
from collections import deque
import ipywidgets as widgets
import plotly.express as px
from tqdm import tqdm
import pandas as pd
import numpy as np
import subprocess
import warnings
import requests
import random
import torch
import glob
import json
import math
import time
import csv
import cv2
import os
import re

### Selfmade
from FunktionenZumImportieren.helper_funktionen_clean import *
from FunktionenZumImportieren.settings_widgets import *
from AMT_functions.colors import reportcolors
from AMT_functions.amt_tracker import *

warnings.simplefilter(action='always', category=UserWarning)
warnings.formatwarning = custom_warning_format
warnings.filterwarnings('ignore')

envs_path = get_local_env_path()

The envs directory is: C:\Users\rolingni\AppData\Local\anaconda3\envs


#### Define settings variables
##### Variables include:
- labels: prompt that specifies what GroundingDino searches for in the images. Leave as `insect`.
- threshold: the confidence threshold above which GroundingDino accepts a detection. Lower values mean more detections, but also more faulty detections.
- buffer: specifies the amount of border space in the cropped images. `15` is usually a good value.
- pixel_scale: the number of pixels per cm of image for the current camera. Needed to output the insect length in mm. Enter `10` if you want the length in pixels.
- start_image: The file name of an image (with extension) can be specified here. The code then skips all files before it. Leave empty if you want to analyze all the images in the folder. Please leave this empty in the batch analysis pipeline, since it might cause issues there. If you want to start at the folder of a specific day, put the folder index in the first position of the brackets (`[___:]`) marked by the big arrow (`<---`) in the main cell. Indexing starts at 0, so enter 4 to start from the 5th folder.
- save_visualisation: activate if you want a visualisation of the detections and tracks saved in "detection_drawings"
- rank: Defines the taxonomic rank to which insects should be classified if BioClip is used. Can be set to None. This will make the algorithm classify up to the taxonomic rank that first satisfies the requirement set by certainty_threshold (see the function BioClip_inference in helper_funktionen). Setting rank to None will increase compute time.
- image_folder: folder containing the raw images to be analyzed.

    WORD OF CAUTION: The 'score' value in the BioClip results only reflects the certainty for the taxonomic rank at which the classification stopped, i.e. the rank stated in the 'highest_classification_rank' column of the results csv. If you classified to species level, for example, and later analyze at family level, the score no longer reflects how certain BioClip is at family rank. In that case you need to run the inference again for the new rank (ideally with the 'Only_BioClip_inference' notebook in 'utils') to obtain the right score values. Otherwise, the 'score' value might appear far too high or far too low.
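
To make the caution concrete, here is a minimal sketch of rank selection with `rank` set to None. It assumes that the search starts at species level and moves up the tree until a score reaches `certainty_threshold`. The actual logic lives in `BioClip_inference` and may differ. The function name and the scores are made up for illustration.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def most_specific_confident_rank(scores_by_rank, certainty_threshold=0.45):
    """Return (rank, score) for the most specific rank whose score meets the threshold."""
    for rank in reversed(RANKS):  # start at species and move up the tree
        score = scores_by_rank.get(rank)
        if score is not None and score >= certainty_threshold:
            return rank, score
    return None, None


# Made-up scores for a single crop
example_scores = {
    "kingdom": 0.99, "phylum": 0.98, "class": 0.97, "order": 0.81,
    "family": 0.52, "genus": 0.30, "species": 0.12,
}

rank, score = most_specific_confident_rank(example_scores)
print(rank, score)  # family 0.52
```

In this example the stored score of 0.52 belongs to the family rank. Reusing it for an analysis at order level would understate the certainty there (0.81), which is why the inference has to be repeated for a different rank.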

In [2]:
get_values = create_interactive_widgets()

Dropdown(description='Annotation algorithm:', layout=Layout(height='30px', width='50%'), options=(('BioClip', …

Checkbox(value=True, description='Perform image classification with BioClip', indent=False)

Dropdown(description='Taxonomic rank:', index=4, layout=Layout(height='30px', width='50%'), options=(('kingdom…

Checkbox(value=False, description='Perform image classification with ApolloNet', indent=False)

Checkbox(value=False, description='Perform image classification with InsectDetect', indent=False)

Checkbox(value=False, description='Save visualisations', indent=False)

Text(value='', description='If you want to start at a specific image:', layout=Layout(height='30px', width='50…

In [3]:
labels = ["insect"]
threshold = 0.3
buffer = 15
pixel_scale = 87
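
As a rough illustration of how `pixel_scale` turns a pixel length into millimetres: the main cell divides the value by 10 to get pixels per mm. The sketch below assumes that the length script then divides the measured pixel length by that number. The helper name `pixels_to_mm` is hypothetical.

```python
def pixels_to_mm(length_px, pixel_scale_per_cm):
    """Convert a length in pixels to millimetres, given the pixels-per-centimetre scale."""
    pixels_per_mm = pixel_scale_per_cm / 10  # same division the main cell applies
    return length_px / pixels_per_mm


# A 174 px long insect at 87 px/cm is roughly 20 mm long
print(round(pixels_to_mm(174, 87), 2))

# With a scale of 10 px/cm, one "mm" equals one pixel, so the value stays in pixels
print(pixels_to_mm(174, 10))
```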

In [4]:
## On Mac:
#image_folder = "/Users/rentaluser/Downloads/Diopsis_photo_2024_09_03"

## On Windows:
image_folder = r"C:\Users\rolingni\OneDrive - Eidg. Forschungsanstalt WSL\Bilder\input_tests\input_test"

Main pipeline cell. All functionality is included here.\
Carefully monitor the printed details, since some errors are not caught but only printed. An error summary will appear at the end, telling you whether an error occurred during the inference.
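
One way to keep track of errors from the external steps is to collect them in a list and print a summary at the end. The sketch below is an assumption about how this could be done, not the code the main cell uses. `run_step` is a hypothetical helper, and the two demo commands simply start the current Python interpreter.

```python
import subprocess
import sys


def run_step(name, args, errors):
    """Run one external step; record its stderr in `errors` instead of raising."""
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        errors.append((name, result.stderr.strip()))
    return result


errors = []
run_step("ok_step", [sys.executable, "-c", "print('fine')"], errors)
run_step("failing_step", [sys.executable, "-c", "raise SystemExit('boom')"], errors)

# Summary at the end of the run
if errors:
    print(f"{len(errors)} step(s) failed:")
    for name, message in errors:
        print(f"- {name}: {message}")
else:
    print("All steps ran clean.")
```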

In [6]:
start_time = time.time()

settings = get_values()
start_processing = False
pixel_scale /= 10
blobs = []
trails = {}
classifier = TreeOfLifeClassifier()
tracker = AMTTracker(config)
object_detector = load_grounding_dino_model()
first_pass = True
images = os.listdir(image_folder)
images = sorted(images)

for image in tqdm(images):
    print("Analyzing image", image)
    image_arg = os.path.join(image_folder, image)
    skip, start_processing = should_image_be_skipped(settings.start_image, start_processing, image_arg, image, image_folder)
    if skip:
        continue
    detections = detect(
        object_detector,
        image=image_arg,
        labels=labels,
        threshold=threshold
    )
    print("Nr. of detections:", len(detections))
    image_array = Image.open(image_arg)
    image_array = np.array(image_array)
    blobs = convert_bounding_boxes(detections, image, image_array, image_folder, buffer)
    if first_pass:
        tracker.savedois = blobs
    new_tracks, _ = tracker.managetracks(tracker.savedois, blobs, first_pass)
    tracker.savedois = new_tracks
    first_pass = False

    if settings.save_visualisation:
        plot_tracks_and_detections(image_array, new_tracks, image)

print("--------")

## Classifying the cropped images
print("Done detecting all the insects and saving cropped versions.")

print("Double-checking for faulty detections...")
Apollo_input_path = os.path.join(image_folder, "cropped")
Apollo_script_path = os.path.join(envs_path, "ApolloNet", "Intro-to-CV-for-Ecologists-main", "inference_dirt_classifier.py")
Apollo_command = f'conda run -n ApolloNet python "{Apollo_script_path}" "{Apollo_input_path}"'
Apollo = subprocess.run(Apollo_command, shell=True, capture_output=True, text=True)
#print(Apollo.stdout)
if Apollo.returncode != 0:
    print(Apollo.stderr)
else:
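    # The classifier script is expected to report the number of faulty crops as "HERE <n>" on stdout
    match = re.search(r"HERE (\d+)", Apollo.stdout)
    if match is None:
        print("Could not read the number of faulty crops from the classifier output.\n--------")
    else:
        matches = int(match.group(1))
        if matches > 0:
            print(f"Found {matches} potentially faulty crops. Moved them to the potentially_faulty folder.\n--------")
        else:
            print("All crops seem to be insects :)\n--------")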


print("Classifying crops now.")        

if settings.InsectDetect:
    print("InsectDetect classifier running...")
    os.chdir(os.path.join(envs_path, "InsectDetectSAM", "yolov5-cls"))
    print("Working directory is now:", os.getcwd())
    InsectDetect_input_path = os.path.join(image_folder, "cropped")
    # Paths are quoted because the image folder may contain spaces
    !python classify/predict.py --project "{image_folder}" --name results --source "{InsectDetect_input_path}" --weights "insect-detect-ml-main/models/efficientnet-b0_imgsz128.onnx" --img 128 --sort-top1 --sort-prob --concat-csv
    print("--------")

if settings.ApolloNet:
    print("Performing ApolloNet Classification :)")
    Apollo_input_path = os.path.join(image_folder, "cropped")
    Apollo_script_path = os.path.join(envs_path, "ApolloNet", "Intro-to-CV-for-Ecologists-main", "inference.py")
    Apollo_command = f'conda run -n ApolloNet python "{Apollo_script_path}" "{Apollo_input_path}"'
    Apollo = subprocess.run(Apollo_command, shell=True, capture_output=True, text=True)
    if Apollo.returncode != 0:
        print(Apollo.stderr)
    else:
        print("ApolloNet ran clean\n--------")

if settings.BioClip:
    print("The tree of life is growing: running BioClip algorithm for image classification now...")
    BioClip_input_path = os.path.join(image_folder, "cropped")
    crops_for_BioClip = glob.glob(os.path.join(BioClip_input_path, '*'))
    crops_for_BioClip = sorted([item for item in crops_for_BioClip if os.path.isfile(item)])
    BioClip_predictions = BioClip_inference(classifier, crops_for_BioClip, settings.rank, certainty_threshold = 0.45)
    if len(BioClip_predictions) > 0:
        clean_predictions = process_BioClip_predictions(BioClip_predictions, image_folder, settings.rank)
    print("--------")
    

print("Done classifying the insects. Measuring body lengths now...")
length_input_path = os.path.join(image_folder, "cropped")
length_script_path = os.path.join(envs_path, "sleap", "body_length_inference_folder.py")
command = f'conda run -n sleap python "{length_script_path}" "{length_input_path}" "{pixel_scale}"'
ran_clean = subprocess.run(command, shell=True, capture_output=True, text=True)
if ran_clean.returncode != 0:
    print(f"stdout = {ran_clean.stdout}")
    traceback_index = ran_clean.stderr.find("Traceback")
    print(ran_clean.stderr[traceback_index:])
else:
    print("Length measurements ran clean\n--------")

merge_result_csvs(image_folder)
    

print("Done measuring. Annotating all results onto cropped images now...")
results_csv = get_classification_results_csv(image_folder, settings.annotation_algorithm)
input_folder = os.path.join(image_folder, "cropped_and_annotated")
length_csv_file_path = os.path.join(image_folder, "results", "body_length_results.csv")

if results_csv is not None:
    annotate_classifications(classification_results_csv = results_csv, body_length_csv = length_csv_file_path,
                             cropped_images_folder = input_folder, image_folder = image_folder, pixel_scale = pixel_scale,
                            annotation_algorithm = settings.annotation_algorithm)

print("--------")

end_time = time.time()
elapsed_time = end_time - start_time

print(f"Done with everything! No length Errors occured :)\nElapsed time: {elapsed_time/60:.2f} minutes \nTime per Image: {elapsed_time/len(images):.2f} seconds")

---------- GroundingDino runs on cpu ----------


  0%|                                                                                                                | 0/2 [00:00<?, ?it/s]

Analyzing image 20240825235447.jpg


100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00,  7.14s/it]

Nr. of detections: 8
Analyzing image cropped
--------
Done detecting all the insects and saving cropped versions.
Double-checking for faulty detections...





All crops seem to be insects :)
--------
Classifying crops now.
The tree of life is growing: running BioClip algorithm for image classification now...
Classified all images up to family-level
--------
Done classifying the insects. Measuring body lengths now...
Length measurements ran clean
--------
Done measuring. Annotating all results onto cropped images now...
--------
Done with everything! No length Errors occured :)
Elapsed time: 1.28 minutes 
Time per Image: 38.45 seconds
