This notebook is useful if you only want to perform the classification task using BioClip. Cropped images need to be generated beforehand by using the main pipeline.

##### Implemented:
- Classifies the cropped images using BioClip (https://imageomics.github.io/bioclip/) (https://github.com/Imageomics/pybioclip)

In [26]:
from transformers import AutoModelForMaskGeneration, AutoProcessor, pipeline
from ipywidgets import interact, interactive, fixed, interact_manual, Layout
from PIL import Image, ImageDraw, ImageFont, PngImagePlugin
from typing import Any, List, Dict, Optional, Union, Tuple
from scipy.optimize import linear_sum_assignment
from bioclip import TreeOfLifeClassifier, Rank
from IPython.display import clear_output
#from ipyfilechooser import FileChooser
import matplotlib.patches as patches
import plotly.graph_objects as go
from dataclasses import dataclass
import matplotlib.pyplot as plt
from collections import deque
import ipywidgets as widgets
import plotly.express as px
from tqdm.notebook import tqdm
import pandas as pd
import numpy as np
import subprocess
import warnings
import requests
import random
import torch
import glob
import json
import math
import time
import csv
import cv2
import os
import re

### Self-written modules
# from FunktionenZumImportieren.classes import *
warnings.simplefilter(action='always', category=UserWarning)
from FunktionenZumImportieren.helper_funktionen_clean import *
from FunktionenZumImportieren.settings_widgets import *
warnings.formatwarning = custom_warning_format
from AMT_functions.colors import reportcolors
from AMT_functions.amt_tracker import *
warnings.filterwarnings('ignore')

envs_path = get_local_env_path()

The envs directory is: /Users/rentaluser/mambaforge3/envs


#### Define settings variables
##### Variables include:
- start_image: The file name of an image (with extension) can be specified here; the code then skips all files that come before it. Leave this empty to analyze all images in the folder, and always leave it empty in the batch analysis pipeline, where it might cause issues. To start at the folder of a specific day instead, put the folder index in the first position of the slice (`[___:]`) marked by the big arrow (<---) in the main cell. The index is zero-based, so to start from the 5th folder, put in a 4.
- save_visualisation: activate if you want a visualisation of the detections and tracks saved in "detection_drawings". Not relevant in this notebook.
- rank: Defines the taxonomic rank to which insects should be classified if BioClip is used. Can be set to None. This will make the algorithm classify up to the taxonomic rank that first satisfies the requirement set by certainty_threshold (see the function BioClip_inference in helper_funktionen). Setting rank to None will increase compute time.
- DIOPSIS_folder: The DIOPSIS camera folder containing the raw images to be analyzed. The expected structure is DIOPSIS_folder/photos/folders_of_days.
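
The expected folder layout and the `[___:]` start index can be illustrated with a small, self-contained sketch. It builds a throwaway folder tree (the day-folder names are made up for illustration), lists the day folders in sorted order while skipping hidden entries, and shows that a start index of 4 begins at the fifth folder. The helper `list_day_folders` is hypothetical and not part of the pipeline; unlike the main cell, it also filters out plain files.

```python
import os
import tempfile

def list_day_folders(diopsis_folder, start_index=0):
    """Return sorted day folders under <diopsis_folder>/photos, skipping hidden entries."""
    photos = os.path.join(diopsis_folder, "photos")
    folders = sorted(
        name for name in os.listdir(photos)
        if not name.startswith(".") and os.path.isdir(os.path.join(photos, name))
    )
    # Slicing is zero-based: start_index=4 skips the first four folders.
    return folders[start_index:]

# Build a temporary example tree; the day names below are hypothetical.
root = tempfile.mkdtemp()
for day in ["20240601", "20240602", "20240603", "20240604", "20240605", "20240606"]:
    os.makedirs(os.path.join(root, "photos", day))
os.makedirs(os.path.join(root, "photos", ".hidden"))

all_days = list_day_folders(root)
from_fifth = list_day_folders(root, start_index=4)
print(all_days)
print(from_fifth)
```

With six day folders, `start_index=4` leaves only the fifth and sixth folder to be processed.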

    WORD OF CAUTION: The 'score' value in the results of the BioClip inference only reflects the certainty for the taxonomic rank at which the classification stopped, i.e. the rank stated in the 'highest_classification_rank' column of the results csv. If you classified to species level, for example, and later move your analysis up to family level, the score no longer reflects how certain BioClip is at the family rank. In that case you need to run the inference again for the new rank (ideally with the 'Only_BioClip_inference' notebook in 'utils') to obtain the correct score values. Otherwise, the 'score' value might appear either far too high or far too low.
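
A toy calculation shows why a score computed for one rank cannot be reused for another. The sketch below assumes that a rank-level score is the sum of the probabilities of all species under that taxon; this is an assumption about how the scores aggregate, and the probabilities are invented for illustration. It also sketches the `rank = None` behaviour, under the further assumption that ranks are tried from most to least specific and the first one whose best score reaches `certainty_threshold` is kept. None of the helper functions below are taken from `helper_funktionen`.

```python
from collections import defaultdict

# Hypothetical species-level probabilities for one crop: (family, species) -> probability.
species_probs = {
    ("Syrphidae", "Episyrphus balteatus"): 0.30,
    ("Syrphidae", "Eupeodes corollae"): 0.28,
    ("Syrphidae", "Syrphus ribesii"): 0.22,
    ("Muscidae", "Musca domestica"): 0.20,
}

def best_at_species(probs):
    """Return the most probable species and its probability."""
    (_family, species), score = max(probs.items(), key=lambda item: item[1])
    return species, score

def best_at_family(probs):
    """Sum species probabilities per family and return the most probable family."""
    totals = defaultdict(float)
    for (family, _species), p in probs.items():
        totals[family] += p
    family, score = max(totals.items(), key=lambda item: item[1])
    return family, score

def classify_with_threshold(probs, certainty_threshold=0.45):
    """Try ranks from most to least specific; keep the first that meets the threshold."""
    for rank, scorer in (("species", best_at_species), ("family", best_at_family)):
        name, score = scorer(probs)
        if score >= certainty_threshold:
            return rank, name, score
    return None, None, 0.0

species_name, species_score = best_at_species(species_probs)
family_name, family_score = best_at_family(species_probs)
chosen_rank, chosen_name, chosen_score = classify_with_threshold(species_probs)
print(f"species: {species_name} ({species_score:.2f})")
print(f"family:  {family_name} ({family_score:.2f})")
print(f"kept rank: {chosen_rank} -> {chosen_name} ({chosen_score:.2f})")
```

In this example the best species reaches only 0.30, while its family reaches 0.80. Reporting the species-level score next to a family label would therefore understate the certainty considerably, which is why the inference has to be repeated for the rank that is actually analyzed.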

In [27]:
get_values = create_interactive_widgets()

Dropdown(description='Annotation algorithm:', layout=Layout(height='30px', width='50%'), options=(('BioClip', …

Checkbox(value=True, description='Perform image classification with BioClip', indent=False)

Dropdown(description='Taxonomic rank:', index=4, layout=Layout(height='30px', width='50%'), options=(('kingdom…

Checkbox(value=False, description='Perform image classification with ApolloNet', indent=False)

Checkbox(value=False, description='Perform image classification with InsectDetect', indent=False)

Checkbox(value=False, description='Save visualisations', indent=False)

Text(value='', description='If you want to start at a specific image:', layout=Layout(height='30px', width='50…

`output_filename` specifies the name that will be given to the results csv file that is saved separately in the `results` folder of each day.

The merged results csv, which contains the results from all days, is saved by default as `results_order_classification.csv` in the topmost folder of that camera. The name can be changed further down in the main cell if need be.
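
The relationship between the per-day csv files and the merged csv can be sketched as follows. This is not the implementation of `merge_all_results`, only a minimal illustration using the standard library. It assumes that each day keeps its file at `photos/<day>/results/<output_filename>.csv`; the function `merge_daily_results`, the column names, the day names, and the added `day_folder` column are all hypothetical.

```python
import csv
import glob
import os
import tempfile

def merge_daily_results(camera_folder, results_filename, out_filename):
    """Concatenate every per-day results csv into one csv in the camera folder."""
    pattern = os.path.join(camera_folder, "photos", "*", "results", f"{results_filename}.csv")
    rows, fieldnames = [], []
    for path in sorted(glob.glob(pattern)):
        # .../photos/<day>/results/<file>.csv -> <day>
        day = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                row["day_folder"] = day  # remember which day the row came from
                rows.append(row)
                for key in row:
                    if key not in fieldnames:
                        fieldnames.append(key)
    out_path = os.path.join(camera_folder, f"{out_filename}.csv")
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return out_path, len(rows)

# Build two hypothetical day folders with small result files.
camera = tempfile.mkdtemp()
for day, crops in {"20240601": ["a.jpg"], "20240602": ["b.jpg", "c.jpg"]}.items():
    results_dir = os.path.join(camera, "photos", day, "results")
    os.makedirs(results_dir)
    with open(os.path.join(results_dir, "BioClip_order.csv"), "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Filename", "order", "score"])
        for crop in crops:
            writer.writerow([crop, "Diptera", "0.9"])

merged_path, n_rows = merge_daily_results(camera, "BioClip_order", "results_order_classification")
print(merged_path, n_rows)
```

Keeping the per-day files and writing the merged file separately means a single day can be re-run without touching the results of the other days.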

In [28]:
## On Mac:
DIOPSIS_folder = "/Volumes/Untitled/Monstein_mid_analyzed"

## On Windows:
#DIOPSIS_folder = r"C:\Users\rolingni\Desktop\input_test4"


output_filename = "BioClip_order"

Main cell:

In [29]:
settings = get_values()
global_start_time = time.time()

global_error = False
classifier = TreeOfLifeClassifier()

image_folders = [folder for folder in os.listdir(os.path.join(DIOPSIS_folder, "photos")) if not folder.startswith('.')]
image_folders = sorted(image_folders)

for img_folder in tqdm(image_folders[:]): #  <------------------------------HERE----------------------------------------HERE-------------------------
    image_folder = os.path.join(DIOPSIS_folder, "photos", img_folder)
    start_time = time.time()
    
    start_processing = False
    first_pass = True
    images = os.listdir(image_folder)
    images = sorted(images)
    
                
    crops_available, crops_folder, crops_filepaths = are_crops_available(image_folder)
    
    print("Classifying crops now.")
    
    if settings.BioClip and crops_available:
        print("The tree of life is growing: running BioClip algorithm for image classification now...")
        BioClip_predictions = BioClip_inference(classifier, crops_filepaths, settings.rank, certainty_threshold = 0.45)
        if len(BioClip_predictions) > 0:
            clean_predictions = process_BioClip_predictions(BioClip_predictions, image_folder, settings.rank, output_filename)
        
        
        merge_result_csvs(image_folder)
        
    
    print("Annotating all results onto cropped images now...")
    results_csv = get_classification_results_csv(image_folder, settings.annotation_algorithm, BioClip_filename = output_filename)
    input_folder = os.path.join(image_folder, "cropped_and_annotated")
    length_csv_file_path = os.path.join(image_folder, "results", "body_length_results.csv")
    
    if results_csv is not None and crops_available:
        annotate_classifications(classification_results_csv = results_csv, body_length_csv = length_csv_file_path,
                                 cropped_images_folder = input_folder, image_folder = image_folder,
                                annotation_algorithm = settings.annotation_algorithm, output_folder = "annotated_order")
    
    end_time = time.time()
    elapsed_time = end_time - start_time
    
    print(f"Done with day! No length Errors occurred :)\nElapsed time: {elapsed_time:.0f} seconds \nTime per Image: {elapsed_time/max(len(images), 1):.2f} seconds")  # max(..., 1) avoids a ZeroDivisionError for empty day folders
    print("--------")

print("Merging all results csv into one...")
merge_all_results(DIOPSIS_folder, results_filename = output_filename, out_filename = "results_order_classification")

global_end_time = time.time()
global_elapsed_time = global_end_time - global_start_time

print(f"Time elapsed in total: {(global_elapsed_time/60):.2f} minutes")
print(f"Pipeline took {round((global_elapsed_time/len(image_folders))/60, 2)} minutes per day on average")
if not global_error:
    print("All inferences ran clean :)")
else:
    print("WARNING: At least one inference error occurred somewhere :(")

  0%|          | 0/6 [00:00<?, ?it/s]

Classifying crops now.
The tree of life is growing: running BioClip algorithm for image classification now...
Classified all images up to order-level
Annotating all results onto cropped images now...
Done with day! No length Errors occured :)
Elapsed time: 2 seconds 
Time per Image: 0.09 seconds
--------
Classifying crops now.
The tree of life is growing: running BioClip algorithm for image classification now...
Classified all images up to order-level
Annotating all results onto cropped images now...
Done with day! No length Errors occured :)
Elapsed time: 13 seconds 
Time per Image: 0.05 seconds
--------
Classifying crops now.
The tree of life is growing: running BioClip algorithm for image classification now...
Classified all images up to order-level
Annotating all results onto cropped images now...
Done with day! No length Errors occured :)
Elapsed time: 17 seconds 
Time per Image: 0.05 seconds
--------
Classifying crops now.
The tree of life is growing: running BioClip algorithm fo