<a href="https://colab.research.google.com/github/matjesg/deepflash2/blob/master/paper/3-1_performance_comparison_reliability.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# deepflash2 - Performance Comparison for Reliability

> This notebook calculates the performance metrics for the methods in the deepflash2 [paper](https://arxiv.org/abs/2111.06693).

- **Data and results**: The data and results of the different methods are available on [Google Drive](https://drive.google.com/drive/folders/1r9AqP9qW9JThbMIvT0jhoA5mPxWEeIjs?usp=sharing). To use the data in Google Colab, create a [shortcut](https://support.google.com/drive/answer/9700156?hl=en&co=GENIE.Platform%3DDesktop) of the data folder in your personal Google Drive.

*Source files created with this notebook*
- `semantic_segmentation_reliability.csv`
- `instance_segmentation_reliability.csv`
- `instance_segmentation_reliability_agg.csv`

The segmentation results evaluated here can be reproduced with the `train-and-predict` notebooks on [GitHub](https://github.com/matjesg/deepflash2/paper).

*References*:

Griebel, M., Segebarth, D., Stein, N., Schukraft, N., Tovote, P., Blum, R., & Flath, C. M. (2021). Deep-learning in the bioimaging wild: Handling ambiguous data with deepflash2. arXiv preprint arXiv:2111.06693.


## Setup

- Install dependencies
- Connect to drive

In [None]:
!pip install -Uq deepflash2


In [None]:
# Imports
import imageio
import tifffile
import cv2
import pandas as pd
import numpy as np
from pathlib import Path
from fastprogress import progress_bar
from deepflash2.all import *
from deepflash2.data import _read_msk
from skimage.segmentation import relabel_sequential
from itertools import combinations
check_cellpose_installation()

Installing cellpose. Please wait.


In [None]:
# Connect to drive
try:
  from google.colab import drive
  drive.mount('/gdrive')
except ImportError:
  print('Google Drive is not available.')

Mounted at /gdrive


## Settings

Settings for the semantic and instance segmentation evaluation.

In [None]:
METHODS= ['cellpose', 'cellpose_single', 'cellpose_ensemble', 'unet_2019', 'nnunet', 'deepflash2']
DATASETS_SEMANTIC_SEG = ['PV_in_HC', 'cFOS_in_HC', 'mScarlet_in_PAG', 'YFP_in_CTX', 'GFAP_in_HC']
DATASETS_INSTANCE_SEG = ['PV_in_HC', 'cFOS_in_HC', 'mScarlet_in_PAG', 'YFP_in_CTX']

#https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py
thresholds = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)

OUTPUT_PATH = Path("/content/")
DATA_PATH = Path('/gdrive/MyDrive/deepflash2-paper')

SUBDIR = 'test'

min_pixel_dict = {
    'PV_in_HC':61, 
    'cFOS_in_HC':30, 
    'mScarlet_in_PAG':385, 
    'YFP_in_CTX':193,
}

cellpose_dict = {
    'PV_in_HC':'cyto', 
    'cFOS_in_HC':'cyto2',
    'mScarlet_in_PAG':'cyto2',
    'YFP_in_CTX':'cyto',
    'GFAP_in_HC':'cyto2'
}

def repetition_mapper(x, method, dataset):
  'Returns correct subfolder for non-trainable methods'
  if  method=='otsu': x = 'default'
  if  method=='cellpose': x = cellpose_dict[dataset]
  return str(x)

def expert_comparison(df, metric):
  'Calculates expert comparison metrics on data frame'
  df['expert_comparison'] = 'in expert range'
  df.loc[df[metric]>df['expert_max'], 'expert_comparison'] = 'above best expert'
  df.loc[df[metric]<df['expert_min'], 'expert_comparison'] = 'below worst expert'
  return df

def clean_labels(label_msk, min_pixel):
  'Remove instances with fewer than min_pixel pixels and relabel sequentially'
  # remove areas < min pixel
  unique, counts = np.unique(label_msk, return_counts=True)
  label_msk[np.isin(label_msk, unique[counts<min_pixel])] = 0

  # re-label image
  label_msk, _ , _ = relabel_sequential(label_msk, offset=1)

  return label_msk
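
The helpers can be checked in isolation. The sketch below copies `expert_comparison` and `clean_labels` and applies them to made-up toy inputs. Note that `expert_comparison` expects `expert_min` and `expert_max` columns, which are not created in this notebook, and that `clean_labels` modifies its input array in place, hence the `.copy()`.

```python
import numpy as np
import pandas as pd
from skimage.segmentation import relabel_sequential

def expert_comparison(df, metric):
    'Calculates expert comparison metrics on data frame'
    df['expert_comparison'] = 'in expert range'
    df.loc[df[metric] > df['expert_max'], 'expert_comparison'] = 'above best expert'
    df.loc[df[metric] < df['expert_min'], 'expert_comparison'] = 'below worst expert'
    return df

def clean_labels(label_msk, min_pixel):
    'Remove instances with fewer than min_pixel pixels and relabel sequentially'
    unique, counts = np.unique(label_msk, return_counts=True)
    label_msk[np.isin(label_msk, unique[counts < min_pixel])] = 0
    label_msk, _, _ = relabel_sequential(label_msk, offset=1)
    return label_msk

# Toy scores against a made-up expert range [0.75, 0.90]
df = pd.DataFrame({'dice_score': [0.95, 0.80, 0.70], 'expert_min': 0.75, 'expert_max': 0.90})
df = expert_comparison(df, 'dice_score')
print(df['expert_comparison'].tolist())

# Toy label image: instance 7 has a single pixel and is removed with min_pixel=5
toy = np.zeros((6, 6), dtype=np.int32)
toy[0:3, 0:3] = 4   # 9 pixels
toy[3:6, 0:2] = 9   # 6 pixels
toy[5, 5] = 7       # 1 pixel
cleaned = clean_labels(toy.copy(), min_pixel=5)
print(np.unique(cleaned))  # remaining instances are renumbered to 1 and 2
```

Relabeling after the removal keeps instance ids contiguous, so the two repetitions being compared never contain gaps left by the discarded small objects.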

## Metrics

We propose a two-step evaluation:

1. Calculation of performance metrics (method vs. estimated ground truth)
  - Dice score for semantic segmentation
  - Mean average precision for instance segmentation
  - Average precision at IoU_50 for detection (supplement only)
2. Comparison to expert performance (against estimated ground truth)
  - Accounts for the ambiguity in the data

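All results are calculated on the hold-out test sets. For reliability, each metric is computed between the predictions of two repetitions of the same method, for all three pairs of repetitions (1 vs. 2, 1 vs. 3, and 2 vs. 3).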
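
For two binary masks $A$ and $B$, the Dice score is $2|A \cap B| / (|A| + |B|)$. The minimal NumPy sketch below assumes 0/1 masks (the semantic segmentation cell divides the 0/255 PNGs by 255) and treats two empty masks as perfect agreement; the function name `dice` is hypothetical and is not deepflash2's `dice_score`, whose handling of empty masks may differ.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice = 2|A and B| / (|A| + |B|) for binary masks (hypothetical helper)."""
    a = np.asarray(mask_a).astype(bool)
    b = np.asarray(mask_b).astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        # Both masks empty: count as perfect agreement (assumed convention)
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom

# One overlapping pixel, two foreground pixels per mask: 2 * 1 / (2 + 2)
print(dice([[1, 1, 0, 0]], [[0, 1, 1, 0]]))
```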

In [None]:
# Semantic segmentation
results_semantic = []
metric = 'dice_score'

for dataset in progress_bar(DATASETS_SEMANTIC_SEG):
  revised = '' if  dataset=='GFAP_in_HC' else '_revised'
  mask_dir = 'masks_STAPLE'+revised
  path = DATA_PATH/'data'/dataset/SUBDIR

  for method in progress_bar(METHODS, leave=False):
    method_path = DATA_PATH/'results'/'semantic_segmentation'/dataset/method
    results_method = []
    
    for i, (rep_a, rep_b) in enumerate(combinations(range(1,4), 2)):
      rep_a_name = repetition_mapper(rep_a, method, dataset)
      rep_b_name = repetition_mapper(rep_b, method, dataset)
      
      pred_path_a = method_path/rep_a_name
      pred_path_b = method_path/rep_b_name

      masks_paths_a = [f for f in pred_path_a.iterdir()]

      for f in masks_paths_a:
        idx = f.stem.split('_')[0]
        pred_a = imageio.imread(f)//255
        pred_b = imageio.imread(pred_path_b/f'{idx}.png')//255

        # Calculate dice score
        ds = dice_score(pred_a, pred_b)

        tmp = pd.Series({
          'dataset': dataset,
          'method': method,
          'comparison': i,
          'repetition_a': str(rep_a),
          'repetition_a_name': rep_a_name,
          'repetition_b': str(rep_b),
          'repetition_b_name': rep_b_name,
          'idx': idx,
          metric: ds
          })   
        
        results_method.append(tmp)

    # Combine
    df_method = pd.DataFrame(results_method)
    results_semantic.append(df_method)

df_semantic = pd.concat(results_semantic)
df_semantic.to_csv(OUTPUT_PATH/'semantic_segmentation_reliability.csv', index=False)
df_semantic.tail()

|  | dataset | method | comparison | repetition_a | repetition_a_name | repetition_b | repetition_b_name | idx | dice_score |
|---:|---|---|---:|---|---|---|---|---|---:|
| 19 | GFAP_in_HC | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2377-0 | 0.964266 |
| 20 | GFAP_in_HC | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2378-2 | 0.957416 |
| 21 | GFAP_in_HC | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2376-3 | 0.951616 |
| 22 | GFAP_in_HC | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2376-2 | 0.969323 |
| 23 | GFAP_in_HC | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2375-1 | 0.96829 |
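
The cell above compares every pair of the three repetitions, and `repetition_mapper` points non-trainable methods to a single subfolder. The sketch below copies the helper with a one-entry subset of `cellpose_dict` and lists the compared folder pairs: for the pretrained `cellpose` model both sides of every comparison are the same predictions, which explains its reliability of exactly 1.0 in the summary that follows.

```python
from itertools import combinations

# One-entry subset of cellpose_dict from the settings cell
cellpose_dict = {'PV_in_HC': 'cyto'}

def repetition_mapper(x, method, dataset):
    'Returns correct subfolder for non-trainable methods'
    if method == 'otsu': x = 'default'
    if method == 'cellpose': x = cellpose_dict[dataset]
    return str(x)

# comparison index i -> (rep_a, rep_b): 0 -> (1, 2), 1 -> (1, 3), 2 -> (2, 3)
pairs = list(combinations(range(1, 4), 2))

folders = {}
for method in ['deepflash2', 'cellpose']:
    folders[method] = [(repetition_mapper(a, method, 'PV_in_HC'),
                        repetition_mapper(b, method, 'PV_in_HC')) for a, b in pairs]
    print(method, folders[method])
```

Since `comparison` takes the values 0, 1, and 2 for every image, its group mean in the summary tables is always 1.0 and carries no information.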


In [None]:
df_semantic.groupby(['dataset','method']).mean(numeric_only=True).round(3)

| dataset | method | comparison | dice_score |
|---|---|---:|---:|
| GFAP_in_HC | cellpose | 1.0 | 1.0 |
| GFAP_in_HC | cellpose_ensemble | 1.0 | 0.844 |
| GFAP_in_HC | cellpose_single | 1.0 | 0.49 |
| GFAP_in_HC | deepflash2 | 1.0 | 0.962 |
| GFAP_in_HC | nnunet | 1.0 | 0.957 |
| GFAP_in_HC | unet_2019 | 1.0 | 0.861 |
| PV_in_HC | cellpose | 1.0 | 1.0 |
| PV_in_HC | cellpose_ensemble | 1.0 | 0.97 |
| PV_in_HC | cellpose_single | 1.0 | 0.951 |
| PV_in_HC | deepflash2 | 1.0 | 0.977 |
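
The instance segmentation cell below delegates to deepflash2's `get_instance_segmentation_metrics`. As a rough sketch of what a per-threshold average precision involves, the code matches instances one-to-one by IoU (via SciPy's `linear_sum_assignment`) and scores each threshold as TP / (TP + FP + FN). This Cellpose-style definition, the matching strategy, and the helper names `iou_matrix` and `average_precision` are assumptions for illustration; deepflash2's implementation may differ in details. The thresholds are built exactly as in the settings cell, so `ap[0]` is the AP at IoU 0.5.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(labels_a, labels_b):
    """IoU between every instance of labels_a (rows) and labels_b (columns); 0 is background."""
    ids_a = np.unique(labels_a)
    ids_a = ids_a[ids_a > 0]
    ids_b = np.unique(labels_b)
    ids_b = ids_b[ids_b > 0]
    iou = np.zeros((len(ids_a), len(ids_b)))
    for i, a in enumerate(ids_a):
        mask_a = labels_a == a
        for j, b in enumerate(ids_b):
            mask_b = labels_b == b
            inter = np.logical_and(mask_a, mask_b).sum()
            if inter:
                iou[i, j] = inter / np.logical_or(mask_a, mask_b).sum()
    return iou

def average_precision(labels_a, labels_b, thresholds):
    """Per-threshold AP = TP / (TP + FP + FN) (Cellpose-style definition, assumed)."""
    iou = iou_matrix(labels_a, labels_b)
    n_a, n_b = iou.shape
    ap = np.zeros(len(thresholds))
    for k, th in enumerate(thresholds):
        tp = 0
        if n_a and n_b:
            # One-to-one assignment maximising the number of pairs with IoU >= th
            hits = (iou >= th).astype(float)
            rows, cols = linear_sum_assignment(hits, maximize=True)
            tp = int(hits[rows, cols].sum())
        fp, fn = n_b - tp, n_a - tp
        total = tp + fp + fn
        ap[k] = tp / total if total else 1.0  # two empty images count as agreement (assumption)
    return ap

# Same IoU thresholds as the settings cell: 0.50, 0.55, ..., 0.95
thresholds = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)

# Toy label images: rep_b misses the second object of rep_a
rep_a = np.zeros((10, 10), dtype=int)
rep_a[1:4, 1:4] = 1
rep_a[6:9, 6:9] = 2
rep_b = np.zeros((10, 10), dtype=int)
rep_b[1:4, 1:4] = 7
ap = average_precision(rep_a, rep_b, thresholds)
print(ap[0], ap.mean())  # 0.5 0.5: one matched object, one missed at every threshold
```

Because TP + FP + FN counts each unmatched instance once, the score is symmetric in the two repetitions, which suits a reliability comparison where neither side is the ground truth.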


In [None]:
# Instance segmentation and detection
results_instance = []
results_instance_agg = []
metric = 'mean_average_precision'

for dataset in progress_bar(DATASETS_INSTANCE_SEG):
  revised = '_revised'
  mask_dir = 'masks_STAPLE'+revised
  path = DATA_PATH/'data'/dataset/SUBDIR

  for method in progress_bar(METHODS, leave=False):
    method_path = DATA_PATH/'results'/'instance_segmentation'/dataset/method
    results_method_agg = []
    
    for i, (rep_a, rep_b) in enumerate(combinations(range(1,4), 2)):
      rep_a_name = repetition_mapper(rep_a, method, dataset)
      rep_b_name = repetition_mapper(rep_b, method, dataset)
      
      pred_path_a = method_path/rep_a_name
      pred_path_b = method_path/rep_b_name

      masks_paths_a = [f for f in pred_path_a.iterdir()]

      for f in masks_paths_a:
        idx = f.stem.split('_')[0]

        # Load and clean prediction of repetition a
        label_pred_a = tifffile.imread(f)
        label_pred_a = clean_labels(label_pred_a, min_pixel=min_pixel_dict[dataset])

        # Load and clean prediction of repetition b
        label_pred_b = tifffile.imread(pred_path_b/f'{idx}.tif')
        label_pred_b = clean_labels(label_pred_b, min_pixel=min_pixel_dict[dataset])

        # Calculate instance segmentation metrics
        ap, tp, fp, fn = get_instance_segmentation_metrics(label_pred_a,
                                                           label_pred_b, 
                                                           is_binary=False, 
                                                           thresholds=thresholds,
                                                           )
        # Detailed results
        tmp = pd.DataFrame({
          'dataset': dataset,
          'method': method,
          'comparison': i,
          'repetition_a': str(rep_a),
          'repetition_a_name': rep_a_name,
          'repetition_b': str(rep_b),
          'repetition_b_name': rep_b_name,
          'idx': idx,
          'threshold':thresholds,
          'average_precision':ap
          })   
        results_instance.append(tmp)

        # Aggregated results
        tmp_agg = pd.Series({
          'dataset': dataset,
          'method': method,
          'comparison': i,
          'repetition_a': str(rep_a),
          'repetition_a_name': rep_a_name,
          'repetition_b': str(rep_b),
          'repetition_b_name': rep_b_name,
          'idx': idx,
           metric: ap.mean(),
          'average_precision_at_iou_50':ap[0]
          })   
        
        results_method_agg.append(tmp_agg)

    # Aggregate
    df_method = pd.DataFrame(results_method_agg)
    results_instance_agg.append(df_method)

df_instance = pd.concat(results_instance)
df_instance.to_csv(OUTPUT_PATH/'instance_segmentation_reliability.csv', index=False)
display(df_instance.tail())

# Concat and save aggregated results
df_instance_agg = pd.concat(results_instance_agg)
df_instance_agg.to_csv(OUTPUT_PATH/'instance_segmentation_reliability_agg.csv', index=False)
df_instance_agg.tail()

creating new log file


|  | dataset | method | comparison | repetition_a | repetition_a_name | repetition_b | repetition_b_name | idx | threshold | average_precision |
|---:|---|---|---:|---|---|---|---|---|---:|---:|
| 5 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2342 | 0.75 | 0.872549 |
| 6 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2342 | 0.8 | 0.854369 |
| 7 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2342 | 0.85 | 0.854369 |
| 8 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2342 | 0.9 | 0.720721 |
| 9 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2342 | 0.95 | 0.469231 |


|  | dataset | method | comparison | repetition_a | repetition_a_name | repetition_b | repetition_b_name | idx | mean_average_precision | average_precision_at_iou_50 |
|---:|---|---|---:|---|---|---|---|---|---:|---:|
| 19 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2341 | 0.848182 | 0.968421 |
| 20 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2349 | 0.891784 | 0.973913 |
| 21 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2332 | 0.870159 | 0.956522 |
| 22 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2340 | 0.850779 | 0.952941 |
| 23 | YFP_in_CTX | deepflash2 | 2 | 2 | 2 | 3 | 3 | 2342 | 0.822705 | 0.91 |
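
The aggregated columns are derived from the per-threshold values: `mean_average_precision` is `ap.mean()` over the ten IoU thresholds and `average_precision_at_iou_50` is `ap[0]`. The pandas sketch below recomputes both from a long-format table shaped like `df_instance`; the image ids and AP values are made up.

```python
import numpy as np
import pandas as pd

thresholds = np.linspace(.5, 0.95, 10)  # same ten values as in the settings cell

# Made-up per-threshold AP values for two hypothetical images
ap_values = {'img-a': np.linspace(0.95, 0.50, 10), 'img-b': np.full(10, 0.8)}
df_long = pd.concat(
    [pd.DataFrame({'idx': idx, 'threshold': thresholds, 'average_precision': ap})
     for idx, ap in ap_values.items()],
    ignore_index=True)

# mAP: mean over all thresholds per image
map_per_image = df_long.groupby('idx')['average_precision'].mean()
# AP@IoU 0.5: the row whose threshold is 0.5
ap50_per_image = (df_long[np.isclose(df_long['threshold'], 0.5)]
                  .set_index('idx')['average_precision'])
df_agg = pd.DataFrame({'mean_average_precision': map_per_image,
                       'average_precision_at_iou_50': ap50_per_image})
print(df_agg)
```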


In [None]:
df_instance_agg.groupby(['dataset','method']).mean(numeric_only=True).round(3)

| dataset | method | comparison | mean_average_precision | average_precision_at_iou_50 |
|---|---|---:|---:|---:|
| PV_in_HC | cellpose | 1.0 | 1.0 | 1.0 |
| PV_in_HC | cellpose_ensemble | 1.0 | 0.919 | 0.956 |
| PV_in_HC | cellpose_single | 1.0 | 0.873 | 0.927 |
| PV_in_HC | deepflash2 | 1.0 | 0.927 | 0.95 |
| PV_in_HC | nnunet | 1.0 | 0.946 | 0.96 |
| PV_in_HC | unet_2019 | 1.0 | 0.819 | 0.967 |
| YFP_in_CTX | cellpose | 1.0 | 1.0 | 1.0 |
| YFP_in_CTX | cellpose_ensemble | 1.0 | 0.886 | 0.957 |
| YFP_in_CTX | cellpose_single | 1.0 | 0.806 | 0.918 |
| YFP_in_CTX | deepflash2 | 1.0 | 0.874 | 0.956 |
