<a href="https://colab.research.google.com/github/nicoelbert/deepflash2/blob/master/nbs/tutorial_train_and_pred_monuseg.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# deepflash2 - Train and Predict Tutorial on MoNuSeg
This notebook is optimized for execution on Google Colab (https://colab.research.google.com). It demonstrates how to use deepflash2 to train models and predict on datasets comparable to [MoNuSeg: Multi-organ Nucleus Segmentation Challenge](https://monuseg.grand-challenge.org/).


*   Please read the instructions carefully.
*   Press the *play* button to execute a cell. It appears between \[     \] on the left side of each code cell.
*   Run the cells consecutively.

*References*:

Griebel, M., Segebarth, D., Stein, N., Schukraft, N., Tovote, P., Blum, R., & Flath, C. M. (2021). Deep-learning in the bioimaging wild: Handling ambiguous data with deepflash2. arXiv preprint arXiv:2111.06693.

## About the Example MoNuSeg Dataset
From https://monuseg.grand-challenge.org: Training data containing 30 images and around 22,000 nuclear boundary annotations has been released to the public previously as a dataset article in IEEE Transactions on Medical Imaging in 2017.

* The training dataset (images and annotations) can be downloaded from https://drive.google.com/file/d/1ZgqFJomqQGNnsx7w7QBzQQMVA16lbVCA/view


* Test set images with an additional 7,000 nuclear boundary annotations are available as the MoNuSeg 2018 testing data. The test dataset can be downloaded from https://drive.google.com/file/d/1NKkSQ5T0ZNQ8aUhh0a8Dt2YKYCQXIViw/view

If you use the training and testing datasets of this challenge, please cite the papers listed on the challenge website.

Instructions for downloading and preprocessing the data are in the [corresponding notebook](https://github.com/matjesg/deepflash2/blob/master/paper/challenge_data/preprocess_monuseg.ipynb).

![MonNuSegCover.png](https://rumc-gcorg-p-public.s3.amazonaws.com/i/2020/02/22/Snip20200222_7.png)

## Setup
In this section, you will set up the training environment, install all dependencies and connect to the drive with the prepared datasets.

In [None]:
# Install the deepflash2 package and import the required modules
!pip install deepflash2 -q
import numpy as np
from deepflash2.all import *
from pathlib import Path
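
Because this tutorial aims at reproducible results, it can help to record which deepflash2 release was installed. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of deepflash2.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the version so that results can be tied to a specific release.
version = installed_version("deepflash2")
print(f"deepflash2 version: {version or 'not installed'}")
```

If the printed value is `not installed`, rerun the installation cell before continuing.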

### Settings



Before training and predicting, the directories need to be specified and the parameters need to be set. For convenience, existing Google Drive folders can be used.

In [None]:
#################### Directories ####################

# Connect to Google Drive (only works when running on Google Colab)
try:
  from google.colab import drive
  drive.mount('/gdrive')
except Exception:
  print('Google Drive is not available.')

SEED = 0 # We used seeds [0,1,2] in our experiments
OUTPUT_PATH = Path("/content/predictions") # Save predictions here
MODEL_PATH = Path("/content/models") # Save models here
TRAINED_MODEL_PATH = Path('/gdrive/MyDrive/deepflash2-paper/models/')
DATA_PATH = Path('/gdrive/MyDrive/deepflash2-paper/data')

#################### Parameters ####################
DATASET = 'monuseg' 
mask_directory='masks_preprocessed'

# Datasets have different numbers of classes - 2 in case of monuseg
num_classes = 2
# Diameters are calculated using the median sizes from the respective training sets - 21 in case of monuseg
diameter = 21 

# Create deepflash2 config class
cfg = Config(random_state=SEED,
             num_classes=num_classes,
             scale=1.)
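
The comment above states that the diameter comes from the median object size in the training set, but the notebook does not show the calculation. The sketch below is one plausible reading, not necessarily the procedure used for the paper: it assumes instance-labelled masks (0 for background, one positive integer per nucleus) and takes the median of the equivalent circle diameters. The function names and the toy mask are illustrative only.

```python
import math
from collections import Counter
from statistics import median


def equivalent_diameters(label_mask):
    """Return the equivalent circle diameter (in pixels) of each labelled object.

    `label_mask` is a 2D sequence of integers: 0 is background and every
    positive integer marks the pixels of one object.
    """
    areas = Counter(v for row in label_mask for v in row if v != 0)
    # A circle with area A has diameter 2 * sqrt(A / pi).
    return [2.0 * math.sqrt(area / math.pi) for area in areas.values()]


def median_diameter(label_masks):
    """Median equivalent diameter over all objects in all masks."""
    diameters = [d for mask in label_masks for d in equivalent_diameters(mask)]
    return median(diameters)


# Toy example: one 2x2 object (area 4) and one 1x1 object (area 1).
toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
]
toy_diameter = median_diameter([toy_mask])
print(f"Median diameter of the toy mask: {toy_diameter:.2f} px")
```

For real data, the nested lists would be replaced by the label arrays loaded from the training masks, and the result would be rounded to an integer before being assigned to `diameter`.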



### Data preprocessing

- Initialize `EnsembleLearner`
- Plot images and masks to check that they are loaded correctly

In [None]:
train_data_path = DATA_PATH/DATASET/'train'
ensemble_path = MODEL_PATH/DATASET/f'{SEED+1}' 

el = EnsembleLearner(image_dir='images', 
                     mask_dir=mask_directory, 
                     config=cfg, 
                     path=train_data_path, 
                     ensemble_path=ensemble_path)

el.ds.show_data(max_n=2)
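
Plotting two samples is a quick visual check, but it does not reveal whether every image has a mask. The following sketch compares the two folders by file stem. It assumes that an image and its mask share the same stem (for example `a.tif` and `a.png`), which may not match the naming convention of your data; `unmatched_files` is a hypothetical helper, not a deepflash2 function.

```python
from pathlib import Path
import tempfile


def unmatched_files(image_dir: Path, mask_dir: Path):
    """Return (images without a mask, masks without an image), compared by file stem."""
    image_stems = {p.stem for p in image_dir.iterdir() if p.is_file()}
    mask_stems = {p.stem for p in mask_dir.iterdir() if p.is_file()}
    return sorted(image_stems - mask_stems), sorted(mask_stems - image_stems)


# Self-contained demonstration using a temporary directory with empty files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "masks_preprocessed").mkdir()
    for name in ("a.tif", "b.tif"):
        (root / "images" / name).touch()
    (root / "masks_preprocessed" / "a.png").touch()

    missing_masks, missing_images = unmatched_files(
        root / "images", root / "masks_preprocessed"
    )
    print("Images without a mask:", missing_masks)
    print("Masks without an image:", missing_images)
```

In this notebook, the same check could be run on `train_data_path/'images'` and `train_data_path/mask_directory`.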

## Train models

- Train model ensemble with 5 models
  - 2500 iterations for each model
- You can skip this step and use the trained models from our paper (see next section).

In [None]:
# Uncomment the next line to train the ensemble yourself
#el.fit_ensemble()

## Prediction on test set

We save
- Semantic segmentation masks (.png)
- Instance segmentation masks (.tif) using the cellpose flow representations
- Foreground uncertainty scores *U*

To ensure reproducibility, we will use the trained models from our paper.

In [None]:
test_data_path = DATA_PATH/DATASET/'test'
ensemble_name = f'{DATASET}_ensemble_{SEED+1}.pt'
ensemble_trained_dir = Path("/content/trained_models")/DATASET
ensemble_trained_dir.mkdir(exist_ok=True, parents=True)
ensemble_trained_path = ensemble_trained_dir/ensemble_name
prediction_path = OUTPUT_PATH/DATASET/f'{SEED+1}'

# Download the trained ensemble from the deepflash2 model library
!wget -O {ensemble_trained_path.as_posix()} https://github.com/matjesg/deepflash2/releases/download/model_library/{ensemble_name}

ep = EnsemblePredictor('images',
                        path=test_data_path, 
                        config=cfg, 
                        ensemble_path=ensemble_trained_path) 

# Predict and save semantic segmentation masks
ep.get_ensemble_results(ep.files, export_dir=prediction_path)

# Save uncertainty scores
ep.df_ens.to_csv(prediction_path/'uncertainty_scores.csv', index=False)

# Show results and scores
ep.show_ensemble_results()
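
The saved uncertainty scores can be used to decide which predictions to inspect first. The sketch below ranks the rows of a CSV file by a score column, highest first. The column names `file` and `uncertainty_score` and the numbers in `example_csv` are invented for illustration; check the header of `uncertainty_scores.csv` and pass the actual column names.

```python
import csv
import io


def rank_by_uncertainty(csv_text: str, file_col: str, score_col: str, top_n: int = 3):
    """Return the `top_n` (file, score) pairs with the highest score."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r[score_col]), reverse=True)
    return [(r[file_col], float(r[score_col])) for r in rows[:top_n]]


# Toy CSV content with hypothetical column names and made-up values.
example_csv = """file,uncertainty_score
img_01.tif,0.12
img_02.tif,0.31
img_03.tif,0.07
"""
ranked = rank_by_uncertainty(example_csv, "file", "uncertainty_score", top_n=2)
print(ranked)
```

To apply this to the exported file, read it with `(prediction_path/'uncertainty_scores.csv').read_text()` and pass the text to `rank_by_uncertainty`.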

In [None]:
print('Predicting class 1')
ep.config.cellpose_export_class = 1
ep.config.cellpose_diameter = diameter
ep.get_cellpose_results(export_dir=prediction_path)
ep.show_cellpose_results()