# Description
*author:* Vina My Pham<br>
*supervisor:* Robin van der Weide<br>
*project:* MSc internship project<br>
<br>
*date:* January 15 - July 26, 2024<br>
*host:* Kind group, Hubrecht Institute<br>
*university:* Bioinformatics, Wageningen University & Research<br>

---

Notebook to segment a 2D TIFF image using pre-trained Cellpose models (v2.2.2) [1].

> The provided image will be segmented with all available models from the model zoo. The models will run with the same hyperparameters set by the user. Parameters are saved in a JSON file.

---

**Input**<br>
- Path to the 2D TIFF file (shape: channels * X * Y)
- Cellpose hyperparameters:

        cyto_channel
        nucleus_channel

        diameter
        flow_threshold
        cellprob_threshold
- `output_dir`: Output directory path
- `pip_requirements_path`: Path to the requirements.txt file for pip install (optional; provide it to install the same dependencies as used in the report).

---

**Output**<br>
In `output_dir`:
- Predicted segmentations (labels) from each model. Files are named `<model_type>_predictions.tiff`.
- JSON file containing the metadata (input file path and Cellpose hyperparameters).

---

**References**<br>
1. Pachitariu, M., Stringer, C. Cellpose 2.0: how to train your own model. Nat Methods 19, 1634–1641 (2022). https://doi.org/10.1038/s41592-022-01663-4


# Notebook initialisation
Execute at the start of your run.

**Input**
- `mount_drive`: Whether to give access to files in your Drive.
- `pip_requirements_path`: Path to the pip install requirements.txt file
    - if mounted to the Drive, make sure the path starts with "/content/gdrive/MyDrive/"
- `use_gpu`: Whether to run Cellpose on the GPU.
    - To change hardware type: `Runtime` >> `Change runtime type` >> `Hardware accelerator`

In [None]:
#user input
mount_drive = True #@param {type:"boolean"}
pip_requirements_path = "" #@param {type:"string"}
use_gpu = True #@param {type:"boolean"}

In [None]:
#@markdown [mounting to Drive]
if mount_drive:
    from google.colab import drive
    drive.mount('/content/gdrive')

## pip install - cellpose v2.2.2

In [None]:
from datetime import datetime

if len(pip_requirements_path) > 0:
    print(f"{datetime.now()}\tInstalling Cellpose using a requirements.txt")
    !pip install -r "$pip_requirements_path"
else:
    print(f"{datetime.now()}\tInstalling Cellpose v2.2.2")
    !pip install cellpose==2.2.2
print(f"{datetime.now()}\tFinished installing Cellpose")

## imports

In [None]:
#imports
import datetime  # rebinds `datetime` to the module (the install cell imported the class)
print(f"{datetime.datetime.now()}\tImporting packages")

import json
import os
import subprocess

import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

from cellpose.io import imread
from cellpose import models
cellpose_models = models.MODEL_NAMES

print(f"{datetime.datetime.now()}\tSuccessfully imported packages")

## custom functions

In [None]:

#custom functions
def check_gpu_connection(use_gpu: bool) -> None:
    """Reports the details on GPU connection

    Args:
        use_gpu (bool): Whether to use the GPU for the script

    Returns:
        None
    """
    class GPUConnectionError(Exception):
        def __init__(self, message):
            self.message = message

    if not use_gpu:
        print("GPU will not be used in this run as `use_gpu` " +
              f"has been set to {use_gpu}")
        return None

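    try:
        print("GPU connection requested:\n--------\n")
        # capture the output and print it, so it shows up in the notebook cell
        result = subprocess.run(["nvidia-smi"], check=True,
                                capture_output=True, text=True)
        print(result.stdout)
    except (FileNotFoundError, subprocess.CalledProcessError) as error:
        # nvidia-smi is missing or exited with an error: no usable GPU
        raise GPUConnectionError(f"`use_gpu` has been set to {use_gpu}. "+
               "However, the notebook is not connected to a GPU."+
               "\nPlease check the hardware type in the Colab Notebook "+
               "settings.") from error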

    return None

def write_json(parameters: dict,
               save_dir: str,
               output_name: str,
               overwrite=False,
               verbose=True) -> str:
    """Write settings to a JSON file

    Args:
        parameters (dict): Settings to be written to the JSON file
        save_dir (str): The directory path where the JSON file will be saved
        output_name (str): File name of the JSON file
        overwrite (bool, optional): Overwrite if file exists. Default: False
        verbose (bool, optional): Print verbose. Default: True

    Returns:
        str: The path of the generated JSON file

    Raises:
        FileExistsError: If a file named `output_name` already exists in
                         `save_dir` and `overwrite` is set to False
    """
    os.makedirs(save_dir, exist_ok=True)

    json_path = os.path.join(save_dir, output_name)

    if os.path.exists(json_path) and not overwrite:
        raise FileExistsError(f"File '{json_path}' exists and `overwrite` has" +
                              f" been set to {overwrite}")

    with open(json_path, 'w') as outfile_obj:
        json.dump(parameters, outfile_obj, indent=4)

    if verbose:
        print(f"All settings written to {json_path}")

    return json_path

def stacked(matrix):
    """Stack the first three channels of an image matrix into an RGB image

    Args:
        matrix (np.ndarray): image matrix (nChannels x nX x nY), nChannels >= 3
    Returns:
        np.ndarray: matrix representing the RGB format of the image (nX x nY x 3)
    """
    return np.dstack((matrix[0,:,:], matrix[1,:,:], matrix[2,:,:]))

# GPU connection
Check if GPU connection is established. The use of a GPU speeds up the model segmentation.
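
A lighter pre-check is to test whether the `nvidia-smi` executable is on the `PATH` at all. The sketch below uses only the standard library; `nvidia_smi_available` is a hypothetical helper name, not part of this notebook or of Cellpose, and finding the executable does not guarantee that a GPU is actually attached.

```python
import shutil

def nvidia_smi_available() -> bool:
    """Return True if the `nvidia-smi` executable is found on the PATH."""
    # shutil.which returns the executable's path, or None if it is not found
    return shutil.which("nvidia-smi") is not None

print(f"nvidia-smi found: {nvidia_smi_available()}")
```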


In [None]:
check_gpu_connection(use_gpu)

# Input files

- `input_img_path` (str) - path to the image to segment (shape: channel * X * Y)

In [None]:
#@markdown **Provide the absolute path to the test slice.**
input_img_path = "" #@param {type:"string"}
input_img_path = os.path.join(input_img_path, "")[:-1]

#@markdown **Provide the path to the main output directory**
output_dir = "" #@param {type:"string"}
output_dir = os.path.join(output_dir, "")

#@markdown [code: plot slice and channels]
input_img = imread(input_img_path) #shape: nChannels x nX x nY
print("Image has been stored in `input_img`")

#plot channels and merged slice
nchannels = input_img.shape[0]

y = 7
nplots = nchannels + 1 #one panel per channel plus the merged image
figsize = (nplots*y, y)
rgb = {0:"Red", 1:"Green", 2:"Blue"}

fig, axes = plt.subplots(1, nplots, figsize=figsize)

for i in range(nchannels):
    axes[i].imshow(input_img[i,:,:], cmap=plt.cm.gray)
    axes[i].axis(False)
    axes[i].set_title(f"Channel {i+1} ({rgb[i]})")

axes[nplots-1].imshow(stacked(input_img))
axes[nplots-1].axis(False)
axes[nplots-1].set_title("Merged");

# Cellpose model hyperparameters
Official documentation:
https://cellpose.readthedocs.io/en/v1.0.2/settings.html#

- `cyto_channel` (int) - ID of the channel (1-based) to use for cellular segmentation (e.g. cytoplasm/membrane staining). Set to `0` if all channels should be used (i.e. composite greyscale image).

- `nucleus_channel` (int) - Optional. ID of the channel (1-based) to use as nuclear channel. Nuclei act as markers for separate cells, which improves cell segmentation. Set to `0` if there is no nuclear channel.

- `diameter` (float) - Mean diameter of the cells, in pixels.

> Author's note: I recommend manually inspecting the diameter using the Cellpose GUI. Cellpose documentation states automated diameter estimation can be activated via `diameter = None`, but this does not seem to work properly in this notebook.

- `flow_threshold` (float) - Maximum allowed error of the flows for each mask. (See official documentation.)

> Increase if model does not detect many cells. Decrease if model returns too many / odd-shaped cells.

- `cellprob_threshold` (float) - Minimum certainty the model has in its mask prediction. (See official documentation.)

> Decrease if model does not detect many cells. Increase if model picks up too much background noise.
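
Because the channel IDs are 1-based and `0` has a special meaning, it is easy to pass an ID that does not exist in the image. A minimal sketch of a sanity check, assuming the image has `n_channels` channels; `validate_channels` is a hypothetical helper and not part of Cellpose or of this notebook:

```python
def validate_channels(cyto_channel: int, nucleus_channel: int, n_channels: int) -> list:
    """Check the 1-based channel IDs and return them as [cyto, nucleus]."""
    for name, value in (("cyto_channel", cyto_channel),
                        ("nucleus_channel", nucleus_channel)):
        # 0 is allowed (all channels / no nuclear channel); otherwise 1..n_channels
        if not 0 <= value <= n_channels:
            raise ValueError(f"`{name}` must be between 0 and {n_channels}, got {value}")
    return [cyto_channel, nucleus_channel]

channels = validate_channels(cyto_channel=2, nucleus_channel=0, n_channels=3)
print(channels)  # [2, 0]
```

Running such a check before the models start would catch a wrong channel ID early, instead of after the first model has been loaded.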

In [None]:
#@markdown **Specify model parameters.**
cyto_channel = 2 #@param {type:"number"}
nucleus_channel = 0 #@param {type:"number"}

diameter = None #@param {type:"raw"}
flow_threshold = 0.4 #@param {type:"slider", min:0, max:1, step:0.01}
#cellprob_threshold may be negative, so the slider allows values below the default of 0
cellprob_threshold = 0 #@param {type:"slider", min:-6, max:6, step:0.1}

#code
channels = [cyto_channel,nucleus_channel]

# Write to JSON
A JSON file with the run settings (image path, image shape, and model parameters) will be created (".settings.JSON" in the provided output directory).
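
Note that JSON has no tuple type, so an image shape tuple is stored as a list and comes back as a list when the file is read. A minimal sketch of the round trip using only the standard library; the file name matches the one above, but the temporary directory and the parameter values are placeholders for illustration:

```python
import json
import os
import tempfile

# Placeholder settings; the real values come from the notebook's input cells
parameters = {
    "input_files": {"input_img_shape": (3, 512, 512)},
    "cellpose_parameters": {"diameter": 30.0, "flow_threshold": 0.4},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, ".settings.JSON")
    with open(json_path, "w") as outfile:
        json.dump(parameters, outfile, indent=4)
    with open(json_path) as infile:
        loaded = json.load(infile)

# The shape tuple has become a list after the round trip
print(loaded["input_files"]["input_img_shape"])  # [3, 512, 512]
```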

In [None]:
#@markdown [code: write to JSON]
parameters = {
    "input_files": {
        "input_img_path" : input_img_path,
        "input_img_shape" : input_img.shape
    },
    "cellpose_parameters": {
        "use_gpu" : use_gpu,
        "cyto_channel" : cyto_channel,
        "nucleus_channel" : nucleus_channel,
        "diameter" : diameter,
        "flow_threshold" : flow_threshold,
        "cellprob_threshold" : cellprob_threshold
    }
}

_ = write_json(parameters=parameters,
               save_dir=output_dir,
               output_name=".settings.JSON",
               overwrite=True,
               verbose=True)

# Running the Cellpose2D models

**Output**<br>
In the output directory: `<model_type>_predictions.tiff` for each model from the zoo.<br>
- Label matrix, where each non-zero integer labels one segmented object (0 = background).
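
As an illustration of how such a label matrix can be read, the sketch below counts the segmented objects in a small hand-made array. The array is made up for this example and is not Cellpose output; `count_labels` is a hypothetical helper.

```python
import numpy as np

def count_labels(masks: np.ndarray) -> int:
    """Count the distinct non-zero labels in a label matrix (0 = background)."""
    labels = np.unique(masks)
    # the background label 0 is the only zero entry, so it is excluded here
    return int(np.count_nonzero(labels))

toy_masks = np.array([[0, 1, 1, 0],
                      [0, 1, 0, 2],
                      [3, 0, 2, 2]], dtype=np.uint16)
print(count_labels(toy_masks))  # 3
```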

In [None]:
#show settings
print("using settings:")
print(parameters)

In [None]:
#@markdown [code: running the models]
print(f"{datetime.datetime.now()}\tPre-trained models: {cellpose_models}")
if input("continue? (y/n)") != 'y':
    raise ValueError("Input was not `y`. Models will not run.")

#running .eval
for model_type in cellpose_models:
    print(f"{datetime.datetime.now()}\tRunning {model_type}")
    model = models.CellposeModel(gpu=use_gpu, model_type=model_type, net_avg=True)

    # run model
    masks, flows, styles = model.eval(input_img,
                                    diameter=diameter,
                                    flow_threshold=flow_threshold,
                                    cellprob_threshold=cellprob_threshold,
                                    channels=channels)
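    #number of segmented objects (label 0 is the background and is not counted)
    print(f"Number of masks: {len(np.unique(masks[masks > 0]))}")

    #store results
    filename = os.path.join(output_dir, f"{model_type}_predictions.tiff")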
    matrix_file = Image.fromarray(masks)
    matrix_file.save(filename)

    print(f"{datetime.datetime.now()}\tFinished running {model_type}. Predictions stored under {filename}")