# Running cellpose 2.0 on exacloud with a GPU

This notebook lets you load a **custom model** and run it on your images with a GPU.

In this notebook, you can also **train** a custom model using your label files (`_seg.npy`) or other labels saved as `_masks.tif` files. If you already have a trained model, skip that part of the notebook.

For more details on cellpose 2.0 check out the [paper](https://www.biorxiv.org/content/10.1101/2022.04.01.486764v1) or the [talk](https://www.youtube.com/watch?v=3ydtAhfq6H0).


# Setup

We will first check that the GPU is working, then set the paths to your models and images.

Check the CUDA version, confirm that the GPU works in cellpose, and import the other libraries.

In [28]:
#!nvcc --version
#!nvidia-smi

import os, shutil
import numpy as np
import matplotlib.pyplot as plt
from cellpose import core, utils, io, models, metrics, transforms
from glob import glob
from natsort import natsorted


use_GPU = core.use_gpu()
yn = ['NO', 'YES']
print(f'>>> GPU activated? {yn[use_GPU]}')

2023-05-02 14:07:35,565 [INFO] ** TORCH CUDA version installed and working. **
>>> GPU activated? YES


# Train model on manual annotations

Fill out the form below with the paths to your data and the parameters to start training.

## Training parameters

<font size = 4> **Paths for training, predictions and results**


<font size = 4>**`train_dir`, `test_dir`:** Paths to the folders holding the training images and masks (`train_dir`) and the test images and masks (`test_dir`). You can leave `test_dir` blank, but having some test images is recommended so you can check the model's performance. To find a folder's path, open the file browser on the left of the notebook, navigate to the folder, right-click it, choose **Copy path**, and paste the path into the matching field below.

<font size = 4>**`initial_model`:** Choose a model from the cellpose [model zoo](https://cellpose.readthedocs.io/en/latest/models.html#model-zoo) to start from.

<font size = 4>**`model_name`:** Enter the name for the new model. Once trained, it is saved in the `models` folder inside `train_dir`.

<font size = 4>**Training parameters**

<font size = 4>**`number_of_epochs`:** The number of epochs (rounds) the network will be trained for. At least 100 epochs are recommended, but 250 epochs are sometimes necessary, particularly when training from scratch. **Default value: 100**
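
Before filling in the paths, it can help to confirm that every training image has a matching label file. The sketch below is a minimal, standard-library check. The helper name `find_unlabeled_images`, the list of image extensions, and the naming scheme (`<image stem>_seg.npy` stored next to each image) are assumptions for illustration and are not part of cellpose.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def find_unlabeled_images(folder, mask_suffix="_seg.npy"):
    """Return names of image files in `folder` with no matching label file.

    Assumes labels are stored next to each image as <image stem> + mask_suffix.
    """
    folder = Path(folder)
    missing = []
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip label files, the models folder, etc.
        if path.stem.endswith("_masks"):
            continue  # mask images are labels, not training images
        if not (folder / f"{path.stem}{mask_suffix}").exists():
            missing.append(path.name)
    return missing
```

Calling `find_unlabeled_images(train_dir)` after the form cell has run returns an empty list when every image has a label file.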



In [29]:

#@markdown ###Path to images and masks:
#data_path = "/home/exacloud/gscratch/HeiserLab/images/cellpose_Ctc_HCC1143"
#data_path = "/home/exacloud/gscratch/HeiserLab/images/cellpose_Ctc_AU565"
#data_path = "/home/exacloud/gscratch/HeiserLab/images/cellpose_Ctc_21MT1"
#data_path = "/home/groups/heiserlab_genomics/home/dane/CellTracking/images/cellpose_Ctc_HCC1143nlc"
data_path = "/home/groups/heiserlab_genomics/home/dane/CellTracking/images/cellpose_Ccnt_HCC1143nlc_10x"
#data_path = "/home/groups/heiserlab_genomics/home/dane/CellTracking/images/cellpose_Ccnt_HCC1143nlc_20x"


#train_dir = os.path.join(data_path, "train") #@param {type:"string"}
train_dir = os.path.join(data_path, "train_Ctn") #@param {type:"string"}
test_dir = os.path.join(data_path, "test") #@param {type:"string"}
#Define where the patch file will be saved
base = "/content"

# model name and path
#@markdown ###Name of the pretrained model to start from and new model name:
from cellpose import models
#initial_model_name = "LC2" #@param ['cyto','nuclei','tissuenet','livecell','cyto2','CP','CPx','TN1','TN2','TN3','LC1','LC2','LC3','LC4','scratch']
#initial_model_name = "Ctc" #@param ['cyto','nuclei','tissuenet','livecell','cyto2','CP','CPx','TN1','TN2','TN3','LC1','LC2','LC3','LC4','scratch']
initial_model_name = "Ctn" #@param ['cyto','nuclei','tissuenet','livecell','cyto2','CP','CPx','TN1','TN2','TN3','LC1','LC2','LC3','LC4','scratch']
#model_name = "Ccnt"
#model_name = "Ctc"
model_name = "Ctn"
# other parameters for training.
#@markdown ###Training Parameters:
#@markdown Number of epochs:
n_epochs =  500#@param {type:"number"}

Channel_to_use_for_training = "Grayscale" #@param ["Grayscale", "Blue", "Green", "Red"]

# @markdown ###If you have a secondary channel that can be used for training, for instance nuclei, choose it here:

Second_training_channel = "None" #@param ["None", "Blue", "Green", "Red"]


#@markdown ###Advanced Parameters

Use_Default_Advanced_Parameters = True #@param {type:"boolean"}
#@markdown ###If not, please input:
learning_rate = 0.1 #@param {type:"number"}
weight_decay = 0.0001 #@param {type:"number"}

if (Use_Default_Advanced_Parameters): 
  print("Default advanced parameters enabled")
  learning_rate = 0.1 
  weight_decay = 0.0001
  
# Check whether a model with the same name already exists; if so, warn the user
model_path = os.path.join(train_dir, 'models')
if os.path.exists(os.path.join(model_path, model_name)):
  print("!! WARNING: "+model_name+" already exists and will be deleted in the following cell !!")
  
if len(test_dir) == 0:
  test_dir = None

# Here we match the channel to number
if Channel_to_use_for_training == "Grayscale":
  chan = 0
elif Channel_to_use_for_training == "Blue":
  chan = 3
elif Channel_to_use_for_training == "Green":
  chan = 2
elif Channel_to_use_for_training == "Red":
  chan = 1


if Second_training_channel == "Blue":
  chan2 = 3
elif Second_training_channel == "Green":
  chan2 = 2
elif Second_training_channel == "Red":
  chan2 = 1
elif Second_training_channel == "None":
  chan2 = 0

if initial_model_name=='scratch':
  initial_model = 'None'

Default advanced parameters enabled


The same training can also be run from the command line. If you run it locally, make sure to correct the paths for your local computer.
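
A minimal sketch of how such a training command could be assembled from the form values above. The flags used here (`--train`, `--use_gpu`, `--verbose`, `--dir`, `--test_dir`, `--pretrained_model`, `--chan`, `--chan2`, `--n_epochs`, `--learning_rate`, `--weight_decay`, `--mask_filter`) are believed to match the cellpose 2.0 command-line interface, but check `python -m cellpose --help` for your installed version. The helper name `build_train_command` and the example paths are hypothetical.

```python
import shlex


def build_train_command(train_dir, pretrained_model, chan=0, chan2=0,
                        n_epochs=100, learning_rate=0.1, weight_decay=0.0001,
                        test_dir=None, mask_filter="_seg.npy"):
    """Assemble a `python -m cellpose --train ...` command line as one string."""
    args = ["python", "-m", "cellpose", "--train", "--use_gpu", "--verbose",
            "--dir", str(train_dir),
            "--pretrained_model", str(pretrained_model),
            "--chan", str(chan), "--chan2", str(chan2),
            "--n_epochs", str(n_epochs),
            "--learning_rate", str(learning_rate),
            "--weight_decay", str(weight_decay),
            "--mask_filter", mask_filter]
    if test_dir:
        # the test folder is optional, as in the form above
        args += ["--test_dir", str(test_dir)]
    # shlex.join quotes any argument containing spaces or shell characters
    return shlex.join(args)


# Placeholder paths; substitute train_dir and the model you start from.
print(build_train_command("/path/to/train", "cyto2", n_epochs=500))
```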

## Train new model

Using the settings from the form above, train the model on the images and labels in `train_dir`.

In [30]:
#when updating a trained model, use it as both the initial model and the new model
#comment out this line when the initial model is from the cellpose zoo
initial_model_path = os.path.join(data_path, "train","models",initial_model_name) #@param {type:"string"}

# start logger (to see training across epochs)
logger = io.logger_setup()

# mean cell diameter in pixels for each dataset; only the uncommented value is used
#diam_mean = 37 # 21MT1
#diam_mean = 35 # HCC1143
#diam_mean = 39 # HCC1143nlc_20x
diam_mean = 15 # HCC1143 cell line at 10x

# C:\Users\dane\.cellpose\models
model = models.CellposeModel(gpu=True,
                             pretrained_model=initial_model_path,
                             net_avg=True,
                             diam_mean=diam_mean)
# set channels
channels = [chan, chan2]

# get files
output = io.load_train_test_data(train_dir, mask_filter='_seg.npy')
train_data, train_labels, _, test_data, test_labels, _ = output
#try inverting images before training
#foo = transforms.reshape_and_normalize_data(train_data)


new_model_path = model.train(train_data, train_labels,
                             #test_data=test_data,
                             #test_labels=test_labels,
                             #diam_mean=diam_mean,
                             channels=channels,
                             save_path=train_dir,
                             n_epochs=n_epochs,
                             learning_rate=learning_rate,
                             weight_decay=weight_decay,
                             nimg_per_epoch=8,
                             model_name=model_name)

# diameter of labels in training images
diam_labels = model.diam_labels.copy()

2023-05-02 14:07:36,023 [INFO] WRITING LOG OUTPUT TO /home/users/dane/.cellpose/run.log
2023-05-02 14:07:36,117 [INFO] >>>> loading model /home/groups/heiserlab_genomics/home/dane/CellTracking/images/cellpose_Ccnt_HCC1143nlc_10x/train/models/Ctn
2023-05-02 14:07:36,119 [INFO] ** TORCH CUDA version installed and working. **
2023-05-02 14:07:36,120 [INFO] >>>> using GPU
2023-05-02 14:07:37,321 [INFO] >>>> model diam_mean =  30.000 (ROIs rescaled to this size during training)
2023-05-02 14:07:37,322 [INFO] >>>> model diam_labels =  13.046 (mean diameter of training ROIs)
2023-05-02 14:07:37,627 [INFO] not all flows are present, running flow generation for all images
2023-05-02 14:07:40,273 [INFO] 3 / 3 images in /home/groups/heiserlab_genomics/home/dane/CellTracking/images/cellpose_Ccnt_HCC1143nlc_10x/train_Ctn folder have labels
2023-05-02 14:07:40,441 [INFO] computing flows for labels


100%|██████████| 3/3 [00:00<00:00, 12.01it/s]


2023-05-02 14:07:40,884 [INFO] >>>> median diameter set to = 30
2023-05-02 14:07:40,886 [INFO] >>>> mean of training label mask diameters (saved to model) 13.485
2023-05-02 14:07:40,887 [INFO] >>>> training network with 2 channel input <<<<
2023-05-02 14:07:40,887 [INFO] >>>> LR: 0.10000, batch_size: 8, weight_decay: 0.00010
2023-05-02 14:07:40,888 [INFO] >>>> ntrain = 3
2023-05-02 14:07:40,889 [INFO] >>>> nimg_per_epoch = 8
2023-05-02 14:07:41,154 [INFO] Epoch 0, Time  0.3s, Loss 0.1205, LR 0.0000
2023-05-02 14:07:41,362 [INFO] saving network parameters to /home/groups/heiserlab_genomics/home/dane/CellTracking/images/cellpose_Ccnt_HCC1143nlc_10x/train_Ctn/models/Ctn
2023-05-02 14:07:42,966 [INFO] Epoch 5, Time  2.1s, Loss 0.1373, LR 0.0556
2023-05-02 14:07:44,054 [INFO] Epoch 10, Time  3.2s, Loss 0.0702, LR 0.1000
2023-05-02 14:07:46,208 [INFO] Epoch 20, Time  5.3s, Loss 0.0971, LR 0.1000
2023-05-02 14:07:48,357 [INFO] Epoch 30, Time  7.5s, Loss 0.0786, LR 0.1000
2023-05-02 14:07:50,5
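
The per-epoch lines in the training log can be turned into numbers for plotting or for comparing runs. The sketch below is a minimal parser that assumes the line layout seen in the output above (`Epoch N, Time Ts, Loss L, LR R`); the regular expression and the helper name `parse_training_log` are illustrative and may need adjusting for other cellpose versions.

```python
import re

# Matches lines such as "Epoch 5, Time  2.1s, Loss 0.1373, LR 0.0556"
EPOCH_LINE = re.compile(
    r"Epoch\s+(?P<epoch>\d+),\s+Time\s+(?P<time>[\d.]+)s,"
    r"\s+Loss\s+(?P<loss>[\d.]+),\s+LR\s+(?P<lr>[\d.]+)"
)


def parse_training_log(text):
    """Extract (epoch, loss, lr) tuples from training log text."""
    rows = []
    for match in EPOCH_LINE.finditer(text):
        rows.append((int(match["epoch"]),
                     float(match["loss"]),
                     float(match["lr"])))
    return rows


# Sample lines copied from the log output above.
sample = """\
2023-05-02 14:07:41,154 [INFO] Epoch 0, Time  0.3s, Loss 0.1205, LR 0.0000
2023-05-02 14:07:42,966 [INFO] Epoch 5, Time  2.1s, Loss 0.1373, LR 0.0556
2023-05-02 14:07:44,054 [INFO] Epoch 10, Time  3.2s, Loss 0.0702, LR 0.1000
"""
for epoch, loss, lr in parse_training_log(sample):
    print(f"epoch {epoch:3d}  loss {loss:.4f}  lr {lr:.4f}")
```

The same function can be applied to the contents of the log file whose path is reported at the start of training.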

# Use custom model to segment images

Use the custom model trained above to segment a set of images.

## Parameters

In [31]:
# model name and path

#@markdown ###Custom model path (full path):
#model_path = os.path.join(data_path, "train","models",model_name) #@param {type:"string"}
model_path = os.path.join(data_path, "train_Ctn","models",model_name) #@param {type:"string"}

#@markdown ###Path to images:

#dir = os.path.join(data_path, "images") #@param {type:"string"}
dir = os.path.join(data_path, "images_Ctn") #@param {type:"string"}


#@markdown ###Channel Parameters:

Channel_to_use_for_segmentation = "Grayscale" #@param ["Grayscale", "Blue", "Green", "Red"]

# @markdown If you have a secondary channel that can be used, for instance nuclei, choose it here:

Second_segmentation_channel= "None" #@param ["None", "Blue", "Green", "Red"]


# Here we match the channel to number
if Channel_to_use_for_segmentation == "Grayscale":
  chan = 0
elif Channel_to_use_for_segmentation == "Blue":
  chan = 3
elif Channel_to_use_for_segmentation == "Green":
  chan = 2
elif Channel_to_use_for_segmentation == "Red":
  chan = 1


if Second_segmentation_channel == "Blue":
  chan2 = 3
elif Second_segmentation_channel == "Green":
  chan2 = 2
elif Second_segmentation_channel == "Red":
  chan2 = 1
elif Second_segmentation_channel == "None":
  chan2 = 0

#@markdown ### Segmentation parameters:

#@markdown diameter of cells (set to zero to use diameter from training set):
diameter =  0#@param {type:"number"}
#@markdown threshold on flow error to accept a mask (set higher to get more cells, e.g. in range from (0.1, 3.0), OR set to 0.0 to turn off so no cells discarded):
flow_threshold = 0.4 #@param {type:"slider", min:0.0, max:3.0, step:0.1}
#@markdown threshold on cellprob output to seed cell masks (set lower to include more pixels or higher to include fewer, e.g. in range from (-6, 6)):
cellprob_threshold=0.5 #@param {type:"slider", min:-6, max:6, step:1}
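
The two `if`/`elif` chains above map channel names to the integer indices passed to cellpose. The same mapping can be written as a single lookup table, which also fails loudly on an unknown name instead of leaving `chan` undefined. This is a sketch of an alternative layout; `CHANNEL_INDEX` and `channels_from_names` are hypothetical names, and the index values are copied from the chains above.

```python
# Channel name -> index, copied from the if/elif chains in this notebook.
CHANNEL_INDEX = {"None": 0, "Grayscale": 0, "Red": 1, "Green": 2, "Blue": 3}


def channels_from_names(primary, secondary="None"):
    """Return [chan, chan2] for the given names; raises KeyError on unknown names."""
    return [CHANNEL_INDEX[primary], CHANNEL_INDEX[secondary]]


print(channels_from_names("Grayscale"))      # [0, 0]
print(channels_from_names("Green", "Blue"))  # [2, 3]
```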


## Run the model on a subset of the images



In [32]:
# gets image files in dir (ignoring image files ending in _masks)
files = io.get_image_files(dir, '_masks', imf = "_R_img")
#print(files)
images = [io.imread(f) for f in files]

# declare model
model = models.CellposeModel(gpu=True, 
                             pretrained_model=model_path)

# use the diameter stored in the trained model unless one was set in the form above
diameter = model.diam_labels if diameter == 0 else diameter

# run model on test images
masks, flows, styles = model.eval(images, 
                                  channels=[chan, chan2],
                                  diameter=diameter,
                                  flow_threshold=flow_threshold,
                                  cellprob_threshold=cellprob_threshold
                                  )

2023-05-02 14:09:33,692 [INFO] >>>> loading model /home/groups/heiserlab_genomics/home/dane/CellTracking/images/cellpose_Ccnt_HCC1143nlc_10x/train_Ctn/models/Ctn
2023-05-02 14:09:33,694 [INFO] ** TORCH CUDA version installed and working. **
2023-05-02 14:09:33,695 [INFO] >>>> using GPU
2023-05-02 14:09:34,400 [INFO] >>>> model diam_mean =  30.000 (ROIs rescaled to this size during training)
2023-05-02 14:09:34,401 [INFO] >>>> model diam_labels =  13.485 (mean diameter of training ROIs)
2023-05-02 14:09:34,404 [INFO] 0%|          | 0/3 [00:00<?, ?it/s]
2023-05-02 14:09:36,701 [INFO] 33%|###3      | 1/3 [00:02<00:04,  2.30s/it]
2023-05-02 14:09:38,818 [INFO] 67%|######6   | 2/3 [00:04<00:02,  2.19s/it]
2023-05-02 14:09:42,359 [INFO] 100%|##########| 3/3 [00:07<00:00,  2.81s/it]
2023-05-02 14:09:42,360 [INFO] 100%|##########| 3/3 [00:07<00:00,  2.65s/it]
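
Each entry of `masks` is an integer label image in which 0 is background and every cell has its own positive label, so counting the distinct nonzero labels gives a quick cell count per image. A minimal sketch, assuming NumPy is available; `count_cells` is a hypothetical helper, not a cellpose function, and the toy array stands in for a real mask.

```python
import numpy as np


def count_cells(mask):
    """Count labelled objects in an integer mask where 0 is background."""
    labels = np.unique(mask)
    # count_nonzero drops the background label 0 if it is present
    return int(np.count_nonzero(labels))


# Toy mask with three objects (labels 1, 2 and 4); labels need not be consecutive.
toy = np.array([[0, 1, 1],
                [0, 0, 2],
                [4, 4, 0]])
print(count_cells(toy))  # 3
```

After the cell above has run, `[count_cells(m) for m in masks]` gives one count per image.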


## Save output to *_seg.npy

Save the images and mask labels together in NumPy files with the `_img_seg.npy` suffix.

In [33]:
from cellpose import io

# pass [chan, chan2] directly so this cell also works when the training cell was skipped
io.masks_flows_to_seg(images,
                      masks,
                      flows,
                      diameter*np.ones(len(masks)),
                      files,
                      [chan, chan2])
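
To inspect a saved file later, a `_seg.npy` file can be read back with NumPy as a pickled dictionary. The sketch below shows the round trip on a stand-in dictionary with only a `masks` entry; real files written by cellpose contain additional keys, which are not listed here. `load_seg` and the demo file name are hypothetical.

```python
import os
import tempfile

import numpy as np


def load_seg(path):
    """Load a *_seg.npy file saved as a pickled dict and return that dict."""
    return np.load(path, allow_pickle=True).item()


# Round-trip demo with a stand-in dict; real files are written by cellpose.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_seg.npy")
    np.save(demo_path, {"masks": np.array([[0, 1], [2, 2]])})
    seg = load_seg(demo_path)
    print(sorted(seg.keys()), int(seg["masks"].max()))
```

Only load `.npy` files from sources you trust, since `allow_pickle=True` can execute code stored in the file.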