# Nuclear segmentation
In this notebook, we use TGMM (McDole et al. 2018) to segment our SPIM data.
We use `tgmm_utility.py` to call the segmentation software and to distribute the tasks across the cores of the computer.

We also use this framework to run multiple segmentations with different parameter sets (specifically, different values of the `maxPercentileTrimSV` parameter), because each setting performs best in a different region of the sample. We then merge the segmentations based on a distance criterion. Oversegmentation artifacts and false positives are eliminated with a histogram threshold in a later step.
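
To illustrate the idea behind a distance-based merge, here is a minimal sketch. It assumes each segmentation is reduced to an array of nuclear centroids, and it keeps a centroid from a second segmentation only if no centroid of the first lies within `min_dist`. The function `merge_by_distance` and the `min_dist` value are hypothetical and are not taken from `tgmm_utility.py`, whose actual merge may differ.

```python
import numpy as np
from scipy.spatial import KDTree


def merge_by_distance(reference, candidates, min_dist):
    """Append candidate centroids that are farther than min_dist from every reference centroid."""
    reference = np.asarray(reference, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    if len(reference) == 0:
        return candidates
    if len(candidates) == 0:
        return reference
    tree = KDTree(reference)
    # Distance from each candidate to its nearest reference centroid.
    nearest_dist, _ = tree.query(candidates, k=1)
    keep = nearest_dist > min_dist
    return np.vstack([reference, candidates[keep]])


# Two toy segmentations: one candidate duplicates a reference nucleus, one is new.
seg_a = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
seg_b = np.array([[0.5, 0.0, 0.0], [20.0, 0.0, 0.0]])
merged = merge_by_distance(seg_a, seg_b, min_dist=2.0)
```

In this sketch, candidates are only compared against the reference set, not against each other, so duplicates within the second segmentation would be kept.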

In [1]:
import glob
import os
import datetime
import subprocess
from multiprocessing import Pool
import numpy as np
from pathlib import Path
import pandas as pd
from scipy.spatial import KDTree
import tqdm

from SegmentationIO import read_tgmm
from tgmm_utility import RunTGMM


## Parameters
First, we define the parameters of the segmentation.

In [2]:
params = {
    'imgFilePattern': None,
    'anisotropyZ':1,
    'backgroundThreshold': None,
    'persistanceSegmentationTau':5,
    'betaPercentageOfN_k':0.05,
    'nuPercentageOfN_k':1.0,
    'alphaPercentage':0.7,
    'maxIterEM':100,
    'tolLikelihood':1e-6,
    'regularizePrecisionMatrixConstants_lambdaMin':0.02,
    'regularizePrecisionMatrixConstants_lambdaMax':0.1,
    'regularizePrecisionMatrixConstants_maxExcentricity':9.0,
    'temporalWindowForLogicalRules':5,
    'thrBackgroundDetectorHigh':1.1,
    'thrBackgroundDetectorLow':0.2,
    'SLD_lengthTMthr':5,
    'radiusMedianFilter':1,
    'minTau':2,
    'conn3D':74,
    'estimateOpticalFlow':0,
    'maxDistPartitionNeigh':80.0,
    'deathThrOpticalFlow':-1,
    'minNucleiSize':8,
    'maxNucleiSize':3000,
    'maxPercentileTrimSV':0.2,
    'conn3DsvTrim':6,
    'maxNumKNNsupervoxel':10,
    'maxDistKNNsupervoxel':41.0,
    'thrSplitScore':-1.0,
    'thrCellDivisionPlaneDistance':12.403,
    'thrCellDivisionWithTemporalWindow':0.456,
}


In [6]:
# Store the parameter set as a one-column table with the parameter names as the index.
dd = pd.DataFrame.from_dict(params, orient='index')

In [8]:
dd.to_csv('params.csv')
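
The CSV export is convenient for reading, but a CSV round trip through pandas does not preserve `None` entries or the distinction between integer and float values. If the parameter set needs to be reloaded later, a JSON dump is one option. The sketch below uses a small stand-in dictionary; `save_params`, `load_params` and the file name `params.json` are hypothetical and not part of the original notebook.

```python
import json
import tempfile
from pathlib import Path

# Small stand-in for the full parameter dictionary.
example_params = {'imgFilePattern': None, 'anisotropyZ': 1, 'maxPercentileTrimSV': 0.2}


def save_params(parameters, path):
    """Write the parameter dictionary to a JSON file."""
    Path(path).write_text(json.dumps(parameters, indent=2))


def load_params(path):
    """Read a parameter dictionary back from a JSON file."""
    return json.loads(Path(path).read_text())


# Round trip through a temporary directory to show that types survive.
with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / 'params.json'
    save_params(example_params, json_path)
    restored = load_params(json_path)
```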

## Organisation of image files and paths
Next, we set up the environment to run the segmentations. For that, we get the locations of the image files and the corresponding background thresholds.

In [3]:
# klb_path = Path('Z:/data/m.brambach/klbs/')
klb_path = Path('Z:/data/m.brambach/2022-12-11-cdh1mKO/segment/')
files = list(klb_path.glob('*.tif'))


In [4]:
output_directory = Path(r'Z:\data\m.brambach\segmentation')
local_executables = Path(r'C:\Users\m.brambach\Documents\TGMM\Tracking_GMM_project-v0.3.0-win64\bin')

In [5]:
# Write a template with one row per image file; the thresholds are placeholders (0).
bg_threshold_file = pd.DataFrame(files, columns=['file'])
bg_threshold_file['threshold'] = 0
bg_threshold_file.to_csv(str(klb_path/'bg_thresholds.csv'))

In [6]:
# Load the filled-in thresholds and map each file path to its background threshold.
bg_threshold = pd.read_csv(klb_path/'bg_thresholds_filled_in.csv')
threshold_dict = dict(zip(bg_threshold['file'].values, bg_threshold['threshold'].values))
threshold_dict

{'Z:\\data\\m.brambach\\2022-12-11-cdh1mKO\\segment\\C1-2022-12-11-cdh1mKO-dapi-cdh1-p63-48hpf-s01-v01.tif': 1540,
 'Z:\\data\\m.brambach\\2022-12-11-cdh1mKO\\segment\\C1-2022-12-11-cdh1mKO-dapi-cdh1-p63-48hpf-s01-v02.tif': 1540,
 'Z:\\data\\m.brambach\\2022-12-11-cdh1mKO\\segment\\C1-2022-12-11-cdh1mKO-dapi-cdh1-p63-48hpf-s01-v03.tif': 1540,
 'Z:\\data\\m.brambach\\2022-12-11-cdh1mKO\\segment\\C1-2022-12-11-cdh1mKO-dapi-cdh1-p63-48hpf-s01-v04.tif': 1540,
 'Z:\\data\\m.brambach\\2022-12-11-cdh1mKO\\segment\\C1-2022-12-11-cdh1mKO-dapi-cdh1-p63-48hpf-s02-v01.tif': 1540,
 'Z:\\data\\m.brambach\\2022-12-11-cdh1mKO\\segment\\C1-2022-12-11-cdh1mKO-dapi-cdh1-p63-48hpf-s02-v02.tif': 1540,
 'Z:\\data\\m.brambach\\2022-12-11-cdh1mKO\\segment\\C1-2022-12-11-cdh1mKO-dapi-cdh1-p63-48hpf-s02-v03.tif': 1540,
 'Z:\\data\\m.brambach\\2022-12-11-cdh1mKO\\segment\\C1-2022-12-11-cdh1mKO-dapi-cdh1-p63-48hpf-s03-v01.tif': 1540,
 'Z:\\data\\m.brambach\\2022-12-11-cdh1mKO\\segment\\C1-2022-12-11-cdh1mKO-dapi-
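
Because the pipeline looks up each threshold with `threshold_dict[str(file)]`, a file without a matching entry would raise a `KeyError` partway through a long run. A minimal sketch of a check that could be run beforehand is shown here; the helper `missing_thresholds` and the example file names are hypothetical.

```python
from pathlib import Path


def missing_thresholds(files, threshold_dict):
    """Return the files that have no entry in threshold_dict (keys are path strings)."""
    return [f for f in files if str(f) not in threshold_dict]


# Toy example: the second file has no threshold assigned.
example_files = [Path('stack_01.tif'), Path('stack_02.tif')]
example_thresholds = {'stack_01.tif': 1540}
missing = missing_thresholds(example_files, example_thresholds)
```

An empty list means every file has a threshold and the run can start.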

## Setting up the pipeline
Next, we define the steps of the segmentation pipeline. We first generate the `cf-tgmm` files that are used to call the segmentation software and run the first stage of the segmentation. We then run the second stage three times, each time with a different `maxPercentileTrimSV` value. Finally, we merge the segmentations based on the distance criterion and write a summary `.csv` file.

In [7]:
def pipline(file):
    # Set up one TGMM run per image file.
    segmentation = RunTGMM(
        name=str(file.stem),
        output_directory=str(output_directory),
        local_executables=str(local_executables),
        parameters=params,
    )
    # Point TGMM at this file (path without the '.tif' extension)
    # and set the file-specific background threshold.
    segmentation.update_parameters({'imgFilePattern':str(file)[:-4],
                                    'backgroundThreshold':threshold_dict[str(file)]})
    segmentation.generate_tgmm_config()
    # First stage of the segmentation.
    segmentation.run_watershed_segmentation()
    # Second stage: one run per maxPercentileTrimSV value.
    for mptsv in [0.2, 0.5, 0.7]:
        segmentation.update_parameters({'maxPercentileTrimSV': mptsv})
        segmentation.generate_tgmm_config()
        segmentation.run_tgmm()
    # Collect the individual segmentations, merge them, and save the summary table.
    segmentation.get_segmentation_as_df()
    df = segmentation.combine_segmentations()
    df.to_csv(str(output_directory / file.stem) + '.csv')
    return df

## Run the segmentation on multiple files

In [8]:
for f in tqdm.tqdm(files):
    pipline(f)

100%|████████████████████████████████████████████████████████████████████████████████| 39/39 [1:04:51<00:00, 99.78s/it]
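
The loop above processes the files one after another. If several files should be processed at the same time, one possible approach is a thread pool, sketched below. It assumes that each pipeline call spends most of its time waiting on external executables, so threads are sufficient. `process_file` is a hypothetical stand-in for the per-file pipeline, and the example file names are made up.

```python
from multiprocessing.pool import ThreadPool
from pathlib import Path


def process_file(file):
    """Stand-in for the per-file pipeline; here it only returns the file stem."""
    return file.stem


example_files = [Path(f'stack_{i:02d}.tif') for i in range(4)]

# Run up to two files concurrently; map returns results in input order.
with ThreadPool(processes=2) as pool:
    results = pool.map(process_file, example_files)
```

Whether this helps depends on how `tgmm_utility.py` already distributes work across cores. If a single run saturates the machine, file-level parallelism would only oversubscribe it.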
