# Creating a **combined quantification pipeline & batch processing** - part 2.5
--------------------
Now that all the functions to quantify features of organelle composition, morphology, interactions, and distribution have been created, we will combine them into a wrapper function for easy use. This wrapper will then be used inside a batch processing function that applies it to multiple images in the same directory.

## **OBJECTIVE**
### <input type="checkbox"/> Create a **combined quantification pipeline** for **batch processing** all organelles of interest
In this notebook, we establish the logic for quantifying the **composition**, **morphology**, **interaction**, and **distribution** of **organelles** (as many as you would like) in single cells. To do this, the quantification functions from the previous notebooks (2.1-2.4) are combined into a single function that can ***process multiple organelles*** at once, and a framework for ***batch processing*** multiple cells/images is established.

---------
## **Combined Measurements and Batch Processing**

### summary of steps

#### **PART 1️⃣: Combined Measurements**

- **`0`** - Establish Parameters for prototype `_make_all_metrics_tables` function *(preliminary step)*

- **`1`** - Stack intensity images with observed organelles

- **`2`** - Collect region morphology data using `get_region_morphology_3D` function

    - identify example cell region *(cellmask)*
    - run `get_region_morphology_3D`

- **`3`** - Collect organelle morphology data using `get_org_morphology_3D` function

    - identify example organelle *(lysosome)*
    - obtain example organelle segmentation
    - run `get_org_morphology_3D`

- **`4`** - Collect distribution metrics data using `get_XY_distribution` and `get_Z_distribution` functions

    - establish containers and centering object
    - obtain XY Distribution data
    - obtain Z Distribution data
    - add distribution data (from both functions) to containers

- **`5`** - Collect organelle interaction metrics using `get_contact_metrics_3D`

    - create all 2-way organelle combinations
    - identify example interaction
    - retrieve organelle segmentations involved in interaction
    - run `get_contact_metrics_3D`

- **`6`** - Combine all tables into four inter-organelle tables

- Define `_make_all_metrics_tables` function

- Run `_make_all_metrics_tables` function

- Compare to finalized `make_all_metrics_tables` function

#### **PART 2️⃣: Batch Process Quantification**

- **`0`** - Establish the image data paths *(preliminary step)*

- **`1`** - Locate the matching raw and segmentation files *(`_find_segmentation_tiff_files`)*

    - identify prototype image and segmentations to collect
    - locate folders for raw image and segmentations
    - collect all listed region and segmentation filenames
    - define `_find_segmentation_tiff_files` function
    - run `_find_segmentation_tiff_files` function
    - compare to finalized `find_segmentation_tiff_files` function

- **`2`** - Establish the parameters for prototype `_batch_process_quantification` function

- **`3`** - List the raw image files and their corresponding segmentations to collect

- **`4`** - Obtain segmentations and raw image files

    - run `find_segmentation_tiff_files` to collect segmentation filenames
    - read in linearly unmixed image file *(raw)*
    - collect the listed organelle channel intensities from the raw image file
    - identify the scale of the raw image from the metadata
    - read in organelle and region segmentations

- **`5`** - Run `make_all_metrics_tables` and store output

    - run `make_all_metrics_tables` for the prototype image
    - repeat steps **`3`** and **`4`** and run `make_all_metrics_tables` for all images in `img_file_list`

- **`6`** - Combine all per image tables into four comprehensive tables

    - batch organelle morphology table
    - batch organelle interactions table
    - batch distribution measurements table
    - batch cell region morphology table

- **`7`** - Export comprehensive tables as .csv files

- Define `_batch_process_quantification` function

- Run `_batch_process_quantification` function

## **IMPORTS**

#### &#x1F3C3; **Run code; no user input required**

&#x1F453; **FYI:** This code block loads all of the Python packages and functions needed for this notebook.

In [None]:
from pathlib import Path
from typing import Dict, List, Union
import os
import itertools

import numpy as np
import pandas as pd

from infer_subc.core.file_io import (read_czi_image,
                                     import_inferred_organelle,
                                     list_image_files)
from infer_subc.core.img import *
from infer_subc.utils.stats import *
from infer_subc.utils.stats_helpers import *
from infer_subc.organelles import * 

from datetime import datetime

import time
%load_ext autoreload
%autoreload 2

## **LOAD AND READ IN IMAGE FOR PROCESSING**
> ###### 📝 **Specifically, this will include the raw image and the outputs from segmentation**

#### &#x1F6D1; &#x270D; **User Input Required:**

In [None]:
## Define the path to the directory that contains the input image folder.
data_root_path = Path(os.getcwd()).parents[1] / "sample_data" /  "example_astrocyte"

## Specify the subfolder that contains the input data and its file type, e.g. ".czi" or ".tiff"
in_data_path = data_root_path / "raw"
raw_img_type = ".tiff"

## Specify which subfolder contains the segmentation outputs and their file type
seg_data_path = data_root_path / "seg"
seg_img_type = ".tiff"

## Specify the name of the output folder where quantification results will be saved
out_data_path = data_root_path / "quant"

# Specify which file from raw_file_list you'd like to quantify (index into the list)
test_img_n = 0

#### &#x1F3C3; **Run code; no user input required**

In [None]:
if not out_data_path.exists():
    out_data_path.mkdir(parents=True)
    print(f"making {out_data_path}")

raw_file_list = list_image_files(in_data_path, raw_img_type)
seg_file_list = list_image_files(seg_data_path, seg_img_type)
# pd.set_option('display.max_colwidth', None)
# pd.DataFrame({"Image Name":raw_file_list})

In [None]:
raw_img_name = raw_file_list[test_img_n]

raw_img_data, raw_meta_dict = read_czi_image(raw_img_name)

channel_names = raw_meta_dict['name']
img = raw_meta_dict['metadata']['aicsimage']
scale = raw_meta_dict['scale']
channel_axis = raw_meta_dict['channel_axis']
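`scale` holds the physical size of one voxel along (Z, Y, X), as the `_make_all_metrics_tables` docstring describes. As a rough sketch (the numbers below are made up, not read from the sample image), multiplying the three entries gives the volume of a single voxel, which is how a voxel count converts to a physical volume:

```python
import math

# Hypothetical voxel size in microns, ordered (Z, Y, X); real values come from raw_meta_dict['scale']
example_scale = (0.4, 0.08, 0.08)

# Volume of one voxel is the product of its three edge lengths
voxel_volume = math.prod(example_scale)

# A hypothetical object made of 1,000 voxels
n_voxels = 1000
object_volume = n_voxels * voxel_volume
print(f"{object_volume:.3f} cubic microns")  # 2.560 cubic microns
```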

In [None]:
## For each import, change the string to match the suffix of the segmentation files (i.e., the text following the last "-" in the file name)

# masks
masks_seg_names = ['masks','masks_A', 'masks_B']
for m in masks_seg_names:
    if m in [i.stem.split("-")[-1] for i in seg_file_list]:
        mask_seg = import_inferred_organelle(m, raw_meta_dict, seg_data_path, seg_img_type)
        nuc_seg, cell_seg, cyto_seg = mask_seg
        break

if 'nuc' in [i.stem.split("-")[-1] for i in seg_file_list]:
    nuc_seg = import_inferred_organelle("nuc", raw_meta_dict, seg_data_path, seg_img_type)
    cell_seg = import_inferred_organelle("cell", raw_meta_dict, seg_data_path, seg_img_type)
    cyto_seg = import_inferred_organelle("cyto", raw_meta_dict, seg_data_path, seg_img_type)

# organelles
lyso_seg = import_inferred_organelle("lyso", raw_meta_dict, seg_data_path, seg_img_type)
mito_seg = import_inferred_organelle("mito", raw_meta_dict, seg_data_path, seg_img_type)
golgi_seg = import_inferred_organelle("golgi", raw_meta_dict, seg_data_path, seg_img_type)
perox_seg = import_inferred_organelle("perox", raw_meta_dict, seg_data_path, seg_img_type)
ER_seg = import_inferred_organelle("ER", raw_meta_dict, seg_data_path, seg_img_type)
LD_seg = import_inferred_organelle("LD", raw_meta_dict, seg_data_path, seg_img_type)
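The import logic above keys on the text after the last `-` in each segmentation file's stem. A small stdlib sketch with hypothetical filenames shows what `i.stem.split("-")[-1]` returns:

```python
from pathlib import Path

# Hypothetical segmentation filenames following an "<image name>-<suffix>.tiff" pattern
seg_files = [Path("seg/cell01-lyso.tiff"),
             Path("seg/cell01-ER.tiff"),
             Path("seg/cell01-masks.tiff")]

# .stem drops the folder and extension; split("-")[-1] keeps only the suffix
suffixes = [f.stem.split("-")[-1] for f in seg_files]
print(suffixes)  # ['lyso', 'ER', 'masks']

# Membership checks like the ones above then decide which files to import
print("masks" in suffixes, "nuc" in suffixes)  # True False
```

Because only the last piece is kept, hyphens elsewhere in the image name do not affect which suffix is detected.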

# ***PART 1️⃣: Combined Measurements***

## **`0` - Establish Parameters for prototype `_make_all_metrics_tables` function *(preliminary step)***

> ###### **📝 some variable names will differ in the defined prototype function to make a distinction between global and local variables**

In [None]:
# names of the organelles; only two are used in this example
organelle_names = ["lyso", "ER"]

# channel indices of the test organelles in the raw image (same order as organelle_names)
organelle_channels = [3,1]

# create intensities from raw as list
intensities = [raw_img_data[ch] for ch in organelle_channels]

# load organelles as list
organelles = [lyso_seg, ER_seg]

# load regions as list (only cellmask for this example)
regions = [cell_seg]

# list of region names
region_names = ['cell']

# Number of bins to be used during calculation of distribution metrics
dist_num_bins=5

# If set to true the bins will be distributed from the center of the centering object to the edge of the cellmask
# If set to false the bins will be distributed from the edge of the centering object to the edge of the cellmask
dist_center_on=False

# Whether or not to include the centering object as the first bin
dist_keep_center_as_bin=True

# the number of Zernike degrees to include for the Zernike shape descriptors
dist_zernike_degrees=9

# Whether or not to include distribution data for the interaction sites
include_contact_dist=True

# select the cellmask as the masking object for the organelle segmentations
# This is done differently in the function, but is simplified here
mask = cell_seg
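The `dist_*` parameters control how `get_XY_distribution` divides the cell into concentric rings. Its internals are not shown in this notebook, so the following is only a simplified sketch under stated assumptions: it models `dist_center_on=False` by expressing each pixel's position as a normalized distance between the centering object's edge (0) and the cellmask edge (1), splits that range into `num_bins` equal-width bins, and treats the centering object as an extra bin 0 when `keep_center_as_bin=True`. Whether the library counts the center inside or on top of `num_bins` is an assumption here, and `assign_bin` is a hypothetical helper:

```python
def assign_bin(normalized_dist: float, num_bins: int, inside_center: bool = False,
               keep_center_as_bin: bool = True) -> int:
    """Return a ring index for one pixel (hypothetical helper, simplified model).

    normalized_dist: 0.0 at the centering object's edge, 1.0 at the cellmask edge
    inside_center: True if the pixel lies inside the centering object
    """
    if inside_center:
        # the centering object gets its own bin (0), or is excluded (-1)
        return 0 if keep_center_as_bin else -1
    # equal-width bins; clamp 1.0 into the last bin
    ring = min(int(normalized_dist * num_bins), num_bins - 1)
    # shift rings by one when bin 0 is reserved for the centering object
    return ring + 1 if keep_center_as_bin else ring

print([assign_bin(d, 5) for d in (0.0, 0.19, 0.2, 0.55, 1.0)])  # [1, 1, 2, 3, 5]
print(assign_bin(0.0, 5, inside_center=True))  # 0
```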

## **`1` - Stack intensity images with observed organelles**

In [None]:
# create np.ndarray of intensity images
raw_image = np.stack(intensities)
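`np.stack` joins same-shaped arrays along a new first axis. With hypothetical arrays standing in for the intensity images, the stacked result is ordered (channel, Z, Y, X), with channels in the same order as the input list. That ordering is presumably why `channel_names=organelle_names` is passed alongside `raw_image` in the next step:

```python
import numpy as np

# Two hypothetical single-channel 3D images, each (Z, Y, X) = (4, 32, 32)
lyso_img = np.zeros((4, 32, 32), dtype=np.uint16)
er_img = np.ones((4, 32, 32), dtype=np.uint16)

# np.stack adds a new leading axis, so the result is (C, Z, Y, X)
stacked = np.stack([lyso_img, er_img])
print(stacked.shape)  # (2, 4, 32, 32)

# Channel order follows the list order
print((stacked[1] == er_img).all())  # True
```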

## **`2` - Collect region morphology data using `get_region_morphology_3D` function**

> ###### **📝 In the following cells, the cellmask will serve as the example region. The finalized prototype function will collect the morphology data for all regions listed as input.**

- identify example region *(cellmask)*

In [None]:
# contains the per region morphological information
region_tabs = []

# Establishing the cellmask as the example region
r_name = 'cell'
r = region_names.index(r_name)

- run `get_region_morphology_3D`

In [None]:
region = regions[r]
region_metrics = get_region_morphology_3D(region_seg=region, 
                                            region_name=r_name,
                                            channel_names=organelle_names,
                                            intensity_img=raw_image, 
                                            mask=mask,
                                            scale=scale)
region_tabs.append(region_metrics)

# Show cellmask morphology output
region_tabs[0]

## **`3` - Collect organelle morphology data using `get_org_morphology_3D` function**

> ###### **📝 In the following cells, the lysosome will serve as an example. The finalized prototype function will collect the morphology data for all organelles listed as input.**

- identify example organelle *(lysosome)*

In [None]:
# contains the per organelle morphological information
org_tabs = []

# Establishing the lysosome as the example organelle
target = "lyso"
j = organelle_names.index(target)

- obtain example organelle segmentation

In [None]:
# organelle intensity image
org_img = intensities[j]

# organelle segmentation
if target == 'ER':
    # ensure ER is only one object
    org_obj = (organelles[j] > 0).astype(np.uint16)
else:
    org_obj = organelles[j]
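Instance segmentations give each object its own integer label, so a fragmented ER would otherwise be measured as many objects. Thresholding at `> 0` collapses every label into a single value of 1; this yields one object as long as objects are identified by label value rather than by connectivity (an assumption about the downstream functions). A toy 2D example with made-up labels:

```python
import numpy as np

# Toy labeled segmentation: three separate ER fragments labeled 1, 2 and 3
labels = np.array([[1, 1, 0, 2],
                   [0, 0, 0, 2],
                   [3, 0, 0, 0]], dtype=np.uint16)

# Same operation as above: any nonzero label becomes 1
er_single = (labels > 0).astype(np.uint16)

print(np.unique(labels))     # [0 1 2 3] -> three labeled objects
print(np.unique(er_single))  # [0 1]     -> a single label
```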

- run `get_org_morphology_3D`

In [None]:
# collect the morphology data for the organelle
org_metrics = get_org_morphology_3D(segmentation_img=org_obj, 
                                    seg_name=target,
                                    intensity_img=org_img, 
                                    mask=mask,
                                    scale=scale)

# add the morphology data to the container
org_tabs.append(org_metrics)

# Show organelle morphology output
org_tabs[0]
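For intuition about the kind of per-object values a morphology table is built from, here is a minimal sketch (not the `get_org_morphology_3D` implementation): count the voxels of each label with `np.unique(..., return_counts=True)` and convert them to physical volume using a hypothetical `scale`:

```python
import numpy as np

# Toy 3D label image (Z, Y, X) with two hypothetical lysosomes, labels 1 and 2
seg = np.zeros((2, 4, 4), dtype=np.uint16)
seg[0, 0:2, 0:2] = 1   # 1 slice x 2 x 2 = 4 voxels
seg[:, 3, 1:4] = 2     # 2 slices x 1 x 3 = 6 voxels

example_scale = (0.5, 0.1, 0.1)  # hypothetical voxel size in microns (Z, Y, X)
voxel_volume = float(np.prod(example_scale))

# Count voxels per label, skipping background (label 0)
labels, counts = np.unique(seg[seg > 0], return_counts=True)
volumes = {int(lab): int(n) * voxel_volume for lab, n in zip(labels, counts)}
print(volumes)  # label 1 is about 0.02, label 2 about 0.03 cubic microns
```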

## **`4` - Collect distribution metrics data using `get_XY_distribution` and `get_Z_distribution` functions**

- establish containers and centering object

In [None]:
# contains the per organelle distribution information
dist_tabs = []

# contains the masks of the concentric ring bins per organelle
XY_bins = []

# contains the masks of the 8 radial wedges per organelle
XY_wedges = []

# Although set to the nucleus here, the prototype function allows the selection of the centering object
centering = nuc_seg

- obtain XY Distribution data

In [None]:
# collect the XY distribution data for the organelle
XY_org_distribution, XY_bin_masks, XY_wedge_masks = get_XY_distribution(mask=mask,
                                                                        centering_obj=centering,
                                                                        obj=org_obj,
                                                                        obj_name=target,
                                                                        scale=scale,
                                                                        num_bins=dist_num_bins,
                                                                        center_on=dist_center_on,
                                                                        keep_center_as_bin=dist_keep_center_as_bin,
                                                                        zernike_degrees=dist_zernike_degrees)

- obtain Z Distribution data

In [None]:
# collect the Z distribution data for the organelle
Z_org_distribution = get_Z_distribution(mask=mask, 
                                        obj=org_obj,
                                        obj_name=target,
                                        center_obj=centering,
                                        scale=scale)

- add distribution data (from both functions) to containers

In [None]:
# Combine distribution data
org_distribution_metrics = pd.merge(XY_org_distribution, Z_org_distribution,on=["object", "scale"])

# Add distribution data to the container
dist_tabs.append(org_distribution_metrics)

# Add the mask of the concentric ring bins to the bins container
XY_bins.append(XY_bin_masks)

# Add the mask of the 8 radial wedges to the wedges container
XY_wedges.append(XY_wedge_masks)
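`pd.merge` joins the XY and Z tables wherever both `object` and `scale` match, so each organelle ends up with a single row carrying both sets of columns. The measurement column names and the `scale` value below are invented for illustration:

```python
import pandas as pd

# Hypothetical one-row outputs standing in for XY_org_distribution and Z_org_distribution
xy = pd.DataFrame({"object": ["lyso"], "scale": ["0.4_0.08_0.08"], "XY_ring_1_frac": [0.25]})
z = pd.DataFrame({"object": ["lyso"], "scale": ["0.4_0.08_0.08"], "Z_top_frac": [0.10]})

# Inner join on the shared identifier columns; measurement columns from both sides are kept
merged = pd.merge(xy, z, on=["object", "scale"])
print(merged.columns.tolist())  # ['object', 'scale', 'XY_ring_1_frac', 'Z_top_frac']
print(len(merged))              # 1
```

Because `pd.merge` defaults to an inner join, a row present in only one table would be dropped silently; passing `validate="one_to_one"` is one way to catch duplicated keys early.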

## **`5` - Collect organelle interaction metrics using `get_contact_metrics_3D`**

- create all 2-way organelle combinations (non-redundant)

> ###### **📝 This is an example of how the list of combinations is determined in the finalized prototype function. In the following steps, only the lysosome and ER interaction data will be quantified; this is performed for all combinations in the finalized prototype function.**

In [None]:
list(itertools.combinations(["lyso", "mito","golgi","perox","ER","LD"], 2))
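`itertools.combinations` yields each unordered pair exactly once, so `n` organelles give `n * (n - 1) / 2` pairs and a pair such as lyso/ER is never measured twice in reverse order. A quick check with the six organelle names above:

```python
import itertools
import math

names = ["lyso", "mito", "golgi", "perox", "ER", "LD"]
pairs = list(itertools.combinations(names, 2))

# n choose 2 unordered pairs; ("lyso", "ER") appears but ("ER", "lyso") does not
print(len(pairs), math.comb(len(names), 2))  # 15 15
print(("lyso", "ER") in pairs, ("ER", "lyso") in pairs)  # True False
```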

In [None]:
contact_combos = list(itertools.combinations(organelle_names, 2))

# container to keep contact data in
contact_tabs = []

- identify example interaction

In [None]:
# Establishing lyso and ER as the only pair
pair = contact_combos[0]

# pair names
a_name = pair[0]
b_name = pair[1]

- retrieve organelle segmentations involved in interaction

In [None]:
# segmentations to measure
if a_name == 'ER':
    # ensure ER is only one object
    a = (organelles[organelle_names.index(a_name)] > 0).astype(np.uint16)
else:
    a = organelles[organelle_names.index(a_name)]

if b_name == 'ER':
    # ensure ER is only one object
    b = (organelles[organelle_names.index(b_name)] > 0).astype(np.uint16)
else:
    b = organelles[organelle_names.index(b_name)]
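The same ER check is repeated for `a`, for `b`, and in the single-organelle step. As a sketch, it could be expressed once in a small helper; `prep_seg` is a hypothetical name and not part of `infer_subc`:

```python
from typing import List
import numpy as np

def prep_seg(name: str, names: List[str], segs: List[np.ndarray]) -> np.ndarray:
    """Return the segmentation for `name`, collapsing ER labels into one object."""
    seg = segs[names.index(name)]
    if name == "ER":
        # ensure ER is only one object, as in the cell above
        return (seg > 0).astype(np.uint16)
    return seg

# Toy usage with made-up 1D label arrays
names = ["lyso", "ER"]
segs = [np.array([0, 1, 2]), np.array([0, 3, 5])]
a = prep_seg("lyso", names, segs)
b = prep_seg("ER", names, segs)
print(a, b)  # [0 1 2] [0 1 1]
```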

> ###### **📝 In the finalized prototype function, this operation will differ based on the value of the `include_contact_dist` input. It is set to `True` here, so the distribution data of the lyso-ER interaction sites is included.**

- run `get_contact_metrics_3D`

In [None]:
contact_tab, contact_dist_tab = get_contact_metrics_3D(a, a_name, 
                                                       b, b_name, 
                                                       mask,
                                                       scale, 
                                                       include_dist = include_contact_dist,
                                                       dist_centering_obj=centering,
                                                       dist_num_bins=dist_num_bins,
                                                       dist_zernike_degrees=dist_zernike_degrees,
                                                       dist_center_on=dist_center_on,
                                                       dist_keep_center_as_bin=dist_keep_center_as_bin)

# Adds distribution data to container
dist_tabs.append(contact_dist_tab)

# Adds interaction data to container
contact_tabs.append(contact_tab)
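This notebook does not show how `get_contact_metrics_3D` defines a contact site internally. Purely as a conceptual stand-in (the library may instead use adjacency, dilation, or another rule), the simplest definition treats contact voxels as those belonging to both binary masks. The toy arrays are invented:

```python
import numpy as np

# Toy 2D masks for two organelles (True = organelle present)
a = np.zeros((5, 5), dtype=bool)
b = np.zeros((5, 5), dtype=bool)
a[1:4, 1:3] = True   # rows 1-3, columns 1-2
b[1:4, 2:4] = True   # rows 1-3, columns 2-3, overlapping a in column 2

# Simplified contact definition: voxels belonging to both organelles
contact = np.logical_and(a, b)
print(int(contact.sum()))  # 3
```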

## **`6` - Combine all tables into four inter-organelle tables**

In [None]:
# Cell region morphology table
test_final_region_tab = pd.concat(region_tabs, ignore_index=True)
test_final_region_tab.insert(loc=0,column='image_name',value=raw_img_name.stem)

# Organelle morphology table
test_final_org_tab = pd.concat(org_tabs, ignore_index=True)
test_final_org_tab.insert(loc=0,column='image_name',value=raw_img_name.stem)

# Organelle interaction table
test_final_contact_tab = pd.concat(contact_tabs, ignore_index=True)
test_final_contact_tab.insert(loc=0,column='image_name',value=raw_img_name.stem)

# Distribution metrics table
test_combined_dist_tab = pd.concat(dist_tabs, ignore_index=True)
test_combined_dist_tab.insert(loc=0,column='image_name',value=raw_img_name.stem)
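`pd.concat(..., ignore_index=True)` renumbers the rows of the stacked per-organelle tables, and `DataFrame.insert` then adds the image name as the first column, broadcasting a single string to every row. A toy version with hypothetical tables and filename:

```python
from pathlib import Path
import pandas as pd

# Two hypothetical per-organelle tables, each with its own index starting at 0
t1 = pd.DataFrame({"object": ["lyso"], "volume": [1.2]})
t2 = pd.DataFrame({"object": ["ER"], "volume": [30.5]})

# ignore_index=True renumbers rows 0..n-1 instead of repeating each table's index
combined = pd.concat([t1, t2], ignore_index=True)

# insert() works in place; the scalar stem is repeated on every row
combined.insert(loc=0, column="image_name", value=Path("raw/cell01.tiff").stem)
print(combined)
```

Note that `insert` raises a `ValueError` if the column already exists, so it relies on `concat` producing a fresh table each time the cell runs.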

In [None]:
test_final_region_tab

In [None]:
test_final_org_tab

In [None]:
test_final_contact_tab

In [None]:
test_combined_dist_tab

## **Define `_make_all_metrics_tables` function**

In [None]:
# organelle_to_colname = {"nuc":"NU", "lyso": "LY", "mito":"MT", "golgi":"GL", "perox":"PR", "ER":"ER", "LD":"LD", "cell":"CM", "cyto":"CY", "nucleus": "N1","nuclei":"NU",}

def _make_all_metrics_tables(source_file: Path,
                             list_obj_names: List[str],
                             list_obj_segs: List[np.ndarray],
                             list_intensity_img: List[np.ndarray],
                             list_region_names: List[str],
                             list_region_segs: List[np.ndarray],
                             mask: str,
                             dist_centering_obj: str, 
                             dist_num_bins: int,
                             dist_center_on: bool=False,
                             dist_keep_center_as_bin: bool=True,
                             dist_zernike_degrees: Union[int, None]=None,
                             scale: Union[tuple,None] = None,
                             include_contact_dist: bool=True):
    """
    Measure the composition, morphology, distribution, and contacts of multiple organelles in a cell

    Parameters:
    ----------
    source_file: Path
        path to the raw image file; its stem is recorded as 'image_name' in the output data tables
    list_obj_names: List[str]
        a list of object names (strings) that will be measured; this should match the order in list_obj_segs
    list_obj_segs: List[np.ndarray]
        a list of 3D (ZYX) segmentation np.ndarrays that will be measured per cell; the order should match the list_obj_names 
    list_intensity_img: List[np.ndarray]
        a list of 3D (ZYX) grayscale np.ndarrays that will be used to measure fluorescence intensity in each region and object;
        the order should match the list_obj_names
    list_region_names: List[str]
        a list of region names (strings); these should include the mask (entire region being measured - usually the cell) 
        and other sub-mask regions in which the objects can be measured (e.g., nucleus, neurites, soma). It should 
        also include the centering object used when creating the XY distribution bins.
        The order should match the list_region_segs
    list_region_segs: List[np.ndarray]
        a list of 3D (ZYX) binary np.ndarrays of the region masks; the order should match the list_region_names.
    mask: str
        a str of which region name (contained in the list_region_names list) should be used as the main mask (e.g., cell mask)
    dist_centering_obj: str
        a str of which region name (contained in the list_region_names list) should be used as the centering object in 
        get_XY_distribution()
    dist_num_bins: int
        the number of concentric rings to draw between the centering object and edge of the mask in get_XY_distribution()
    dist_center_on: bool=False,
        for get_XY_distribution:
        True = distribute the bins from the center of the centering object
        False = distribute the bins from the edge of the centering object
    dist_keep_center_as_bin: bool=True
        for get_XY_distribution:
        True = include the centering object area when creating the bins
        False = do not include the centering object area when creating the bins
    dist_zernike_degrees: Union[int, None]=None
        for get_XY_distribution:
        the number of Zernike degrees to include for the Zernike shape descriptors; if None, the Zernike measurements will not 
        be included in the output
    scale: Union[tuple,None] = None
        a tuple that contains the real world dimensions for each dimension in the image (Z, Y, X)
    include_contact_dist: bool=True
        whether to include the distribution of contact sites in get_contact_metrics_3D(); True = include contact distribution

    Returns:
    ----------
    4 DataFrames of measurements, in this order: organelle morphology, contacts, organelle/contact distributions, 
    and region morphology

    """
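    start = time.time()

    # accept either a str or a Path; .stem is used below to record the image name
    source_file = Path(source_file)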

    # segmentation image for all masking steps below
    mask = list_region_segs[list_region_names.index(mask)]

    ######################
    # measure cell regions
    ######################
    # create np.ndarray of intensity images
    raw_image = np.stack(list_intensity_img)
    
    # container for region data
    region_tabs = []
    for r, r_name in enumerate(list_region_names):
        region = list_region_segs[r]
        region_metrics = get_region_morphology_3D(region_seg=region, 
                                                  region_name=r_name,
                                                  channel_names=list_obj_names,
                                                  intensity_img=raw_image, 
                                                  mask=mask,
                                                  scale=scale)
        region_tabs.append(region_metrics)

    ##############################################################
    # loop through all organelles to collect measurements for each
    ##############################################################
    # containers to collect per organelle information
    org_tabs = []
    dist_tabs = []
    XY_bins = []
    XY_wedges = []

    for j, target in enumerate(list_obj_names):
        # organelle intensity image
        org_img = list_intensity_img[j]

        # organelle segmentation
        if target == 'ER':
            # ensure ER is only one object
            org_obj = (list_obj_segs[j] > 0).astype(np.uint16)
        else:
            org_obj = list_obj_segs[j]

        ##########################################################
        # measure organelle morphology & number of objs contacting
        ##########################################################
        org_metrics = get_org_morphology_3D(segmentation_img=org_obj, 
                                            seg_name=target,
                                            intensity_img=org_img, 
                                            mask=mask,
                                            scale=scale)

        org_tabs.append(org_metrics)

        ################################
        # measure organelle distribution 
        ################################
        centering = list_region_segs[list_region_names.index(dist_centering_obj)]
        XY_org_distribution, XY_bin_masks, XY_wedge_masks = get_XY_distribution(mask=mask,
                                                                                centering_obj=centering,
                                                                                obj=org_obj,
                                                                                obj_name=target,
                                                                                scale=scale,
                                                                                num_bins=dist_num_bins,
                                                                                center_on=dist_center_on,
                                                                                keep_center_as_bin=dist_keep_center_as_bin,
                                                                                zernike_degrees=dist_zernike_degrees)
        Z_org_distribution = get_Z_distribution(mask=mask, 
                                                obj=org_obj,
                                                obj_name=target,
                                                center_obj=centering,
                                                scale=scale)
        
        org_distribution_metrics = pd.merge(XY_org_distribution, Z_org_distribution,on=["object", "scale"])

        dist_tabs.append(org_distribution_metrics)
        XY_bins.append(XY_bin_masks)
        XY_wedges.append(XY_wedge_masks)

    #######################################
    # collect non-redundant contact metrics 
    #######################################
    # list the non-redundant organelle pairs
    contact_combos = list(itertools.combinations(list_obj_names, 2))

    # container to keep contact data in
    contact_tabs = []

    # loop through each pair and measure contacts
    for pair in contact_combos:
        # pair names
        a_name = pair[0]
        b_name = pair[1]

        # segmentations to measure
        if a_name == 'ER':
            # ensure ER is only one object
            a = (list_obj_segs[list_obj_names.index(a_name)] > 0).astype(np.uint16)
        else:
            a = list_obj_segs[list_obj_names.index(a_name)]
        
        if b_name == 'ER':
            # ensure ER is only one object
            b = (list_obj_segs[list_obj_names.index(b_name)] > 0).astype(np.uint16)
        else:
            b = list_obj_segs[list_obj_names.index(b_name)]
        

        if include_contact_dist:
            contact_tab, contact_dist_tab = get_contact_metrics_3D(a, a_name, 
                                                                   b, b_name, 
                                                                   mask, 
                                                                   scale, 
                                                                   include_dist=include_contact_dist,
                                                                   dist_centering_obj=centering,
                                                                   dist_num_bins=dist_num_bins,
                                                                   dist_zernike_degrees=dist_zernike_degrees,
                                                                   dist_center_on=dist_center_on,
                                                                   dist_keep_center_as_bin=dist_keep_center_as_bin)
            dist_tabs.append(contact_dist_tab)
        else:
            contact_tab = get_contact_metrics_3D(a, a_name, 
                                                 b, b_name, 
                                                 mask, 
                                                 scale, 
                                                 include_dist=include_contact_dist)
        contact_tabs.append(contact_tab)


    ###########################################
    # combine all tabs into one table per type:
    ###########################################
    final_org_tab = pd.concat(org_tabs, ignore_index=True)
    final_org_tab.insert(loc=0,column='image_name',value=source_file.stem)

    final_contact_tab = pd.concat(contact_tabs, ignore_index=True)
    final_contact_tab.insert(loc=0,column='image_name',value=source_file.stem)

    combined_dist_tab = pd.concat(dist_tabs, ignore_index=True)
    combined_dist_tab.insert(loc=0,column='image_name',value=source_file.stem)

    final_region_tab = pd.concat(region_tabs, ignore_index=True)
    final_region_tab.insert(loc=0,column='image_name',value=source_file.stem)

    end = time.time()
    print(f"It took {(end-start)/60:.2f} minutes to quantify one image.")
    return final_org_tab, final_contact_tab, combined_dist_tab, final_region_tab

## **Run `_make_all_metrics_tables` function**

In [None]:
# TODO: things to fix - 
# figure out what is causing the convex hull error - I'm guessing major axis
organelle_names = ["LD","ER","golgi","lyso","mito","perox"]
organelles = [LD_seg, ER_seg, golgi_seg, lyso_seg, mito_seg, perox_seg]
organelle_channels = [0,1,2,3,4,5]
intensities = [raw_img_data[ch] for ch in organelle_channels]
regions = [cell_seg, cyto_seg, nuc_seg]
region_names = ['cell', 'cyto', 'nuc']

test_final_org_tab, test_final_contact_tab, test_combined_dist_tab, test_final_regions_tab = _make_all_metrics_tables(source_file=raw_img_name,
                                                                                                                      list_obj_names=organelle_names,
                                                                                                                      list_obj_segs=organelles,
                                                                                                                      list_intensity_img=intensities,
                                                                                                                      list_region_names=region_names,
                                                                                                                      list_region_segs=regions,
                                                                                                                      mask='cell',
                                                                                                                      dist_centering_obj='nuc',
                                                                                                                      dist_num_bins=5,
                                                                                                                      dist_center_on=False,
                                                                                                                      dist_keep_center_as_bin=True,
                                                                                                                      dist_zernike_degrees=9,
                                                                                                                      scale=scale,
                                                                                                                      include_contact_dist=True)

import warnings
warnings.simplefilter('ignore')

## **Compare to finalized `make_all_metrics_tables` function**

In [None]:
from infer_subc.utils.stats_helpers import make_all_metrics_tables

official_org, official_contacts, official_dist, official_regions = make_all_metrics_tables(source_file=raw_img_name,
                                                                                           list_obj_names=organelle_names,
                                                                                           list_obj_segs=organelles,
                                                                                           list_intensity_img=intensities,
                                                                                           list_region_names=region_names,
                                                                                           list_region_segs=regions,
                                                                                           mask='cell',
                                                                                           dist_centering_obj='nuc',
                                                                                           dist_num_bins=5,
                                                                                           dist_center_on=False,
                                                                                           dist_keep_center_as_bin=True,
                                                                                           dist_zernike_degrees=9,
                                                                                           scale=scale,
                                                                                           include_contact_dist=True)

test_final_org_tab.equals(official_org), test_final_contact_tab.equals(official_contacts), test_combined_dist_tab.equals(official_dist), test_final_regions_tab.equals(official_regions)

In [None]:
test_combined_dist_tab.compare(official_dist)

# ***PART 2️⃣: Batch Process Quantification***

## **`0` - Establish the image data paths *(preliminary step)***

In [None]:
# all the imaging data goes here
data_root_path = Path(os.getcwd()).parents[1] / "sample_data" /  "batch_example"

# linearly unmixed raw image files are here
raw_data_path = data_root_path / "raw"

# list of linearly unmixed raw image files (".tiff" in this example)
raw_file_list = list_image_files(raw_data_path,".tiff")

# adding an additional list of image paths for the matching segmentation files
seg_data_path = data_root_path / "seg"
seg_file_list = list_image_files(seg_data_path, ".tiff")

# changing output directory for this notebook to a new folder called "quant"
out_data_path = data_root_path / "quant"
if not out_data_path.exists():
    out_data_path.mkdir(parents=True)
    print(f"making {out_data_path}")

raw_file_list, seg_file_list
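`list_image_files` comes from `infer_subc`, and its exact behavior is not shown in this notebook. As a rough, hypothetical stand-in, the sketch below lists the files in one folder that share an extension, using only `pathlib`, and sorts them so the raw and segmentation lists come back in a predictable order. `list_files_with_ext` is an illustrative name, not an `infer_subc` function.

```python
from pathlib import Path
from typing import List, Union


def list_files_with_ext(folder: Union[Path, str], ext: str) -> List[Path]:
    """Return files in `folder` whose suffix matches `ext` (case-insensitive), sorted by path.

    Hypothetical stand-in for `list_image_files`; the real helper may behave differently.
    """
    folder = Path(folder)
    # normalize ".tiff", "tiff", ".TIFF" to the same lowercase form with a leading dot
    ext = ext.lower() if ext.startswith(".") else f".{ext.lower()}"
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ext)
```

For example, `list_files_with_ext(raw_data_path, ".tiff")` would return only the `.tiff` files in the raw folder; a `.tif` file would need a second call with `".tif"`.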

## **`1` - Locate the matching raw and segmentation files (*`_find_segmentation_tiff_files`*)**


- identify the prototype image and the segmentations to collect

In [None]:
# linearly unmixed image used as an example
prototype = raw_file_list[0]

# suffix of the corresponding segmentations
test_suffix = "-20250204_ex-"

# the segmentations that are to be identified
segs_to_cellect_test = ["lyso","ER","masks"]

- locate folders for raw image and segmentations

In [None]:
# raw
prototype = Path(prototype)
if not prototype.exists():
    print(f"bad prototype. please choose an existing `raw` file as prototype")

out_files = {"raw":prototype}
seg_path = Path(seg_data_path)

# segmentations
if not seg_path.is_dir():
    print(f"bad path argument. please choose an existing path containing organelle segmentations")

- collect all listed region and segmentation filenames

In [None]:
for org_n in segs_to_cellect_test:
    org_name = Path(seg_path) / f"{prototype.stem}{test_suffix}{org_n}.tiff"
    if org_name.exists(): 
        out_files[org_n] = org_name
    else: 
        print(f"{org_n} .tiff file not found in {seg_path}")
        out_files[org_n] = None
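The loop above relies on a fixed naming convention: `<raw file stem><suffix><segmentation name>.tiff`. The sketch below isolates that convention in a small helper (`expected_seg_name` is a hypothetical name, not part of `infer_subc`) and, unlike the inline f-string, treats a `None` suffix as empty text instead of the literal string `"None"`.

```python
from pathlib import Path
from typing import Optional, Union


def expected_seg_name(raw_file: Union[Path, str], suffix: Optional[str],
                      seg_name: str, ext: str = ".tiff") -> str:
    """Build the segmentation file name expected for one raw image.

    Assumed pattern (from this notebook): <raw file stem><suffix><segmentation name><ext>
    """
    stem = Path(raw_file).stem  # drops only the last extension, e.g. ".czi" or ".tiff"
    return f"{stem}{suffix or ''}{seg_name}{ext}"  # a None suffix becomes empty text
```

Using the docstring example from `_find_segmentation_tiff_files`, `expected_seg_name("a48hrs-Ctrl_9_Unmixing.czi", "-20230426_test_", "cell")` gives `"a48hrs-Ctrl_9_Unmixing-20230426_test_cell.tiff"`. Note that `Path.stem` removes only the final extension, so a raw file such as `img.ome.tiff` would keep `.ome` in its stem.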

In [None]:
def _find_segmentation_tiff_files(prototype:Union[Path,str],
                                  name_list:List[str], 
                                  seg_path:Union[Path,str],
                                  suffix:Union[str, None]=None) -> Dict:
    """
    Find the matching segmentation files to the raw image file based on the raw image file path.

    Parameters:
    ---------
    prototype:Union[Path,str]
        the file path (Path or str) for one raw image file; this file should have matching segmentation 
        output files with the same file name root and different file name endings that match the strings 
        provided in name_list
    name_list:List[str]
        a list of file name endings that identify each segmentation (e.g., "lyso", "ER", "masks")
    seg_path:Union[Path,str]
        the path (Path or str) to the folder containing the matching segmentation files
    suffix:Union[str, None]=None
        any additional text that exists between the file root and the name_list ending
        Ex) Prototype = "C:/Users/Shannon/Documents/Python_Scripts/Infer-subc/raw/a48hrs-Ctrl_9_Unmixing.czi"
            Name of organelle file = a48hrs-Ctrl_9_Unmixing-20230426_test_cell.tiff
            result of .stem = "a48hrs-Ctrl_9_Unmixing"
            organelle/cell area type = "cell"
            suffix = "-20230426_test_"
    
    Returns:
    ----------
    a dictionary of file paths for each image type (raw and all the different segmentations)

    """
    # raw
    prototype = Path(prototype)
    if not prototype.exists():
        print(f"bad prototype. please choose an existing `raw` file as prototype")
        return dict()

    out_files = {"raw":prototype}
    seg_path = Path(seg_path) 

    # segmentation folder
    if not seg_path.is_dir():
        print(f"bad path argument. please choose an existing path containing organelle segmentations")
        return out_files

    # treat a missing suffix as empty text rather than the literal string "None"
    suffix = "" if suffix is None else suffix

    # segmentations: try ".tiff" first, then fall back to ".tif"
    for org_n in name_list:
        org_name = seg_path / f"{prototype.stem}{suffix}{org_n}.tiff"
        alt_name = seg_path / f"{prototype.stem}{suffix}{org_n}.tif"
        if org_name.exists(): 
            out_files[org_n] = org_name
        elif alt_name.exists(): 
            out_files[org_n] = alt_name
        else: 
            print(f"{org_n} .tiff/.tif file not found in {seg_path}")
            out_files[org_n] = None
    
    return out_files

In [None]:
prototype = raw_file_list[0]
segs_to_cellect_test = ["LD","ER","golgi","lyso","mito","perox","masks"] #"cell", "cyto", "nuc"]
test_suffix = "-20250204_ex-"

filez = _find_segmentation_tiff_files(prototype, segs_to_cellect_test, seg_data_path, test_suffix)

filez
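A missing segmentation shows up in this dictionary as `None` (or, depending on the version of the lookup, as a path that was never checked on disk). A quick pre-flight check before batch processing can catch this early instead of partway through a long run. A minimal sketch; `missing_segmentations` is a hypothetical helper, not an `infer_subc` function:

```python
from pathlib import Path
from typing import Dict, List, Optional, Union


def missing_segmentations(found: Dict[str, Optional[Union[Path, str]]]) -> List[str]:
    """Return the keys whose file is absent: either None or a path that does not exist on disk."""
    return [name for name, path in found.items() if path is None or not Path(path).exists()]
```

For example, `missing_segmentations(filez)` returning an empty list would indicate that the raw file and every requested segmentation were found.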

In [None]:
from infer_subc.utils.batch import find_segmentation_tiff_files

filez_final = find_segmentation_tiff_files(prototype, segs_to_cellect_test, seg_data_path, test_suffix)

filez==filez_final

## **`2` - Establish the parameters for prototype `_batch_process_quantification` function**

> ###### **📝 The lysosome, ER, nucleus and cellmask will serve as examples in the following cells. In the final function, quantification is performed for all listed organelles and regions**

In [None]:
# names of organelles and regions to quantify
organelle_names = ["lyso", "ER"]
region_names = ["nuc","cell"]

# channels for each organelle listed in the order they appear above
organelle_channels = [3,1]

# The file type of the linearly unmixed raw image files 
raw_file_type=".tiff"

# The name of the masks segmentation (can differ based on mask inferring method)
masks_file_name="masks"

# select the cellmask as the masking object for the organelle segmentations
# This is done differently in the function, but is simplified here
mask = "cell"

# Number of bins to be used during calculation of distribution metrics
dist_num_bins=5

# If set to true the bins will be distributed from the center of the centering object to the edge of the cellmask
# If set to false the bins will be distributed from the edge of the centering object to the edge of the cellmask
dist_center_on=False

# Whether or not to include the centering object as the first bin
dist_keep_center_as_bin=True

# the number of zernike degrees to include for the zernike shape descriptors
dist_zernike_degrees=9

# The centering object to measure cell distribution from
dist_centering_obj="nuc"

# Whether or not to include distribution data for the interaction sites
include_contact_dist=True

# prefix for the output file names (today's date as YYYYMMDD followed by "_prototype")
current_time = datetime.now().strftime("%Y%m%d")
out_file_name = f"{current_time}_prototype"
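Several of these parameters must agree with each other: each organelle needs exactly one channel index, and the mask and centering object must name an available region (the docstring of the final function allows the centering object to be a region or an object). The sketch below, a hypothetical `check_quant_params` helper that is not part of `infer_subc`, raises an informative error before any images are read.

```python
from typing import List


def check_quant_params(organelle_names: List[str], organelle_channels: List[int],
                       region_names: List[str], mask: str, dist_centering_obj: str) -> None:
    """Raise ValueError if the quantification parameters are inconsistent (hypothetical pre-flight check)."""
    if len(organelle_names) != len(organelle_channels):
        raise ValueError(
            f"{len(organelle_names)} organelle names but {len(organelle_channels)} channels; "
            "each organelle needs exactly one channel index"
        )
    if mask not in region_names:
        raise ValueError(f"mask '{mask}' is not one of the region names {region_names}")
    if dist_centering_obj not in region_names + organelle_names:
        raise ValueError(f"centering object '{dist_centering_obj}' is not a listed region or organelle")
```

With the values above, `check_quant_params(organelle_names, organelle_channels, region_names, mask, dist_centering_obj)` should pass silently.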

## **`3` - List raw image files and the corresponding segmentations to collect**

In [None]:
# reading list of files from the raw path
img_file_list = list_image_files(raw_data_path, raw_file_type)

# Establishing the first image as the prototype
img_f = img_file_list[0]

# list of segmentation files to collect
segs_to_collect = organelle_names + [masks_file_name]
print(segs_to_collect)

## **`4` - Obtain segmentations and raw image files**

> ###### **📝 The first image in the image file list will serve as the prototype in the following cells. In the final function, quantification is performed for all suitable images (based on file type) in the raw data folder**

- run `find_segmentation_tiff_files` to collect segmentation filenames

In [None]:
filez = find_segmentation_tiff_files(img_f, segs_to_collect, seg_path, test_suffix)

- read in linearly unmixed image file *(raw)*

In [None]:
# read in raw file and metadata
img_data, meta_dict = read_czi_image(filez["raw"])

- collect the listed organelle channel intensities from the raw image file

In [None]:
# create intensities from raw file as list based on the channel order provided
intensities = [img_data[ch] for ch in organelle_channels]
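The list comprehension pairs each organelle with its channel purely by position, and a negative or out-of-range index would either fail or, with NumPy arrays, silently select a channel counted from the end. The sketch below makes the pairing explicit and checks the indices first. It assumes, as `img_data[ch]` does above, that the channel is the first axis of the raw image; `select_channels` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def select_channels(img_channels: Sequence[T], names: List[str], channels: List[int]) -> Dict[str, T]:
    """Map each organelle name to its intensity channel (channel assumed to be the first axis)."""
    if len(names) != len(channels):
        raise ValueError("names and channels must have the same length")
    n = len(img_channels)
    for ch in channels:
        # reject negative indices too, since they would wrap around instead of failing
        if not 0 <= ch < n:
            raise IndexError(f"channel {ch} is out of range for an image with {n} channels")
    return {name: img_channels[ch] for name, ch in zip(names, channels)}
```

A dictionary keyed by organelle name also makes it harder to mix up the order when organelles are added or removed later.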

- identify the scale of the raw image from the metadata

In [None]:
# define the scale
# In the finalized function this is an option based on the "scale" boolean parameter
scale_tup = meta_dict['scale']

- read in organelle and region segmentations

In [None]:
# load regions as a list based on order in list (should match order in "masks" file)
masks = read_tiff_image(filez[masks_file_name]) 
regions = [masks[r] for r, region in enumerate(region_names)]

# store organelle images as list
organelles = [read_tiff_image(filez[org]) for org in organelle_names]
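The regions are taken from the stacked "masks" file by position, so `region_names` must list the regions in the same order as the channels in that file. A minimal sketch of the same pairing with an explicit check that there are enough mask channels; `regions_from_masks` is a hypothetical helper, and plain lists stand in for the mask arrays:

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def regions_from_masks(masks: Sequence[T], region_names: List[str]) -> Dict[str, T]:
    """Pair each region name with the matching channel of the stacked "masks" image, by position."""
    if len(region_names) > len(masks):
        raise ValueError(f"{len(region_names)} region names but only {len(masks)} mask channels")
    return {name: masks[i] for i, name in enumerate(region_names)}
```

The check cannot tell whether the order itself is correct; that still depends on how the "masks" file was written.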

## **`5` - Run `make_all_metrics_tables` and store output**

- run `make_all_metrics_tables` for the prototype image

In [None]:
org_metrics, contact_metrics, dist_metrics, region_metrics = make_all_metrics_tables(source_file=img_f,
                                                                                             list_obj_names=organelle_names,
                                                                                             list_obj_segs=organelles,
                                                                                             list_intensity_img=intensities, 
                                                                                             list_region_names=region_names,
                                                                                             list_region_segs=regions, 
                                                                                             mask=mask,
                                                                                             dist_centering_obj=dist_centering_obj,
                                                                                             dist_num_bins=dist_num_bins,
                                                                                             dist_center_on=dist_center_on,
                                                                                             dist_keep_center_as_bin=dist_keep_center_as_bin,
                                                                                             dist_zernike_degrees=dist_zernike_degrees,
                                                                                             scale=scale_tup,
                                                                                             include_contact_dist=include_contact_dist)

In [None]:
org_metrics

In [None]:
contact_metrics

In [None]:
dist_metrics

In [None]:
region_metrics

- repeat step **`4`** and run `make_all_metrics_tables` (step **`5`**) for every image in `img_file_list`

In [None]:
# containers to collect data tables
org_tabs = []
contact_tabs = []
dist_tabs = []
region_tabs = []

for img_f in img_file_list:
    filez = find_segmentation_tiff_files(img_f, segs_to_collect, seg_path, test_suffix)

    # read in raw file and metadata
    img_data, meta_dict = read_czi_image(filez["raw"])

    # create intensities from raw file as list based on the channel order provided
    intensities = [img_data[ch] for ch in organelle_channels]

    # define the scale
    # `scale` carries over from earlier in this notebook; in the final function it is a boolean parameter
    if scale:
        scale_tup = meta_dict['scale']
    else:
        scale_tup = None

    # load regions as a list based on order in list (should match order in "masks" file)
    masks = read_tiff_image(filez[masks_file_name]) 
    regions = [masks[r] for r, region in enumerate(region_names)] #TODO: add in option for multiple mask files

    # store organelle images as list
    organelles = [read_tiff_image(filez[org]) for org in organelle_names]

    org_metrics, contact_metrics, dist_metrics, region_metrics = make_all_metrics_tables(source_file=img_f,
                                                                                        list_obj_names=organelle_names,
                                                                                        list_obj_segs=organelles,
                                                                                        list_intensity_img=intensities, 
                                                                                        list_region_names=region_names,
                                                                                        list_region_segs=regions, 
                                                                                        mask=mask,
                                                                                        dist_centering_obj=dist_centering_obj,
                                                                                        dist_num_bins=dist_num_bins,
                                                                                        dist_center_on=dist_center_on,
                                                                                        dist_keep_center_as_bin=dist_keep_center_as_bin,
                                                                                        dist_zernike_degrees=dist_zernike_degrees,
                                                                                        scale=scale_tup,
                                                                                        include_contact_dist=include_contact_dist)
    org_tabs.append(org_metrics)
    contact_tabs.append(contact_metrics)
    dist_tabs.append(dist_metrics)
    region_tabs.append(region_metrics)
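In the loop above, a single unreadable file or missing segmentation raises an error that stops the whole run, and the tables collected so far are never combined or saved. One possible design, sketched below under the assumption that the per-image work can be wrapped in a single function, is to record the failure and move on. `run_batch` and `process_one` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def run_batch(items: Iterable[str], process_one: Callable[[str], T]) -> Tuple[List[T], Dict[str, str]]:
    """Apply `process_one` to every item, collecting results and recording failures instead of stopping.

    `process_one` is a placeholder for the per-image body of the loop above.
    """
    results: List[T] = []
    failures: Dict[str, str] = {}
    for item in items:
        try:
            results.append(process_one(item))
        except Exception as err:  # keep going; remember which image failed and why
            failures[str(item)] = f"{type(err).__name__}: {err}"
    return results, failures
```

The trade-off is that errors no longer interrupt the run, so the `failures` dictionary should be inspected (or printed) afterwards.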

In [None]:
read_tiff_image(filez[masks_file_name]) 

## **`6` - Combine all per image tables into four comprehensive tables**

- batch organelle morphology table

In [None]:
final_org = pd.concat(org_tabs, ignore_index=True)
final_org

- batch organelle interactions table

In [None]:
final_contact = pd.concat(contact_tabs, ignore_index=True)
final_contact

- batch distribution measurements table

In [None]:
final_dist = pd.concat(dist_tabs, ignore_index=True)
final_dist

- batch cell region morphology table

In [None]:
final_region = pd.concat(region_tabs, ignore_index=True)
final_region

## **`7` - Export comprehensive tables as .csv files**

In [None]:
org_csv_path = out_data_path / f"{out_file_name}_organelles.csv"
final_org.to_csv(org_csv_path)

contact_csv_path = out_data_path / f"{out_file_name}_contacts.csv"
final_contact.to_csv(contact_csv_path)

dist_csv_path = out_data_path / f"{out_file_name}_distributions.csv"
final_dist.to_csv(dist_csv_path)

region_csv_path = out_data_path / f"{out_file_name}_regions.csv"
final_region.to_csv(region_csv_path)
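`DataFrame.to_csv` writes the row index as an extra first column unless `index=False` is passed; after `pd.concat(..., ignore_index=True)` that column is just `0..n-1`, so whether to keep it is a judgment call. When per-image tables have different columns, `pd.concat` takes the union of columns and fills the gaps with NaN, which `to_csv` writes as empty fields. The standard-library sketch below mimics that combine-and-export step for lists of row dictionaries, purely to illustrate the behavior; `write_combined_csv` is a hypothetical helper, and its column order (first seen) is an approximation.

```python
import csv
from pathlib import Path
from typing import Dict, List, Union


def write_combined_csv(per_image_rows: List[List[Dict[str, object]]], csv_path: Union[Path, str]) -> int:
    """Stack per-image row lists into one CSV and return the number of data rows written.

    Columns are the union of all keys in first-seen order; missing values are left blank.
    """
    rows = [row for image_rows in per_image_rows for row in image_rows]
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```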

## **Define `_batch_process_quantification` function**

In [None]:
# for each listed organelle and region, collect morphology, distribution, and contact metrics masked by the selected region (usually the cellmask)

# NOTE: the convex hull regionprops error is a known issue that occurs when the objects being measured have too few voxels. 
# Here's the GitHub reference: https://github.com/scikit-image/scikit-image/issues/5363

warnings.simplefilter("ignore", UserWarning)
warnings.simplefilter("ignore", RuntimeWarning)

def _batch_process_quantification(out_file_name: str,
                                  seg_path: Union[Path,str],
                                  out_path: Union[Path, str], 
                                  raw_path: Union[Path,str], 
                                  raw_file_type: str,
                                  organelle_names: List[str],
                                  organelle_channels: List[int],
                                  region_names: List[str],
                                  masks_file_name: str,
                                  mask: str,
                                  dist_centering_obj:str, 
                                  dist_num_bins: int,
                                  dist_center_on: bool=False,
                                  dist_keep_center_as_bin: bool=True,
                                  dist_zernike_degrees: Union[int, None]=None,
                                  include_contact_dist: bool = True,
                                  scale:bool=True,
                                  seg_suffix:Union[str, None]=None) -> int :
    """  
    batch process segmentation quantification (morphology, distribution, contacts); this function is currently optimized to process images from one file folder per image type (e.g., raw, segmentation)
    the output csv files are saved to the indicated out_path folder

    Parameters:
    ----------
    out_file_name: str
        the prefix to use when naming the output datatables
    seg_path: Union[Path,str]
        Path or str to the folder that contains the segmentation tiff files
    out_path: Union[Path, str]
        Path or str to the folder that the output datatables will be saved to
    raw_path: Union[Path,str]
        Path or str to the folder that contains the raw image files
    raw_file_type: str
        the file type of the raw data; ex - ".tiff", ".czi"
    organelle_names: List[str]
        a list of all organelle names that will be analyzed; the names should be the same as the suffix used to name each of the tiff segmentation files
        Note: the intensity measurements collected per region (from get_region_morphology_3D function) will only be from channels associated to these organelles 
    organelle_channels: List[int]
        a list of channel indices associated to respective organelle staining in the raw image; the indices should be listed in the same order in which the respective segmentation name is listed in organelle_names
    region_names: List[str]
        a list of regions, or masks, to measure; the order should correlate to the order of the channels in the "masks" output segmentation file
    masks_file_name: str
        the suffix of the "masks" segmentation file; ex- "masks_B", "masks", etc.
        this function currently does not accept individual region segmentations 
    mask: str
        the name of the region to use as the mask when measuring the organelles; this should be one of the names listed in regions list; usually this will be the "cell" mask
    dist_centering_obj:str
        the name of the region or object to use as the centering object in the get_XY_distribution function
    dist_num_bins: int
        the number of bins for the get_XY_distribution function
    dist_center_on: bool=False
        for get_XY_distribution:
        True = distribute the bins from the center of the centering object
        False = distribute the bins from the edge of the centering object
    dist_keep_center_as_bin: bool=True
        for get_XY_distribution:
        True = include the centering object area when creating the bins
        False = do not include the centering object area when creating the bins
    dist_zernike_degrees: Union[int, None]=None
        for get_XY_distribution:
        the number of zernike degrees to include for the zernike shape descriptors; if None, the zernike measurements will not 
        be included in the output
    include_contact_dist:bool=True
        whether to include the distribution of contact sites in get_contact_metrics_3d(); True = include contact distribution
    scale:bool=True
        whether to use the real-world voxel dimensions (Z, Y, X) read from each raw image's metadata; if False, scale=None is passed to make_all_metrics_tables
    seg_suffix:Union[str, None]=None
        any additional text that is included in the segmentation tiff files between the file stem and the segmentation suffix


    Returns:
    ----------
    count: int
        the number of images processed
        
    """
    start = time.time()
    count = 0

    if isinstance(raw_path, str): raw_path = Path(raw_path)
    if isinstance(seg_path, str): seg_path = Path(seg_path)
    if isinstance(out_path, str): out_path = Path(out_path)
    
    if not Path.exists(out_path):
        Path.mkdir(out_path)
        print(f"making {out_path}")
    
    # reading list of files from the raw path
    img_file_list = list_image_files(raw_path, raw_file_type)

    # list of segmentation files to collect
    segs_to_collect = organelle_names + [masks_file_name]

    # containers to collect data tables
    org_tabs = []
    contact_tabs = []
    dist_tabs = []
    region_tabs = []
    for img_f in img_file_list:
        count = count + 1
        filez = find_segmentation_tiff_files(img_f, segs_to_collect, seg_path, seg_suffix)

        # read in raw file and metadata
        img_data, meta_dict = read_czi_image(filez["raw"])

        # create intensities from raw file as list based on the channel order provided
        intensities = [img_data[ch] for ch in organelle_channels]

        # define the scale
        # if scale is True:
        if scale:
            scale_tup = meta_dict['scale']
        else:
            scale_tup = None

        # load regions as a list based on order in list (should match order in "masks" file)
        masks = read_tiff_image(filez[masks_file_name]) 
        regions = [masks[r] for r, region in enumerate(region_names)] #TODO: add in option for multiple mask files

        # store organelle images as list
        organelles = [read_tiff_image(filez[org]) for org in organelle_names]

        org_metrics, contact_metrics, dist_metrics, region_metrics = make_all_metrics_tables(source_file=img_f,
                                                                                             list_obj_names=organelle_names,
                                                                                             list_obj_segs=organelles,
                                                                                             list_intensity_img=intensities, 
                                                                                             list_region_names=region_names,
                                                                                             list_region_segs=regions, 
                                                                                             mask=mask,
                                                                                             dist_centering_obj=dist_centering_obj,
                                                                                             dist_num_bins=dist_num_bins,
                                                                                             dist_center_on=dist_center_on,
                                                                                             dist_keep_center_as_bin=dist_keep_center_as_bin,
                                                                                             dist_zernike_degrees=dist_zernike_degrees,
                                                                                             scale=scale_tup,
                                                                                             include_contact_dist=include_contact_dist)

        org_tabs.append(org_metrics)
        contact_tabs.append(contact_metrics)
        dist_tabs.append(dist_metrics)
        region_tabs.append(region_metrics)
        end2 = time.time()
        print(f"Completed processing for {count} images in {(end2-start)/60} mins.")

    final_org = pd.concat(org_tabs, ignore_index=True)
    final_contact = pd.concat(contact_tabs, ignore_index=True)
    final_dist = pd.concat(dist_tabs, ignore_index=True)
    final_region = pd.concat(region_tabs, ignore_index=True)

    org_csv_path = out_path / f"{out_file_name}_organelles.csv"
    final_org.to_csv(org_csv_path)

    contact_csv_path = out_path / f"{out_file_name}_contacts.csv"
    final_contact.to_csv(contact_csv_path)

    dist_csv_path = out_path / f"{out_file_name}_distributions.csv"
    final_dist.to_csv(dist_csv_path)

    region_csv_path = out_path / f"{out_file_name}_regions.csv"
    final_region.to_csv(region_csv_path)

    end = time.time()
    print(f"Quantification for {count} files is COMPLETE! Files saved to '{out_path}'.")
    print(f"It took {(end - start)/60} minutes to quantify these files.")
    return count
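The cell above calls `warnings.simplefilter("ignore", ...)` at module level, so `UserWarning` and `RuntimeWarning` stay hidden for the rest of the session, including warnings unrelated to the convex hull issue. If that is undesirable, one alternative is to scope the suppression with `warnings.catch_warnings()`, which restores the previous filters on exit. A minimal sketch; `call_quietly` is a hypothetical helper, and wrapping `make_all_metrics_tables` with it is a suggestion rather than what this notebook does:

```python
import warnings
from typing import Callable, TypeVar

T = TypeVar("T")


def call_quietly(func: Callable[..., T], *args, **kwargs) -> T:
    """Call `func` with UserWarning and RuntimeWarning silenced only for the duration of the call."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        warnings.simplefilter("ignore", RuntimeWarning)
        return func(*args, **kwargs)
```

For example, `call_quietly(make_all_metrics_tables, source_file=img_f, ...)` would silence those warnings only inside that one call.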

## **Run `_batch_process_quantification` function**

In [None]:
# all the imaging data goes here.
data_root_path = Path(os.getcwd()).parents[1] / "sample_data" /  "batch_example"
# linearly unmixed raw image files (".tiff" in this example) are here
raw_data_path = data_root_path / "raw"
# segmentation ".tiff" files are here
seg_data_path = data_root_path / "out"
seg_suffix = "-20250204_ex-"
# save stats here
out_data_path = data_root_path / "quant"

# names of organelles and regions to quantify
organelle_names = ["LD","ER","golgi","lyso","mito","perox"]
region_names = ["nuc", "cell", "cyto"]

# channels for each organelle listed in the order they appear above
organelle_channels = [0,1,2,3,4,5]

n_files = _batch_process_quantification(out_file_name = "example_prototype",
                                  seg_path=seg_data_path,
                                  out_path=out_data_path, 
                                  raw_path=raw_data_path, 
                                  raw_file_type=".tiff",
                                  organelle_names=organelle_names,
                                  organelle_channels=organelle_channels,
                                  region_names=region_names,
                                  masks_file_name="masks",
                                  mask="cell",
                                  dist_centering_obj="nuc", 
                                  dist_num_bins=5,
                                  dist_center_on=False,
                                  dist_keep_center_as_bin=True,
                                  dist_zernike_degrees=9,
                                  include_contact_dist=True,
                                  scale=True,
                                  seg_suffix=seg_suffix)
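When the run finishes, the four tables are written inside `out_path` as `<out_file_name>_organelles.csv`, `_contacts.csv`, `_distributions.csv`, and `_regions.csv`, following the f-strings in `_batch_process_quantification`. A small helper such as the hypothetical `output_csv_paths` below rebuilds those paths so the outputs can be checked or reloaded for analysis. It also makes visible that a prefix already ending in `_` produces a double underscore.

```python
from pathlib import Path
from typing import Dict, Union

# table names in the order _batch_process_quantification writes them
TABLE_NAMES = ("organelles", "contacts", "distributions", "regions")


def output_csv_paths(out_path: Union[Path, str], out_file_name: str) -> Dict[str, Path]:
    """Return the four CSV paths written by the batch function: <out_path>/<prefix>_<table>.csv."""
    out_path = Path(out_path)
    return {table: out_path / f"{out_file_name}_{table}.csv" for table in TABLE_NAMES}
```

For example, `all(p.exists() for p in output_csv_paths(out_data_path, "example_prototype").values())` should be `True` after a successful run.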