# &#x1F50D; Checking segmentation outputs from Organelle-Segmenter-Plugin

### &#x1F4D6; **How to:** 

Advance through each block of code sequentially by pressing `Shift`+`Enter`.

If a block of code contains &#x1F53D; follow the written instructions to fill in the blanks below that line before running it.
```python
#### USER INPUT REQUIRED ###
``` 

________________
## 	**INPUTS**

#### &#x1F3C3; **Run code; no user input required**

&#x1F453; **FYI:** This code block loads all of the necessary python packages and functions you will need for this notebook. Additionally, a [Napari](https://napari.org/stable/) window will open; this is where you will be able to visual the segmentations.

In [1]:
# top level imports
from pathlib import Path
import os, sys
from typing import Optional, Union, Dict, List
import itertools 
import glob

import warnings
import time

import numpy as np
import pandas as pd

import napari

### import local python functions in ../infer_subc
sys.path.append(os.path.abspath((os.path.join(os.getcwd(), '..'))))

from infer_subc.core.file_io import (read_czi_image,
                                        export_inferred_organelle,
                                        import_inferred_organelle,
                                        export_tiff,
                                        list_image_files,
                                        read_tiff_image)



from infer_subc.constants import *
from infer_subc.utils.stats import *
from infer_subc.utils.stats_helpers import *
from infer_subc.utils.stats import _assert_uint16_labels
from infer_subc.core.img import label_uint16


import time
%load_ext autoreload
%autoreload 2

viewer = napari.Viewer()

# TODO: include the file_type option in the main import_inferred_organelle() function
def _import_inferred_organelle(name: str, meta_dict: Dict, out_data_path: Path, file_type: str) -> Union[np.ndarray, None]:
    """
    read inferred organelle from ome.tif file

    Parameters
    ------------
    name: str
        name of organelle.  i.e. nuc, lyso, etc.
    meta_dict:
        dictionary of meta-data (ome) from original file
    out_data_path:
        Path object of directory where tiffs are read from
    file_type: 
        The type of file you want to import as a string (ex - ".tif", ".tiff", ".czi", etc.)

    Returns
    -------------
    exported file name

    """

    # copy the original file name to meta
    img_name = Path(meta_dict["file_name"])  #
    # add params to metadata
    if name is None:
        pass
    else:
        organelle_fname = f"{img_name.stem}-{name}{file_type}"

        organelle_path = out_data_path / organelle_fname

        if Path.exists(organelle_path):
            # organelle_obj, _meta_dict = read_ome_image(organelle_path)
            organelle_obj = read_tiff_image(organelle_path)  # .squeeze()
            print(f"loaded  inferred {len(organelle_obj.shape)}D `{name}`  from {out_data_path} ")
            return organelle_obj
        else:
            print(f"`{name}` object not found: {organelle_path}")
            raise FileNotFoundError(f"`{name}` object not found: {organelle_path}")

__________________________
## **QUALITY CHECK OF SEGMENTATIONS**

#### &#x1F6D1; &#x270D; **User Input Required:**

In [2]:
#### USER INPUT REQUIRED ###
# Copy and paste the paths to the folders where your data is saved inside the quotation marks below. 
# If you have more than one segmentation data folder, include it in the segmentation_data_2 line. If not, type None wihtout quotation marks
# NOTE: for windows, use "/" 
raw_data = "D:/Experiments (C2-117 - current)/C2-121/C2-121_deconvolution"
segmentation_data = "D:/Experiments (C2-117 - current)/C2-121/20230921_C2-121_3D-analysis/20230921_C2-121_segmentation"

location_tosave_edited_segmentations = "D:/Experiments (C2-117 - current)/C2-121/20230921_C2-121_3D-analysis/20240102_C2-121_segmentation-edits"
location_tosave_fullset_gooddata = "D:/Experiments (C2-117 - current)/C2-121/20230921_C2-121_3D-analysis/C2-121_good-segs"

# In quotation marks, include the extension of the file type for your SEGMENTATION and RAW images
raw_file_type = ".tiff"
seg_file_type = ".tiff"

# In quotation marks, write the suffix associated to each segmentation file. If you don't have that image 
mask_suffix = "masks_A"
lyso_suffix = "lyso"
mito_suffix = "mito"
golgi_suffix = "golgi"
perox_suffix = "perox"
ER_suffix = "ER"
LD_suffix = "LD"

In [3]:
#### Optional - USER INPUT REQUIRED ###
# If your segmentations are saved in more than one folder, fill in the information below about the second file location. If not, type None wihtout quotation marks in all of the lines below.
# Copy and paste the paths to the folders where your data is saved inside the quotation marks below. 
segmentation_data_2 = None

# In quotation marks, write the suffix associated to each segmentation file; if 
mask_suffix_2 = None
lyso_suffix_2 = None
mito_suffix_2 = None
golgi_suffix_2 = None
perox_suffix_2 = None
ER_suffix_2 = None
LD_suffix_2 = None

### &#x1F3C3; **Run code; no user input required**

In [4]:
raw_file_list = list_image_files(Path(raw_data),".tiff")

pd.set_option('display.max_colwidth', None)
pd.DataFrame({"Image Name":raw_file_list})

Unnamed: 0,Image Name
0,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 2_cell 1_50uM NaAsO_Linear unmixing_0_cmle.ome.tiff
1,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 2_cell 2_50uM NaAsO_Linear unmixing_0_cmle.ome.tiff
2,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 2_cell 3_50uM NaAsO_Linear unmixing_0_cmle.ome.tiff
3,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 2_cell 4_50uM NaAsO_Linear unmixing_0_cmle.ome.tiff
4,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 2_cell 5_50uM NaAsO_Linear unmixing_0_cmle.ome.tiff
5,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 3_cell 1_25nM TG_Linear unmixing_0_cmle.ome.tiff
6,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 3_cell 2_25nM TG_Linear unmixing_0_cmle.ome.tiff
7,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 3_cell 3_25nM TG_Linear unmixing_0_cmle.ome.tiff
8,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 3_cell 4_25nM TG_Linear unmixing_0_cmle.ome.tiff
9,D:\Experiments (C2-117 - current)\C2-121\C2-121_deconvolution\20230727_C2-121_conditioned_well 3_cell 5_25nM TG_Linear unmixing_0_cmle.ome.tiff


#### &#x1F6D1; &#x270D; **User Input required:**
&#x1F53C; Use the list  above to determine the index of the image you would like to look at.

In [10]:
#### USER INPUT REQUIRED ###
# Utilizing the list above as reference, change this index number (left column in table) to select a specific image
num = 0

### &#x1F3C3; **Run code; no user input required**

In [11]:
raw_img_data, raw_meta_dict = read_czi_image(raw_file_list[num])
print("Image name:")
print(raw_meta_dict['name'][0].split(" :: ")[0])

mask_seg = _import_inferred_organelle(mask_suffix, raw_meta_dict, Path(segmentation_data), seg_file_type)
lyso_seg = _import_inferred_organelle(lyso_suffix, raw_meta_dict, Path(segmentation_data), seg_file_type)
mito_seg = _import_inferred_organelle(mito_suffix, raw_meta_dict, Path(segmentation_data), seg_file_type)
golgi_seg = _import_inferred_organelle(golgi_suffix, raw_meta_dict, Path(segmentation_data), seg_file_type)
perox_seg = _import_inferred_organelle(perox_suffix, raw_meta_dict, Path(segmentation_data), seg_file_type)
ER_seg = _import_inferred_organelle(ER_suffix, raw_meta_dict, Path(segmentation_data), seg_file_type)
LD_seg = _import_inferred_organelle(LD_suffix, raw_meta_dict, Path(segmentation_data), seg_file_type)

if segmentation_data_2 is not None:
    mask_seg = _import_inferred_organelle(mask_suffix, raw_meta_dict, Path(segmentation_data_2), seg_file_type)
    lyso_seg = _import_inferred_organelle(lyso_suffix, raw_meta_dict, Path(segmentation_data_2), seg_file_type)
    mito_seg = _import_inferred_organelle(mito_suffix, raw_meta_dict, Path(segmentation_data_2), seg_file_type)
    golgi_seg = _import_inferred_organelle(golgi_suffix, raw_meta_dict, Path(segmentation_data_2), seg_file_type)
    perox_seg = _import_inferred_organelle(perox_suffix, raw_meta_dict, Path(segmentation_data_2), seg_file_type)
    ER_seg = _import_inferred_organelle(ER_suffix, raw_meta_dict, Path(segmentation_data_2), seg_file_type)
    LD_seg = _import_inferred_organelle(LD_suffix, raw_meta_dict, Path(segmentation_data_2), seg_file_type)

viewer.layers.clear()
viewer.add_image(raw_img_data[0], name='LD_raw', blending='additive')
viewer.add_image(LD_seg, opacity=0.3, colormap='magenta')
viewer.add_image(raw_img_data[1], name='ER_raw', blending='additive')
viewer.add_image(ER_seg, opacity=0.3, colormap='red')
viewer.add_image(raw_img_data[2], name='GL_raw', blending='additive')
viewer.add_image(golgi_seg, opacity=0.3, colormap='yellow')
viewer.add_image(raw_img_data[3], name='LS_raw', blending='additive')
viewer.add_image(lyso_seg, opacity=0.3, colormap='cyan')
viewer.add_image(raw_img_data[4], name='MT_raw', blending='additive')
viewer.add_image(mito_seg, opacity=0.3, colormap='green')
viewer.add_image(raw_img_data[5], name='PO_raw', blending='additive')
viewer.add_image(perox_seg, opacity=0.3, colormap='bop orange')
viewer.add_image(mask_seg, opacity=0.3)

Image name:
20230727_C2-121_conditioned_well 2_cell 1_50uM NaAsO_Linear unmixing_0_cmle.ome
loaded  inferred 4D `masks_A`  from D:\Experiments (C2-117 - current)\C2-121\20230921_C2-121_3D-analysis\20230921_C2-121_segmentation 
loaded  inferred 3D `lyso`  from D:\Experiments (C2-117 - current)\C2-121\20230921_C2-121_3D-analysis\20230921_C2-121_segmentation 
loaded  inferred 3D `mito`  from D:\Experiments (C2-117 - current)\C2-121\20230921_C2-121_3D-analysis\20230921_C2-121_segmentation 
loaded  inferred 3D `golgi`  from D:\Experiments (C2-117 - current)\C2-121\20230921_C2-121_3D-analysis\20230921_C2-121_segmentation 
loaded  inferred 3D `perox`  from D:\Experiments (C2-117 - current)\C2-121\20230921_C2-121_3D-analysis\20230921_C2-121_segmentation 
loaded  inferred 3D `ER`  from D:\Experiments (C2-117 - current)\C2-121\20230921_C2-121_3D-analysis\20230921_C2-121_segmentation 
loaded  inferred 3D `LD`  from D:\Experiments (C2-117 - current)\C2-121\20230921_C2-121_3D-analysis\20230921_C2-1

<Image layer 'mask_seg' at 0x2831562b580>

### &#x1F6D1; **STOP: Use the `Napari` window to review all of the segmentations for this image.** &#x1F50E;

#### At this point, take note of which segmentations need to be edited, if any. 

#### Once you are finished reviewing the images, continue on to the next sections to 1) Edit the segmentation (if necessary) or 2) Save the final set of segmentations for this image in a new folder. This will make preparing for quantitative analysis much simpler.

__________________________
## **EDITING SEGMENTATIONS**

#### &#x1F6D1; &#x270D; **User Input:**

In [21]:
#### USER INPUT REQUIRED ###
# Indicate which segmentations need editing by typing True. If the segmentations are good and do not need editing, indicate False.
edit_cell = False
edit_nuc = False
edit_LD = False 
edit_ER = False
edit_golgi = False
edit_lyso = False
edit_mito = False
edit_perox = False

### &#x1F3C3; **Run code; no user input required** 
### &#x1F440; **See code block output for instructions**

In [13]:
if edit_cell is True:
    viewer.layers.clear()
    viewer.add_image(raw_img_data)
    viewer.add_labels(mask_seg[1])
    print("Head to the Napari window!")
    print("You can edit your segmentation as needed there.")
    print("Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'")
elif edit_cell is False:
    print("Continue - run the next block of code")
else:
    print("There is an error somewhere!")

Head to the Napari window!
You can edit your segmentation as needed there.
Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'


In [14]:
if edit_nuc is True:
    viewer.layers.clear()
    viewer.add_image(raw_img_data)
    viewer.add_labels(mask_seg[2])
    print("Head to the Napari window!")
    print("You can edit your segmentation as needed there.")
    print("Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'")
elif edit_nuc is False:
    print("Continue - run the next block of code")
else:
    print("There is an error somewhere!")

Continue - run the next block of code


In [15]:
if edit_LD is True:
    viewer.layers.clear()
    viewer.add_image(raw_img_data)
    viewer.add_labels(LD_seg)
    print("Head to the Napari window!")
    print("You can edit your segmentation as needed there.")
    print("Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'")
elif edit_LD is False:
    print("Continue - run the next block of code")
else:
    print("There is an error somewhere!")

Continue - run the next block of code


In [16]:
if edit_ER is True:
    viewer.layers.clear()
    viewer.add_image(raw_img_data)
    viewer.add_labels(ER_seg)
    print("Head to the Napari window!")
    print("You can edit your segmentation as needed there.")
    print("Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'")
elif edit_ER is False:
    print("Continue - run the next block of code")
else:
    print("There is an error somewhere!")

Continue - run the next block of code


In [17]:
if edit_golgi is True:
    viewer.layers.clear()
    viewer.add_image(raw_img_data)
    viewer.add_labels(golgi_seg)
    print("Head to the Napari window!")
    print("You can edit your segmentation as needed there.")
    print("Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'")
elif edit_golgi is False:
    print("Continue - run the next block of code")
else:
    print("There is an error somewhere!")

Continue - run the next block of code


In [18]:
if edit_lyso is True:
    viewer.layers.clear()
    viewer.add_image(raw_img_data)
    viewer.add_labels(lyso_seg)
    print("Head to the Napari window!")
    print("You can edit your segmentation as needed there.")
    print("Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'")
elif edit_lyso is False:
    print("Continue - run the next block of code")
else:
    print("There is an error somewhere!")

Continue - run the next block of code


In [19]:
if edit_mito is True:
    viewer.layers.clear()
    viewer.add_image(raw_img_data)
    viewer.add_labels(mito_seg)
    print("Head to the Napari window!")
    print("You can edit your segmentation as needed there.")
    print("Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'")
elif edit_mito is False:
    print("Continue - run the next block of code")
else:
    print("There is an error somewhere!")

Continue - run the next block of code


In [20]:
if edit_perox is True:
    viewer.layers.clear()
    viewer.add_image(raw_img_data)
    viewer.add_labels(perox_seg)
    print("Head to the Napari window!")
    print("You can edit your segmentation as needed there.")
    print("Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'")
elif edit_perox is False:
    print("Continue - run the next block of code")
else:
    print("There is an error somewhere!")

Continue - run the next block of code


__________________
# **SAVE ALL CORRECT SEGMENTATIONS** - into one folder for quantification

In [23]:
if edit_cell is True:
    cell_seg = _import_inferred_organelle("cell", raw_meta_dict, location_tosave_edited_segmentations, seg_file_type)
    out_file_n = export_inferred_organelle(cell_seg, "cell", raw_meta_dict, Path(location_tosave_fullset_gooddata))
elif edit_cell is False:
    cell_seg = mask_seg[1]
    out_file_n = export_inferred_organelle(cell_seg, "cell", raw_meta_dict, Path(location_tosave_fullset_gooddata))
else:
    print("There is an error somewhere!")

saved file: 20230727_C2-121_conditioned_well 2_cell 1_50uM NaAsO_Linear unmixing_0_cmle.ome-cell-copy


In [None]:
if edit_nuc is True:
    nuc_seg = _import_inferred_organelle("nuc", raw_meta_dict, location_tosave_edited_segmentations, seg_file_type)
    out_file_n = export_inferred_organelle(cell_seg, "nuc", raw_meta_dict, Path(location_tosave_fullset_gooddata))
elif edit_cell is False:
    nuc_seg = mask_seg[2]
    out_file_n = export_inferred_organelle(cell_seg, "nuc", raw_meta_dict, Path(location_tosave_fullset_gooddata))
else:
    print("There is an error somewhere!")

Head to the Napari window!
You can edit your segmentation as needed there.
Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'


In [None]:
if edit_LD is True:
    LD_seg = _import_inferred_organelle("LD", raw_meta_dict, location_tosave_edited_segmentations, seg_file_type)
    out_file_n = export_inferred_organelle(LD_seg, "LD", raw_meta_dict, Path(location_tosave_fullset_gooddata))
elif edit_LD is False:
    out_file_n = export_inferred_organelle(LD_seg, "LD", raw_meta_dict, Path(location_tosave_fullset_gooddata))
else:
    print("There is an error somewhere!")

Head to the Napari window!
You can edit your segmentation as needed there.
Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'


In [None]:
if edit_ER is True:
    ER_seg = _import_inferred_organelle("ER", raw_meta_dict, location_tosave_edited_segmentations, seg_file_type)
    out_file_n = export_inferred_organelle(ER_seg, "ER", raw_meta_dict, Path(location_tosave_fullset_gooddata))
elif edit_ER is False:
    out_file_n = export_inferred_organelle(ER_seg, "ER", raw_meta_dict, Path(location_tosave_fullset_gooddata))
else:
    print("There is an error somewhere!")

Head to the Napari window!
You can edit your segmentation as needed there.
Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'


In [None]:
if edit_golgi is True:
    golgi_seg = _import_inferred_organelle("golgi", raw_meta_dict, location_tosave_edited_segmentations, seg_file_type)
    out_file_n = export_inferred_organelle(golgi_seg, "golgi", raw_meta_dict, Path(location_tosave_fullset_gooddata))
elif edit_golgi is False:
    out_file_n = export_inferred_organelle(golgi_seg, "golgi", raw_meta_dict, Path(location_tosave_fullset_gooddata))
else:
    print("There is an error somewhere!")

Head to the Napari window!
You can edit your segmentation as needed there.
Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'


In [None]:
if edit_lyso is True:
    lyso_seg = _import_inferred_organelle("lyso", raw_meta_dict, location_tosave_edited_segmentations, seg_file_type)
    out_file_n = export_inferred_organelle(lyso_seg, "lyso", raw_meta_dict, Path(location_tosave_fullset_gooddata))
elif edit_lyso is False:
    out_file_n = export_inferred_organelle(lyso_seg, "lyso", raw_meta_dict, Path(location_tosave_fullset_gooddata))
else:
    print("There is an error somewhere!")

Head to the Napari window!
You can edit your segmentation as needed there.
Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'


In [None]:
if edit_mito is True:
    mito_seg = _import_inferred_organelle("mito", raw_meta_dict, location_tosave_edited_segmentations, seg_file_type)
    out_file_n = export_inferred_organelle(mito_seg, "mito", raw_meta_dict, Path(location_tosave_fullset_gooddata))
elif edit_mito is False:
    out_file_n = export_inferred_organelle(mito_seg, "mito", raw_meta_dict, Path(location_tosave_fullset_gooddata))
else:
    print("There is an error somewhere!")

Head to the Napari window!
You can edit your segmentation as needed there.
Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'


In [None]:
if edit_perox is True:
    perox_seg = _import_inferred_organelle("perox", raw_meta_dict, location_tosave_edited_segmentations, seg_file_type)
    out_file_n = export_inferred_organelle(perox_seg, "perox", raw_meta_dict, Path(location_tosave_fullset_gooddata))
elif edit_perox is False:
    out_file_n = export_inferred_organelle(perox_seg, "perox", raw_meta_dict, Path(location_tosave_fullset_gooddata))
else:
    print("There is an error somewhere!")

Head to the Napari window!
You can edit your segmentation as needed there.
Be sure to save the new segmentation using File > Save in the Napari window. You should save it to the folder you listed as 'location_tosave_edited_segmentations'


__________________________
## ⚠️ **WORK IN PROGRESS:** Quantifying segmentation

In [None]:
def check_for_existing_combo(contact, contact_list, splitter):
    for ctc in contact_list:
        if sorted(contact) == sorted(ctc.split(splitter)):
            return(ctc.split(splitter))
    return contact

In [25]:
# for convex hull errors
warnings.simplefilter("ignore", UserWarning)
warnings.simplefilter("ignore", RuntimeWarning)

def _batch_process_quantification(out_file_name: str,
                                  seg_path: Union[Path,str],
                                  out_path: Union[Path, str], 
                                  raw_path: Union[Path,str], 
                                  raw_file_type: str,
                                  organelle_names: List[str],
                                  organelle_channels: List[int],
                                  region_names: List[str],
                                  masks_file_name: list[str],
                                  mask: str,
                                  dist_centering_obj:str, 
                                  dist_num_bins: int,
                                  dist_center_on: bool=False,
                                  dist_keep_center_as_bin: bool=True,
                                  dist_zernike_degrees: Union[int, None]=None,
                                  include_contact_dist: bool = True,
                                  scale:bool=True,
                                  seg_suffix:Union[str, None]=None,
                                  splitter: str = '_') -> int :
    """  
    batch process segmentation quantification (morphology, distribution, contacts); this function is currently optimized to process images from one file folder per image type (e.g., raw, segmentation)
    the output csv files are saved to the indicated out_path folder

    Parameters:
    ----------
    out_file_name: str
        the prefix to use when naming the output datatables
    seg_path: Union[Path,str]
        Path or str to the folder that contains the segmentation tiff files
    out_path: Union[Path, str]
        Path or str to the folder that the output datatables will be saved to
    raw_path: Union[Path,str]
        Path or str to the folder that contains the raw image files
    raw_file_type: str
        the file type of the raw data; ex - ".tiff", ".czi"
    organelle_names: List[str]
        a list of all organelle names that will be analyzed; the names should be the same as the suffix used to name each of the tiff segmentation files
        Note: the intensity measurements collect per region (from get_region_morphology_3D function) will only be from channels associated to these organelles 
    organelle_channels: List[int]
        a list of channel indices associated to respective organelle staining in the raw image; the indices should listed in same order in which the respective segmentation name is listed in organelle_names
    region_names: List[str]
        a list of regions, or masks, to measure; the order should correlate to the order of the channels in the "masks" output segmentation file
    masks_file_name: str
        the suffix of the "masks" segmentation file; ex- "masks_B", "masks", etc.
        this function currently does not accept indivial region segmentations 
    mask: str
        the name of the region to use as the mask when measuring the organelles; this should be one of the names listed in regions list; usually this will be the "cell" mask
    dist_centering_obj:str
        the name of the region or object to use as the centering object in the get_XY_distribution function
    dist_num_bins: int
        the number of bins for the get_XY_distribution function
    dist_center_on: bool=False,
        for get_XY_distribution:
        True = distribute the bins from the center of the centering object
        False = distribute the bins from the edge of the centering object
    dist_keep_center_as_bin: bool=True
        for get_XY_distribution:
        True = include the centering object area when creating the bins
        False = do not include the centering object area when creating the bins
    dist_zernike_degrees: Union[int, None]=None
        for get_XY_distribution:
        the number of zernike degrees to include for the zernike shape descriptors; if None, the zernike measurements will not 
        be included in the output
    include_contact_dist:bool=True
        whether to include the distribution of contact sites in get_contact_metrics_3d(); True = include contact distribution
    scale:bool=True
        a tuple that contains the real world dimensions for each dimension in the image (Z, Y, X)
    seg_suffix:Union[str, None]=None
        any additional text that is included in the segmentation tiff files between the file stem and the segmentation suffix
        TODO: this can't be None!!! need to update!!!


    Returns:
    ----------
    count: int
        the number of images processed
        
    """
    start = time.time()
    count = 0

    if isinstance(raw_path, str): raw_path = Path(raw_path)
    if isinstance(seg_path, str): seg_path = Path(seg_path)
    if isinstance(out_path, str): out_path = Path(out_path)
    
    if not Path.exists(out_path):
        Path.mkdir(out_path)
        print(f"making {out_path}")
    
    # reading list of files from the raw path
    img_file_list = list_image_files(raw_path, raw_file_type)

    # list of segmentation files to collect
    segs_to_collect = organelle_names + masks_file_name

    # containers to collect data tabels
    org_tabs = []
    contact_tabs = []
    dist_tabs = []
    region_tabs = []
    for img_f in img_file_list:
        count = count + 1
        filez = find_segmentation_tiff_files(img_f, segs_to_collect, seg_path, seg_suffix)

        # read in raw file and metadata
        img_data, meta_dict = read_czi_image(filez["raw"])

        # create intensities from raw file as list based on the channel order provided
        intensities = [img_data[ch] for ch in organelle_channels]

        # define the scale
        if scale is True:
            scale_tup = meta_dict['scale']
        else:
            scale_tup = None

        # load regions as a list based on order in list (should match order in "masks" file)
        # masks = read_tiff_image(filez[masks_file_name]) 
        # regions = [masks[r] for r, region in enumerate(region_names)]
        regions= [read_tiff_image(filez[masks_file_name[0]]), read_tiff_image(filez[masks_file_name[1]])]

        # store organelle images as list
        organelles = [read_tiff_image(filez[org]) for org in organelle_names]

        org_metrics, contact_metrics, dist_metrics, region_metrics = make_all_metrics_tables(source_file=img_f,
                                                                                             list_obj_names=organelle_names,
                                                                                             list_obj_segs=organelles,
                                                                                             list_intensity_img=intensities, 
                                                                                             list_region_names=region_names,
                                                                                             list_region_segs=regions, 
                                                                                             mask=mask,
                                                                                             dist_centering_obj=dist_centering_obj,
                                                                                             dist_num_bins=dist_num_bins,
                                                                                             dist_center_on=dist_center_on,
                                                                                             dist_keep_center_as_bin=dist_keep_center_as_bin,
                                                                                             dist_zernike_degrees=dist_zernike_degrees,
                                                                                             scale=scale_tup,
                                                                                             include_contact_dist=include_contact_dist,
                                                                                             splitter=splitter)

        org_tabs.append(org_metrics)
        contact_tabs.append(contact_metrics)
        dist_tabs.append(dist_metrics)
        region_tabs.append(region_metrics)
        end2 = time.time()
        print(f"Completed processing for {count} images in {(end2-start)/60} mins.")

    final_org = pd.concat(org_tabs, ignore_index=True)
    final_contact = pd.concat(contact_tabs, ignore_index=True)
    final_dist = pd.concat(dist_tabs, ignore_index=True)
    final_region = pd.concat(region_tabs, ignore_index=True)

    org_csv_path = out_path / f"{out_file_name}organelles.csv"
    final_org.to_csv(org_csv_path)

    contact_csv_path = out_path / f"{out_file_name}contacts.csv"
    final_contact.to_csv(contact_csv_path)

    dist_csv_path = out_path / f"{out_file_name}distributions.csv"
    final_dist.to_csv(dist_csv_path)

    region_csv_path = out_path / f"{out_file_name}regions.csv"
    final_region.to_csv(region_csv_path)

    end = time.time()
    print(f"Quantification for {count} files is COMPLETE! Files saved to '{out_path}'.")
    print(f"It took {(end - start)/60} minutes to quantify these files.")
    return count

In [26]:
seg=_batch_process_quantification(out_file_name= "20231117_prelim",
                                  seg_path="C:/Users/zscoman/Documents/Python Scripts/Infer-subc-2D/out",
                                  out_path="C:/Users/zscoman/Documents/Python Scripts/Infer-subc-2D/data/test", 
                                  raw_path="C:/Users/zscoman/Documents/Python Scripts/Infer-subc-2D/raw/shannon",
                                  raw_file_type = ".tiff",
                                  organelle_names = ['LD', 'ER', 'golgi', 'lyso', 'mito', 'perox'],
                                  organelle_channels= [0,1,2,3,4,5],
                                  region_names= ['nuc', 'cell'],
                                  masks_file_name= ['nuc', 'cell'],
                                  mask= 'cell',
                                  dist_centering_obj='nuc', 
                                  dist_num_bins=5,
                                  dist_center_on=False,
                                  dist_keep_center_as_bin=True,
                                  dist_zernike_degrees=None,
                                  include_contact_dist= True,
                                  scale=True,
                                  seg_suffix="-",
                                  splitter='X')

WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
WTF!!  how did we have missing labels?
It took 11.197751875718435 minutes to quantify one image.
Completed processing for 1 images in 11.630062663555146 mins.
WTF!!  how did we have missing labels?
WTF!!  how did we have

__________________________
## Collecting Summary Stats across multiple experiments

In [30]:
def _batch_summary_stats(csv_path_list: List[str],
                         out_path: str,
                         out_preffix: str,
                         splitter: str='X'):
    """" 
    csv_path_list: List[str],
        A list of path strings where .csv files to analyze are located.
    out_path: str,
        A path string where the summary data file will be output to
    out_preffix: str
        The prefix used to name the output file.    
    """
    ds_count = 0
    fl_count = 0
    ###################
    # Read in the csv files and combine them into one of each type
    ###################
    org_tabs = []
    contact_tabs = []
    dist_tabs = []
    region_tabs = []

    for loc in csv_path_list:
        print(loc)
        ds_count = ds_count + 1
        loc=Path(loc)
        files_store = sorted(loc.glob("*.csv"))
        for file in files_store:
            fl_count = fl_count + 1
            stem = file.stem

            org = "organelles"
            contacts = "contacts"
            dist = "distributions"
            regions = "regions"

            if org in stem:
                test_orgs = pd.read_csv(file, index_col=0)
                test_orgs.insert(0, "dataset", stem[:-11])
                org_tabs.append(test_orgs)
            if contacts in stem:
                test_contact = pd.read_csv(file, index_col=0)
                test_contact.insert(0, "dataset", stem[:-9])
                contact_tabs.append(test_contact)
            if dist in stem:
                test_dist = pd.read_csv(file, index_col=0)
                test_dist.insert(0, "dataset", stem[:-14])
                dist_tabs.append(test_dist)
            if regions in stem:
                test_regions = pd.read_csv(file, index_col=0)
                test_regions.insert(0, "dataset", stem[:-8])
                region_tabs.append(test_regions)
            
    org_df = pd.concat(org_tabs,axis=0, join='outer')
    contacts_df = pd.concat(contact_tabs,axis=0, join='outer')
    dist_df = pd.concat(dist_tabs,axis=0, join='outer')
    regions_df = pd.concat(region_tabs,axis=0, join='outer')
    ##########################
    # List organelles in cell
    ###########################
    all_orgs = list(set(org_df.loc[:, 'object'].tolist()))

    ###################
    # adding new metrics to the original sheets
    ###################
    # TODO: include these labels when creating the original sheets
    contact_cnt = contacts_df[["dataset", "image_name", "object", "label", "volume"]]
    ctc = contact_cnt["object"].values.tolist()
    ##############################################################################
    #  Creating New methods of storing A & B
    ###############################################################################
    # len(max(contact_cnt["object"].str.split('X'), key=len))) provides max number of organelles involved in contact
    contact_cnt[[f"org{cha}" for cha in string.ascii_uppercase[:(len(max(contact_cnt["object"].str.split(splitter), key=len)))]]] = contact_cnt["object"].str.split(splitter, expand=True)
    contact_cnt[[f"{cha}_ID" for cha in string.ascii_uppercase[:(len(max(contact_cnt["label"].str.split('_'), key=len)))]]] = contact_cnt["label"].str.split('_', expand=True)
    #iterating from a to val
    unstacked_cont = []
    for cha in string.ascii_uppercase[:len(max(contact_cnt["object"].str.split(splitter), key=len))]:
        valid = (contact_cnt[f"org{cha}"] != None) & (contact_cnt[f"{cha}_ID"] != None)
        contact_cnt[f"{cha}"] = None
        contact_cnt.loc[valid, f"{cha}"] = contact_cnt[f"org{cha}"] + "_" + contact_cnt[f"{cha}_ID"]
        contact_cnt_percell = contact_cnt[["dataset", "image_name", f"org{cha}", f"{cha}_ID", "object", "volume"]].groupby(["dataset", "image_name", f"org{cha}", f"{cha}_ID", "object"]).agg(["count", "sum"])
        contact_cnt_percell.columns = ["_".join(col_name).rstrip('_') for col_name in contact_cnt_percell.columns.to_flat_index()]
        unstacked = contact_cnt_percell.unstack(level='object')
        unstacked.columns = ["_".join(col_name).rstrip('_') for col_name in unstacked.columns.to_flat_index()]
        unstacked = unstacked.reset_index()
        for col in unstacked.columns:
            if col.startswith("volume_count_"):
                newname = col.split("_")[-1] + "_count"
                unstacked.rename(columns={col:newname}, inplace=True)
            if col.startswith("volume_sum_"):
                newname = col.split("_")[-1] + "_volume"
                unstacked.rename(columns={col:newname}, inplace=True)
        unstacked.rename(columns={f"org{cha}":"object", f"{cha}_ID":"label"}, inplace=True)
        unstacked.set_index(['dataset', 'image_name', 'object', 'label'])    
        unstacked_cont.append(unstacked)
    contact_cnt = pd.concat(unstacked_cont, axis=0).sort_index(axis=0)
    contact_cnt = contact_cnt.groupby(['dataset', 'image_name', 'object', 'label']).sum().reset_index()                 #adds together all duplicates at the index, then resets the index
    contact_cnt['label']=contact_cnt['label'].astype("Int64")  
    org_df = pd.merge(org_df, contact_cnt, how='left', on=['dataset', 'image_name', 'object', 'label'], sort=True)
    org_df[contact_cnt.columns] = org_df[contact_cnt.columns].fillna(0)

    ###################
    # summary stat group
    ###################
    group_by = ['dataset', 'image_name', 'object']
    sharedcolumns = ["SA_to_volume_ratio", "equivalent_diameter", "extent", "euler_number", "solidity", "axis_major_length"]
    ag_func_standard = ['mean', 'median', 'std']

    ###################
    # summarize shared measurements between org_df and contacts_df
    ###################
    org_cont_tabs = []
    for tab in [org_df, contacts_df]:
        tab1 = tab[group_by + ['volume']].groupby(group_by).agg(['count', 'sum'] + ag_func_standard)
        tab2 = tab[group_by + ['surface_area']].groupby(group_by).agg(['sum'] + ag_func_standard)
        tab3 = tab[group_by + sharedcolumns].groupby(group_by).agg(ag_func_standard)
        shared_metrics = pd.merge(tab1, tab2, 'outer', on=group_by)
        shared_metrics = pd.merge(shared_metrics, tab3, 'outer', on=group_by)
        org_cont_tabs.append(shared_metrics)

    org_summary = org_cont_tabs[0]
    contact_summary = org_cont_tabs[1]

    ###################
    # group metrics from regions_df similar to the above
    ###################
    regions_summary = regions_df[group_by + ['volume', 'surface_area'] + sharedcolumns].set_index(group_by)

    ###################
    # summarize extra metrics from org_df
    ###################
    columns2 = [col for col in org_df.columns if col.endswith(("_count", "_volume"))]
    contact_counts_summary = org_df[group_by + columns2].groupby(group_by).agg(['sum'] + ag_func_standard)
    org_summary = pd.merge(org_summary, contact_counts_summary, 'outer', on=group_by)#left_on=group_by, right_on=True)

    ###################
    # summarize distribution measurements
    ###################
    # organelle distributions
    hist_dfs = []
    for ind in dist_df.index:
        selection = dist_df.loc[[ind]]
        bins_df = pd.DataFrame()
        wedges_df = pd.DataFrame()
        Z_df = pd.DataFrame()

        bins_df[['bins', 'masks', 'obj']] = selection[['XY_bins', 'XY_mask_vox_cnt_perbin', 'XY_obj_vox_cnt_perbin']]
        wedges_df[['bins', 'masks', 'obj']] = selection[['XY_wedges', 'XY_mask_vox_cnt_perwedge', 'XY_obj_vox_cnt_perwedge']]
        Z_df[['bins', 'masks', 'obj']] = selection[['Z_slices', 'Z_mask_vox_cnt', 'Z_obj_vox_cnt']]

        dfs = [selection[['dataset', 'image_name', 'object']].reset_index()]
        for df, prefix in zip([bins_df, wedges_df, Z_df], ["XY_bins_", "XY_wedges_", "Z_slices_"]):
            single_df = pd.DataFrame(list(zip(df["bins"].values[0][1:-1].split(", "), 
                                            df["obj"].values[0][1:-1].split(", "), 
                                            df["masks"].values[0][1:-1].split(", "))), columns =['bins', 'obj', 'mask']).astype(int)

            if "Z_" in prefix:
                single_df =  single_df.drop(single_df[single_df['mask'] == 0].index)
                single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)
        
            single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
            single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
            single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100

            # single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
            # single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
            # single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100

            # if "Z_" in prefix:
            #     single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)


            sumstats_df = pd.DataFrame()

            s = single_df['bins'].repeat(single_df['obj_norm'])
            sumstats_df['hist_mean']=[s.mean()]
            sumstats_df['hist_median']=[s.median()]
            if single_df['obj_norm'].sum() != 0: sumstats_df['hist_mode']=[s.mode()[0]]
            else: sumstats_df['hist_mode']=['NaN']
            sumstats_df['hist_min']=[s.min()]
            sumstats_df['hist_max']=[s.max()]
            sumstats_df['hist_range']=[s.max() - s.min()]
            sumstats_df['hist_stdev']=[s.std()]
            sumstats_df['hist_skew']=[s.skew()]
            sumstats_df['hist_kurtosis']=[s.kurtosis()]
            sumstats_df['hist_var']=[s.var()]
            sumstats_df.columns = [prefix+col for col in sumstats_df.columns]
            dfs.append(sumstats_df.reset_index())
        combined_df = pd.concat(dfs, axis=1).drop(columns="index")
        hist_dfs.append(combined_df)
    dist_org_summary = pd.concat(hist_dfs, ignore_index=True)

    # nucleus distribution
    nuc_dist_df = dist_df[["dataset", "image_name", 
                        "XY_bins", "XY_center_vox_cnt_perbin", "XY_mask_vox_cnt_perbin",
                        "XY_wedges", "XY_center_vox_cnt_perwedge", "XY_mask_vox_cnt_perwedge",
                        "Z_slices", "Z_center_vox_cnt", "Z_mask_vox_cnt"]].set_index(["dataset", "image_name"])
    nuc_hist_dfs = []
    for idx in nuc_dist_df.index.unique():
        selection = nuc_dist_df.loc[idx].iloc[[0]].reset_index()
        bins_df = pd.DataFrame()
        wedges_df = pd.DataFrame()
        Z_df = pd.DataFrame()

        bins_df[['bins', 'center', 'masks']] = selection[['XY_bins', 'XY_center_vox_cnt_perbin', 'XY_mask_vox_cnt_perbin']]
        wedges_df[['bins', 'center', 'masks']] = selection[['XY_wedges', 'XY_center_vox_cnt_perwedge', 'XY_mask_vox_cnt_perwedge']]
        Z_df[['bins', 'center', 'masks']] = selection[['Z_slices', 'Z_center_vox_cnt', 'Z_mask_vox_cnt']]

        dfs = [selection[['dataset', 'image_name']]]
        for df, prefix in zip([bins_df, wedges_df, Z_df], ["XY_bins_", "XY_wedges_", "Z_slices_"]):
            single_df = pd.DataFrame(list(zip(df["bins"].values[0][1:-1].split(", "), 
                                            df["masks"].values[0][1:-1].split(", "),
                                            df["center"].values[0][1:-1].split(", "))), columns =['bins', 'mask', 'obj']).astype(int)
            # single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
            # single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
            # single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100
            # if "Z_" in prefix:
            #     single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)

            if "Z_" in prefix:
                single_df =  single_df.drop(single_df[single_df['mask'] == 0].index)
                single_df['bins'] = (single_df["bins"]/max(single_df.bins)*10).apply(np.floor)
        
            single_df['mask_fract'] = single_df['mask']/single_df['mask'].max()
            single_df['obj_norm'] = (single_df["obj"]/single_df["mask_fract"]).fillna(0)
            single_df['portion_per_bin'] = (single_df["obj"] / single_df["obj"].sum())*100

            sumstats_df = pd.DataFrame()

            s = single_df['bins'].repeat(single_df['obj_norm'])
            sumstats_df['hist_mean']=[s.mean()]
            sumstats_df['hist_median']=[s.median()]
            if single_df['obj_norm'].sum() != 0: sumstats_df['hist_mode']=[s.mode()[0]]
            else: sumstats_df['hist_mode']=['NaN']
            sumstats_df['hist_min']=[s.min()]
            sumstats_df['hist_max']=[s.max()]
            sumstats_df['hist_range']=[s.max() - s.min()]
            sumstats_df['hist_stdev']=[s.std()]
            sumstats_df['hist_skew']=[s.skew()]
            sumstats_df['hist_kurtosis']=[s.kurtosis()]
            sumstats_df['hist_var']=[s.var()]
            sumstats_df.columns = [prefix+col for col in sumstats_df.columns]
            dfs.append(sumstats_df.reset_index())
        combined_df = pd.concat(dfs, axis=1).drop(columns="index")
        nuc_hist_dfs.append(combined_df)
    dist_center_summary = pd.concat(nuc_hist_dfs, ignore_index=True)
    dist_center_summary.insert(2, column="object", value="nuc")

    dist_summary = pd.concat([dist_org_summary, dist_center_summary], axis=0).set_index(group_by).sort_index()

    ###################
    # add normalization
    ###################
    # organelle area fraction
    area_fractions = []
    for idx in org_summary.index.unique():
        org_vol = org_summary.loc[idx][('volume', 'sum')]
        cell_vol = regions_summary.loc[idx[:-1] + ('cell',)]["volume"]
        afrac = org_vol/cell_vol
        area_fractions.append(afrac)
    org_summary[('volume', 'fraction')] = area_fractions
    # TODO: add in line to reorder the level=0 columns here

    # contact sites volume normalized
    # norm_toA_list = []
    # norm_toB_list = []
    norm_to_list = {}
    for col in contact_summary.index:
        for idx, cha in enumerate(string.ascii_uppercase[:len(max(contact_summary.index.get_level_values('object').str.split(splitter), key=len))]):
            if cha not in norm_to_list:
                norm_to_list[f"{cha}"] = []
            if ((idx+1) <= len(col[-1].split(splitter))):
                norm_to_list[f"{cha}"].append(contact_summary.loc[col][('volume', 'sum')]/org_summary.loc[col[:-1]+(col[-1].split(splitter)[idx],)][('volume', 'sum')])
            else:
                norm_to_list[f"{cha}"].append(None)
    for cha in string.ascii_uppercase[:len(max(contact_summary.index.get_level_values('object').str.split(splitter), key=len))]:
        contact_summary[('volume', f'norm_to_{cha}')] = norm_to_list[f"{cha}"]
        # norm_toA_list.append(contact_summary.loc[col][('volume', 'sum')]/org_summary.loc[col[:-1]+(col[-1].split(splitter)[0],)][('volume', 'sum')])
        # norm_toB_list.append(contact_summary.loc[col][('volume', 'sum')]/org_summary.loc[col[:-1]+(col[-1].split(splitter)[1],)][('volume', 'sum')])
    # contact_summary[('volume', 'norm_to_A')] = norm_toA_list
    # contact_summary[('volume', 'norm_to_B')] = norm_toB_list

    # number and area of individuals organelle involved in contact
    cont_cnt = org_df[group_by]
    cont_cnt[[col.split('_')[0] for col in org_df.columns if col.endswith(("_count"))]] = org_df[[col for col in org_df.columns if col.endswith(("_count"))]].astype(bool)
    cont_cnt_perorg = cont_cnt.groupby(group_by).agg('sum')
    cont_cnt_perorg.columns = pd.MultiIndex.from_product([cont_cnt_perorg.columns, ['count_in']])
    for col in cont_cnt_perorg.columns:
        cont_cnt_perorg[(col[0], 'num_fraction_in')] = cont_cnt_perorg[col].values/org_summary[('volume', 'count')].values
    cont_cnt_perorg.sort_index(axis=1, inplace=True)
    org_summary = pd.merge(org_summary, cont_cnt_perorg, on=group_by, how='outer')


    ###################
    # flatten datasheets and combine
    # TODO: restructure this so that all of the datasheets and unstacked and then reorded based on shared level 0 columns before flattening
    ###################
    # org flattening
    org_final = org_summary.unstack(-1)
    for col in org_final.columns:
        if col[1] in ('count_in', 'num_fraction_in') or col[0].endswith(('_count', '_volume')):
            if col[2] not in col[0]:
                org_final.drop(col,axis=1, inplace=True)
    ########################################################################
    # MAKING new_col_order flexible to work with any organelle input values and combo number
    #######################################################################
    new_col_order = ['dataset', 'image_name', 'object', 'volume', 'surface_area', 'SA_to_volume_ratio', 
                     'equivalent_diameter', 'extent', 'euler_number', 'solidity', 'axis_major_length'] 
    all_combos = []
    for n in list(map(lambda x:x+2, (range(len(all_orgs)-1)))):
            for o in itertools.combinations(all_orgs, n):
                all_combos.append(check_for_existing_combo(o, ctc, splitter))
    combos = [splitter.join(cont) for cont in all_combos]
    for combo in combos:
        new_col_order += [f"{combo}", f"{combo}_count", f"{combo}_volume"]
    new_cols = org_final.columns.reindex(new_col_order, level=0)
    org_final = org_final.reindex(columns=new_cols[0])
    org_final.columns = ["_".join((col_name[-1], col_name[1], col_name[0])) for col_name in org_final.columns.to_flat_index()]

    #renaming, filling "NaN" with 0 when needed, and removing ER_std columns
    for col in org_final.columns:
        if '_count_in_' or '_fraction_in_' in col:
            org_final[col] = org_final[col].fillna(0)
        if col.endswith(("_count_volume","_sum_volume", "_mean_volume", "_median_volume")):
            org_final[col] = org_final[col].fillna(0)
        if col.endswith("_count_volume"):
            org_final.rename(columns={col:col.split("_")[0]+"_count"}, inplace=True)
        if col.startswith("ER_std_"):
            org_final.drop(columns=[col], inplace=True)
    org_final = org_final.reset_index()

    # contacts flattened
    contact_final = contact_summary.unstack(-1)
    contact_final.columns = ["_".join((col_name[-1], col_name[1], col_name[0])) for col_name in contact_final.columns.to_flat_index()]

    #renaming and filling "NaN" with 0 when needed
    for col in contact_final.columns:
        if col.endswith(("_count_volume","_sum_volume", "_mean_volume", "_median_volume")):
            contact_final[col] = contact_final[col].fillna(0)
        if col.endswith("_count_volume"):
            contact_final.rename(columns={col:col.split("_")[0]+"_count"}, inplace=True)
    contact_final = contact_final.reset_index()

    # distributions flattened
    dist_final = dist_summary.unstack(-1)
    dist_final.columns = ["_".join((col_name[1], col_name[0])) for col_name in dist_final.columns.to_flat_index()]
    dist_final = dist_final.reset_index()

    # regions flattened & normalization added
    regions_final = regions_summary.unstack(-1)
    regions_final.columns = ["_".join((col_name[1], col_name[0])) for col_name in regions_final.columns.to_flat_index()]
    regions_final['nuc_area_fraction'] = regions_final['nuc_volume'] / regions_final['cell_volume']
    regions_final = regions_final.reset_index()

    # combining them all
    combined = pd.merge(org_final, contact_final, on=["dataset", "image_name"], how="outer")
    combined = pd.merge(combined, dist_final, on=["dataset", "image_name"], how="outer")
    combined = pd.merge(combined, regions_final, on=["dataset", "image_name"], how="outer").set_index(["dataset", "image_name"])
    combined.columns = [col.replace('sum', 'total') for col in combined.columns]

    ###################
    # export summary sheets
    ###################
    org_summary.to_csv(out_path + f"/{out_preffix}per_org_summarystats.csv")
    contact_summary.to_csv(out_path + f"/{out_preffix}per_contact_summarystats.csv")
    dist_summary.to_csv(out_path + f"/{out_preffix}distribution_summarystats.csv")
    regions_summary.to_csv(out_path + f"/{out_preffix}per_region_summarystats.csv")
    combined.to_csv(out_path + f"/{out_preffix}summarystats_combined.csv")

    print(f"Processing of {fl_count} files from {ds_count} dataset(s) is complete.")
    return f"{fl_count} files from {ds_count} dataset(s) were processed"

In [31]:
out=_batch_summary_stats(csv_path_list=["C:/Users/zscoman/Documents/Python Scripts/Infer-subc-2D/data/test"],
                         out_path="C:/Users/zscoman/Documents/Python Scripts/Infer-subc-2D/data/test/sumstat",
                         out_preffix="20231117_prelim_",
                         splitter="X")

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  contact_cnt[["orgA", "orgB"]] = contact_cnt["object"].str.split('X', expand=True)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  contact_cnt[["orgA", "orgB"]] = contact_cnt["object"].str.split('X', expand=True)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  contact_cnt[["A_ID", "B_ID"]] = contact_c

Processing of 4 files from 1 dataset(s) is complete.
