# Full Image Analysis Pipeline

Our open-source Python library `mplex-image` enables automation and scaling while documenting the steps of multiplex image processing and analysis. This notebook can be used to reproduce our Figure 3 analysis of tissue microarrays, including metadata extraction from microscopy files, image registration, autofluorescence subtraction, cell segmentation and feature extraction, normalization, and clustering to identify tumor cell types.

Currently, we have implemented pipelines for [cycIF](https://gitlab.com/engje/cmif/-/blob/master/mplex_image/cmif.py), [Codex](https://gitlab.com/engje/cmif/-/blob/master/mplex_image/codex.py) and [Macsima (beta)](https://gitlab.com/engje/cmif/-/blob/master/mplex_image/mics.py) multiplex imaging platforms. This is an example of the cycIF pipeline.

In [4]:
# Import libraries
import os  # needed for os.chdir in the raw image QC step
from mplex_image import preprocess, mpimage, cmif, process

# 1 Set paths

Specify the full file paths to the directories containing code and images (.czi and .tiff). Then specify the list of slides to be processed, `ls_sample`.

In [6]:
# Set Paths
codedir = '/home/groups/graylab_share/OMERO.rdsStore/engje/Data/cycIF_ValidationStudies/cycIF_Validation'
czidir = f'{codedir}/Images/PipelineExample/czi'
tiffdir = f'{codedir}/Images/PipelineExample/PipelineExample/RawImages'
qcdir = f'{codedir}/Images/PipelineExample/QC'
regdir = f'{codedir}/Images/PipelineExample/RegisteredImages'
subdir = f'{codedir}/Images/PipelineExample/SubtractedRegisteredImages'
segdir = f'{codedir}/Images/PipelineExample/Segmentation'

preprocess.cmif_mkdir([tiffdir,qcdir,regdir,segdir,subdir])

# List slides to be processed
ls_sample = ['JE-TMA-41','JE-TMA-42', 'JE-TMA-43']

# 2   Export exposure time metadata from .czi
The Zeiss Axioscan full slide scanner outputs Carl Zeiss Images (.czi). These and other proprietary file formats may be opened with the [python-bioformats](https://pypi.org/project/python-bioformats/) package. Metadata, such as exposure time, is read from the image file rather than manually recorded (which is tedious and error-prone). Each subfolder within the directory `czidir` contains all .czi files from the entire cycIF experiment for a single slide.
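
As a rough illustration of reading exposure times from metadata instead of recording them by hand, the sketch below parses an OME-XML string (the metadata format that Bio-Formats produces) and collects the `ExposureTime` attribute of each `Plane` element. The helper name `exposure_times_from_omexml` and the toy XML are hypothetical, and this is not a description of how `cmif.exposure_times_scenes` works internally.

```python
import xml.etree.ElementTree as ET

def exposure_times_from_omexml(xml_text):
    """Return {channel_index: exposure_time} from an OME-XML string.

    Hypothetical helper for illustration; units are whatever the file reports.
    """
    root = ET.fromstring(xml_text)
    exposures = {}
    for elem in root.iter():
        # Strip any XML namespace so the check works across OME schema versions
        tag = elem.tag.rsplit('}', 1)[-1]
        if tag == 'Plane' and 'ExposureTime' in elem.attrib:
            channel = int(elem.attrib.get('TheC', 0))
            # Keep the first plane seen for each channel
            exposures.setdefault(channel, float(elem.attrib['ExposureTime']))
    return exposures

# Toy metadata standing in for a real file's OME-XML (values are made up)
toy_xml = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Image ID="Image:0">
    <Pixels ID="Pixels:0">
      <Plane TheC="0" TheZ="0" TheT="0" ExposureTime="0.005"/>
      <Plane TheC="1" TheZ="0" TheT="0" ExposureTime="0.2"/>
    </Pixels>
  </Image>
</OME>"""

print(exposure_times_from_omexml(toy_xml))
```

With a real file, the XML string would come from the image reader rather than a literal; the parsing step stays the same.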

**Note:** Due to the large size of the .czi images, they are hosted on synapse.org.

In [None]:
for s_sample in ls_sample:
    #parse file names
    df_img = cmif.parse_czi(f'{czidir}/{s_sample}',b_scenes=True)
    #scenes
    cmif.exposure_times_scenes(df_img, codedir, czidir=f'{czidir}/{s_sample}', s_end='.czi')

# 3 QC Raw Images
Export all raw tiffs as 16-bit original images using the microscope software. Put them in separate folders, one per slide, within the RawImages folder inside of `codedir`. This section then generates overviews of all rounds, for QC purposes.

In [None]:
preprocess.cmif_mkdir([f'{qcdir}/RawImages'])

for s_sample in ls_sample:
    os.chdir(f'{tiffdir}/{s_sample}')
    #investigate tissues
    df_img = mpimage.parse_org(s_end = "ORG.tif",type='raw')
    cmif.visualize_raw_images(df_img,qcdir,color='c1')


# 4 Register Images
This section creates a Matlab script for registration and starts a job on our server (OHSU exacloud).
During registration, the tissue can be cropped by entering `new_scene_id : '[upperleftXcoord upperleftYcoord width height]'` as a key : value pair in the `d_register` dictionary of dictionaries. Leaving a slide's dictionary empty starts registration without cropping. Each registered scene goes to a separate folder.
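
To make the crop coordinates concrete, here is a minimal sketch of how a `'[x y width height]'` string maps onto an image array. The helpers `parse_crop` and `crop_image` are hypothetical and only illustrate the coordinate order; the actual cropping happens in the generated Matlab script, whose indexing conventions (1-based, inclusive extents) may differ by a pixel.

```python
import numpy as np

def parse_crop(s_crop):
    """Parse '[x y width height]' into four integers (hypothetical helper)."""
    x, y, width, height = (int(v) for v in s_crop.strip('[]').split())
    return x, y, width, height

def crop_image(img, s_crop):
    """Crop a 2D array; rows index y and columns index x."""
    x, y, width, height = parse_crop(s_crop)
    return img[y:y + height, x:x + width]

# Small stand-in image: 100 rows (y) by 200 columns (x)
img = np.arange(100 * 200).reshape(100, 200)
cropped = crop_image(img, '[50 10 120 30]')
print(cropped.shape)  # (30, 120)
```

Note that the string lists x before y, while array indexing puts rows (y) first, which is an easy place to swap axes by mistake.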

In [None]:
d_register = {'JE-TMA-41':{}, #TMA registration
    'JE-TMA-42':{},
    'JE-TMA-43':{},
    #Example of tissue registration with cropping
    'NP029':{1:'[0 2000 15000 18000]',2:'[15000 14000 15000 6000]'}, #large registration
 }
ls_order = ['R1','R0','R2','R3','R4','R5','R5Q','R6','R7','R8','R9','R10','R11','R12','R12Q'] 

for key,item in d_register.items():
   #run registration
   cmif.run_registration_matlab(d_register, ls_order, f'{tiffdir}/{key}', f'{regdir}/')


# 5   Check Registration

This section generates overviews of all rounds of each registered image stack, for QC purposes.


In [None]:
cmif.visualize_reg_images(f'{regdir}',qcdir,color='c1')

# 6a  Create AF Subtracted Images

Images of background autofluorescence are scaled by exposure time and subtracted from the respective channel, producing a new image. `d_channel` specifies the name of the background marker to subtract from each channel. `ls_exclude` lists the markers on which no AF subtraction is performed (typically c5 images).

Images are output to an AFSubtracted folder within each registered scene's  **separate folder**. The scenes are subsequently combined into a single folder in **6b**.
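
The scaling step can be sketched as follows, assuming the background image is multiplied by the ratio of marker exposure time to background exposure time before subtraction. The helper `subtract_af` is hypothetical; clipping negative values to zero and returning 16-bit output are assumptions for this illustration, not a statement of what `cmif.autofluorescence_subtract` does.

```python
import numpy as np

def subtract_af(marker_img, marker_exp, background_img, background_exp):
    """Subtract exposure-scaled background from a marker image.

    Hypothetical helper; clipping at zero and uint16 output are assumptions.
    """
    scale = marker_exp / background_exp
    # Work in float so the subtraction cannot wrap around in unsigned integers
    diff = marker_img.astype(np.float64) - background_img.astype(np.float64) * scale
    return np.clip(diff, 0, np.iinfo(np.uint16).max).astype(np.uint16)

# Toy 2x2 images (values are made up)
marker = np.array([[1000, 200], [50, 4000]], dtype=np.uint16)      # e.g. a c2 marker
background = np.array([[100, 100], [100, 100]], dtype=np.uint16)   # e.g. R5Qc2
result = subtract_af(marker, 200, background, 100)  # marker exposed twice as long
print(result)
```

Converting to float before subtracting matters: subtracting unsigned 16-bit arrays directly would wrap negative results around to large positive values.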

In [None]:
#parameters
d_channel = {'c2':'R5Qc2','c3':'R5Qc3','c4':'R5Qc4','c5':'R5Qc5'}
ls_exclude=['DAPI','BMP2', 'CD20', 'CD3', 'CD44', 'CD45', 'CK19',
 'ColI', 'Ecad', 'FoxP3', 'GRNZB', 'LamAC', 'PgR', 'R0c5', 'R12Qc5',]

#subtract
df_img, df_exp,df_markers,df_copy = cmif.autofluorescence_subtract(regdir,codedir,d_channel,ls_exclude)


# 6b  Move AF Subtracted Images

Move all scenes from the same slide to **one folder**, for segmentation.


In [None]:
for s_sample in ls_sample:
    cmif.move_af_img(s_sample, regdir, subdir, dirtype='tma',b_move=True)

# 7 Prep for Segmentation

First, rename filenames that use non-standard marker names.

Then, all inputs for segmentation are automatically generated, including copying any additional images needed (e.g. DAPI channels), verifying segmentation thresholds, and preparing input files for segmentation (i.e. RoundCyclesTable.txt and Cluster.java specifically for Guillaume Thibault’s segmentation software).

**Note:** For the analysis presented in our paper, we used a Java-based software for cell segmentation and feature extraction. However, we have implemented the open-source segmentation algorithm [Cellpose](https://gitlab.com/engje/cmif/-/blob/master/mplex_image/segment.py) on our cycIF data as a python-based alternative.

In [None]:
#parameters
d_rename = {}
dapi_copy={'-R12_':112}
marker_copy ={} #'CK19':['CK5','CK7'],'CD44':['CD45']
d_segment = {'CK19':1002,'CK5':1002,'CD45':2002,'Ecad':1302,'CD44':3002,'CK7':1002,'CK14':1002}

cmif.rename_files(d_rename,dir=subdir,b_test=True)
cmif.copy_files(dir=subdir,dapi_copy=dapi_copy, marker_copy=marker_copy,b_test=False)
cmif.segmentation_thresholds(subdir,qcdir, d_segment)
cmif.segmentation_inputs(subdir,segdir,d_segment,tma_bool=False,b_start=False, i_counter=5)

# 8 Get Dataframe

After segmentation with Guillaume Thibault's software, the output is a separate feature file for each marker. Here we extract all markers' features from the separate .txt files into single .tsv files: one for the single-cell fluorescence intensity (MeanIntensity.tsv) and others for the cell locations (CentroidX.tsv, CentroidY.tsv).


In [None]:
for s_sample in ls_sample:
    cmif.extract_dataframe(s_sample, segdir,qcdir)

# 9 Filter Data

Post-processing includes filtering out lost cells based on last round DAPI staining and selection of dataframe columns based on desired biomarker sub-cellular location (we have standardized marker names and subcellular locations for our panel).
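
A minimal pandas sketch of this filtering idea is shown below. The helper `filter_lost_cells`, the toy values, and the marker column names other than `DAPI12_Nuclei` are hypothetical, and whether the threshold comparison is strict is an assumption; `cmif.prepare_dataframe` may differ in its details.

```python
import pandas as pd

def filter_lost_cells(df, s_dapi, dapi_thresh, ls_keep):
    """Keep cells with last-round DAPI above threshold, then select columns.

    Hypothetical helper; the strict '>' comparison is an assumption.
    """
    df_kept = df[df[s_dapi] > dapi_thresh]
    return df_kept.loc[:, ls_keep]

# Toy single-cell table (values and marker column names are made up)
df = pd.DataFrame(
    {
        'DAPI12_Nuclei': [2500, 400, 1800],
        'CK19_Ring': [900, 1200, 300],
        'CK19_Nuclei': [100, 150, 80],
        'CD45_Ring': [50, 60, 2000],
    },
    index=['cell1', 'cell2', 'cell3'],
)
df_filtered = filter_lost_cells(df, 'DAPI12_Nuclei', 1000, ['CK19_Ring', 'CD45_Ring'])
print(df_filtered)
```

Here `cell2` is dropped because its last-round DAPI signal falls below the threshold, suggesting the cell was lost from the slide during the staining rounds.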

In [None]:
#parameters
s_dapi = 'DAPI12_Nuclei'
dapi_thresh = 1000
d_channel = {}
ls_exclude=[]

for s_sample in ls_sample:
    cmif.prepare_dataframe(s_sample,s_dapi,dapi_thresh,d_channel,ls_exclude,segdir, codedir)

# 10 Normalization 
The [next notebook](https://github.com/engjen/cycIF_Validation/blob/master/Fig3_Normalize_JE-TMA-reps_cluster_analysis.ipynb) demonstrates how to normalize by background and apply k-means clustering for cell type definition.

