# cmIF Pipeline

Author: engje

Date: 2020-12-13

License: GPLv3

Language: Python3

Description: cmIF is a Python3 library for automated image processing and analysis of multiplex immunofluorescence images


# Table of contents <a name="contents"></a>
1. [Import Libraries](#lib)
2. [Metadata](#meta)
3. [QC Images](#qc1)
4. [Register Images](#reg)
5. [QC Registration](#qcreg)
6. [Autofluorescence Subtraction](#afsub)
7. [Segmentation](#seg)
8. [Feature Extraction](#feat)
9. [Filter Data](#filter)

# Example Code

## 1   Import Libraries and Set Paths <a name="lib"></a>

[contents](#contents)

Within the root directory, cmIF auto-generates a standardized folder structure, with folders for Raw Images (unregistered single-channel tiffs),
 QC outputs, Registered Images, Autofluorescence Subtracted Images, and Segmentation outputs. Only the
 `codedir` and `rootdir` variables should be modified, so that the standardized folder structure is retained.
 
To run this example code, download Raw Images and czis from https://www.synapse.org/#!Synapse:syn22315952

The expected folder structure, matching the paths set in the code below, is:

 - `codedir`/Data/PipelineExample/RawImages/
 - `codedir`/Data/PipelineExample/czis/

List the names of the slides to be processed as `ls_sample`. Each slide may have 1 or more scenes acquired during scanning.
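
As a rough illustration of the folder layout described above, the sketch below builds the same sub-folders with `pathlib` and creates any that are missing. It is only a stand-in for the idea, not the implementation of `preprocess.cmif_mkdir`; the helper name `make_pipeline_dirs` is hypothetical, and the demo runs in a temporary directory so no real data is touched.

```python
import tempfile
from pathlib import Path

def make_pipeline_dirs(rootdir):
    """Create the standard sub-folders under rootdir and return them by name."""
    rootdir = Path(rootdir)
    d_dirs = {
        'tiffdir': rootdir / 'RawImages',
        'czidir': rootdir / 'czis',
        'qcdir': rootdir / 'QC',
        'regdir': rootdir / 'RegisteredImages',
        'subdir': rootdir / 'SubtractedRegisteredImages',
        'segdir': rootdir / 'Segmentation',
    }
    for p in d_dirs.values():
        # parents=True also creates Data/PipelineExample if it is missing
        p.mkdir(parents=True, exist_ok=True)
    return d_dirs

# demonstrate in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    d_dirs = make_pipeline_dirs(Path(tmp) / 'Data' / 'PipelineExample')
    ls_created = sorted(p.name for p in d_dirs.values() if p.is_dir())
```

Keeping every path derived from a single root means that only the root needs to change when the data moves.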


In [None]:
# Import libraries
import os
import sys
import numpy as np
import pandas as pd
import shutil
import matplotlib.pyplot as plt
import re
from skimage import io

# Set Paths
#codedir = '/home/groups/graylab_share/OMERO.rdsStore/engje/Data/cmIF'
codedir = os.getcwd()
rootdir = f'{codedir}/Data/PipelineExample'
tiffdir = f'{rootdir}/RawImages'
qcdir = f'{rootdir}/QC'
regdir = f'{rootdir}/RegisteredImages'
subdir = f'{rootdir}/SubtractedRegisteredImages'
segdir = f'{rootdir}/Segmentation'
czidir = f'{rootdir}/czis'

In [None]:
# Start Preprocessing
os.chdir(codedir)
from mplex_image import preprocess, mpimage, cmif
preprocess.cmif_mkdir([tiffdir,qcdir,regdir,segdir,subdir])

# List slides to be processed
ls_sample = ['BC44290-146'
 ]


## 2   QC and export metadata from .czi <a name="meta"></a>

[contents](#contents)

"Level-1" QC, i.e. QC of the original microscope image files, is performed (checking the number of files and their naming), and file names are changed if there are any typos. 
Important metadata, including exposure time (used to normalize images for autofluorescence subtraction) and
coordinates of czi scenes, is exported to csv. 

FILE STRUCTURE: All images from each sample should be in subfolders named by sample ID within `czidir`.
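
The renaming step can be pictured as a substring replacement applied to each file name, previewed before anything is changed on disk. The sketch below is a minimal dry run under that assumption; `plan_renames` is a hypothetical helper, not part of `mplex_image`, and the file names are made up for illustration.

```python
def plan_renames(ls_files, d_rename):
    """Return {old_name: new_name} for files that a rename dict would change.

    Nothing is renamed on disk; this only previews the result.
    """
    d_plan = {}
    for s_file in ls_files:
        s_new = s_file
        for s_old, s_fix in d_rename.items():
            s_new = s_new.replace(s_old, s_fix)
        if s_new != s_file:
            d_plan[s_file] = s_new
    return d_plan

# hypothetical file names: markers within a round are joined by dots,
# so an underscore inside a marker field needs fixing
ls_files = ['R5_HER2_ER_slideA.czi', 'R1_CD20_slideA.czi']
d_plan = plan_renames(ls_files, {'HER2_ER': 'HER2.ER'})
```

Here `d_plan` contains only the first file, mapped to `R5_HER2.ER_slideA.czi`; the second name has nothing to fix and is left out.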


In [None]:
from mplex_image import metadata
import javabridge
import bioformats
javabridge.start_vm(class_path=bioformats.JARS)
for s_sample in ls_sample:
    os.chdir(f'{czidir}/{s_sample}')
    #1 rename underscores to dots to match convention (done)
    d_rename = mpimage.underscore_to_dot(s_sample,s_end='.czi')
    d_rename.update({'HER2_ER':'HER2.ER'})
    preprocess.dchange_fname(d_rename) #,b_test=False)

    #2 Check files/naming
    df_img = cmif.parse_czi(f'{czidir}/{s_sample}',b_scenes=True)
    cmif.count_images(df_img)
    preprocess.check_names(df_img,s_type='czi')
    #Example: change file name and change back
    d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'} 
    preprocess.dchange_fname(d_rename)#,b_test=False)
    
    #3 Export useful imaging metadata (done)
    df_img = metadata.scene_position(f'{czidir}/{s_sample}',type='r')
    #df_img.to_csv(f'{codedir}/{s_sample}_ScenePositions.csv')
    metadata.exposure_times_slide(df_img,codedir,czidir=f'{czidir}/{s_sample}')
javabridge.kill_vm()

## 3   QC Tiff Images <a name="qc1"></a>

[contents](#contents)

The unregistered raw tiff format is 16-bit (`uint16`) single-channel grayscale tiff, for example as exported from .czi using Zeiss' Zen software. 
This section checks file counts and naming, and generates overviews of all rounds for visual inspection.

FILE STRUCTURE: Each sample's tiff image stack should be in a separate subfolder named by sample ID within `rootdir`/RawImages. 
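
A quick way to confirm that an image matches the raw format described above is to check its dtype and number of dimensions once it is loaded as a NumPy array (for example with `skimage.io.imread`). The sketch below uses synthetic arrays instead of files; `check_raw_format` is a hypothetical helper, not part of `mplex_image`.

```python
import numpy as np

def check_raw_format(a_img):
    """Return a list of problems; an empty list means the array looks like a raw tiff."""
    ls_problem = []
    if a_img.dtype != np.uint16:
        ls_problem.append(f'dtype is {a_img.dtype}, expected uint16')
    if a_img.ndim != 2:
        ls_problem.append(f'{a_img.ndim} dimensions, expected 2 (single channel)')
    return ls_problem

a_good = np.zeros((4, 4), dtype=np.uint16)    # 16-bit, single channel
a_bad = np.zeros((4, 4, 3), dtype=np.uint8)   # 8-bit RGB

ls_good = check_raw_format(a_good)
ls_bad = check_raw_format(a_bad)
```

`ls_good` is empty, while `ls_bad` reports both the wrong dtype and the extra channel dimension.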


In [None]:
# 2 Raw tiffs: check/change names
for s_sample in ls_sample: 
    os.chdir(f'{tiffdir}/{s_sample}')
    #Example: change file name and change back
    d_rename = {'CK5R':'CK5Rename','CK5Rename':'CK5R'} 
    preprocess.dchange_fname(d_rename) #,b_test=False)
    #sort and count images 
    df_img = mpimage.parse_org(s_end = "ORG.tif",type='raw') 
    cmif.count_images(df_img[df_img.slide==s_sample])
    preprocess.check_names(df_img,s_type='tiff')

In [None]:
# 3 QC Raw tiffs: visual inspection #
preprocess.cmif_mkdir([f'{qcdir}/RawImages'])

for s_sample in ls_sample:
    os.chdir(f'{tiffdir}/{s_sample}')
    #investigate tissues
    df_img = mpimage.parse_org(s_end="ORG.tif",type='raw')
    cmif.visualize_raw_images(df_img,qcdir,color='c1')

## 4   Register Images <a name="reg"></a>

[contents](#contents)

This section registers all images to round 1 (R1), based on DAPI staining in each round. 

FILE STRUCTURE: Registered tiff images are generated and saved in `regdir` in subfolders named by sample ID and scene.
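
To give a feel for what registering on DAPI means, the sketch below estimates a whole-pixel translation between two images by phase correlation, using only NumPy. This is a swapped-in, simplified technique chosen for illustration; the method actually used by `cmif.registration_python` is not described here and may differ. `estimate_shift` is a hypothetical helper, and the demo assumes a pure circular shift of a synthetic image.

```python
import numpy as np

def estimate_shift(a_ref, a_mov):
    """Return (dy, dx) such that np.roll(a_mov, (dy, dx), axis=(0, 1)) aligns with a_ref."""
    f_ref = np.fft.fft2(a_ref)
    f_mov = np.fft.fft2(a_mov)
    cross = f_ref * np.conj(f_mov)
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.fft.ifft2(cross).real         # peak marks the offset
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # offsets past the midpoint wrap around to negative shifts
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

rng = np.random.default_rng(0)
a_r1 = rng.random((64, 64))                       # stand-in for round 1 DAPI
a_r2 = np.roll(a_r1, (3, -5), axis=(0, 1))        # later round, shifted
shift = estimate_shift(a_r1, a_r2)
a_r2_reg = np.roll(a_r2, shift, axis=(0, 1))      # shifted back onto round 1
```

Real tissue images also need handling of image borders, sub-pixel offsets, and possibly rotation, which this sketch ignores.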


In [None]:
for s_sample in ls_sample:
    cmif.registration_python(s_sample,tiffdir,regdir,qcdir)

In [None]:
#### 4 (optional) Register Matlab ####
'''
d_register = {#'BC44290-146':{},
    'BC44290-146':{1:'[2000 7000 500 500]',2:'[1800 9000 500 500]'}, #[x y w h] (crop)
 }
ls_order = ['R1','R0','R2','R3','R4','R5','R6','R7','R8','R8Q'] 

for key,item in d_register.items():
   #run registration
   cmif.run_registration_matlab(d_register, ls_order, f'{tiffdir}/{key}', f'{regdir}/')
'''
#### 

## 5   Check Registration <a name="qcreg"></a>

[contents](#contents)

This section generates overviews of all rounds of each registered image stack, for QC purposes.


In [None]:
cmif.visualize_reg_images(f'{regdir}',qcdir,color='c1')

## 6   Create AF Subtracted Images <a name="afsub"></a>

[contents](#contents)

Images of background autofluorescence are scaled by exposure time and subtracted from the respective channel, producing a new image. `d_channel` specifies the name of the background marker to subtract from each channel. `ls_exclude` lists the markers for which AF subtraction is skipped (typically c5 images).
A companion csv listing channels and corresponding markers is generated to reflect this fully processed (i.e. level-2) image data.
 
FILE STRUCTURE: AF subtracted tiff images are output to `subdir` in subfolders named by sample ID and scene.
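
The scaling and subtraction can be sketched in a few lines of NumPy. This is a minimal illustration, not the code inside `cmif.autofluorescence_subtract`: it assumes the background image is scaled by the ratio of marker exposure time to background exposure time, and that negative results are clipped to zero. `subtract_af` and the pixel values are hypothetical.

```python
import numpy as np

def subtract_af(a_marker, a_background, exp_marker, exp_background):
    """Subtract an exposure-scaled background image from a marker image."""
    # scale the background to the marker image's exposure time
    a_scaled = a_background.astype(np.float64) * (exp_marker / exp_background)
    # work in float so the subtraction cannot wrap around as it would in uint16
    a_sub = a_marker.astype(np.float64) - a_scaled
    i_max = np.iinfo(np.uint16).max
    return np.clip(a_sub, 0, i_max).round().astype(np.uint16)

a_marker = np.array([[1000, 200]], dtype=np.uint16)
a_background = np.array([[100, 300]], dtype=np.uint16)
# marker acquired at twice the background exposure time
a_result = subtract_af(a_marker, a_background, exp_marker=200, exp_background=100)
```

The background is doubled to `[[200, 600]]` before subtraction, so `a_result` is `[[800, 0]]`: the second pixel would be negative and is clipped.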


In [None]:
#parameters
d_channel = {'c2':'R8Qc2','c3':'R8Qc3','c4':'R8Qc4','c5':'R8Qc5'}
d_early={'c2':'R0c2','c3':'R0c3','c4':'R0c4','c5':'R0c5'}

for s_sample in ls_sample:
    preprocess.cmif_mkdir([f'{subdir}/{s_sample}'])
    os.chdir(f'{regdir}')
    for s_file in os.listdir():
        if s_file.find(s_sample) > -1:
            os.chdir(s_file)
            df_img = mpimage.parse_org()
            ls_exclude = sorted(set(df_img[df_img.color=='c5'].marker)) + ['DAPI'] + [item for key, item in d_channel.items()] + [item for key, item in d_early.items()]
            #subtract
            df_markers = cmif.autofluorescence_subtract(s_sample,df_img,f'{codedir}/data/PipelineExample',d_channel,ls_exclude,subdir=f'{subdir}/{s_sample}',d_early=d_early) #
            os.chdir('..')
#generate channel/marker metadata csv
cmif.metadata_table(regdir,segdir)

## 7   Cellpose Segmentation <a name="seg"></a>

[contents](#contents)

The generalist segmentation algorithm `cellpose` is used for nuclear and cell segmentation. Custom Python/numba code matches labelled nuclei and cells that overlap from the two segmentation results.
Note: the `cellpose_segment_job` is a spawner that starts a job for each slide/scene on the server. 

FILE STRUCTURE: Labelled tiffs (`uint32`) with each pixel's grayscale value reflecting the label, are output as "Nuclei Segmentation Basins" and "Matched Cell Segmentation Basins" in `segdir` sample ID subfolders.
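
The idea of matching nuclei to cells by overlap can be shown on tiny label arrays. The sketch below assigns each nucleus the cell label that covers most of its pixels; it is a plain NumPy illustration under that assumption, not the numba code used by the pipeline, and `match_by_overlap` is a hypothetical helper.

```python
import numpy as np

def match_by_overlap(a_nuc, a_cell):
    """Map each nucleus label to the cell label sharing the most pixels with it.

    Label 0 is background. Nuclei that touch no cell are left out.
    """
    d_match = {}
    for i_nuc in np.unique(a_nuc):
        if i_nuc == 0:
            continue
        a_overlap = a_cell[a_nuc == i_nuc]     # cell labels under this nucleus
        a_overlap = a_overlap[a_overlap > 0]
        if a_overlap.size == 0:
            continue
        a_ids, a_counts = np.unique(a_overlap, return_counts=True)
        d_match[int(i_nuc)] = int(a_ids[np.argmax(a_counts)])
    return d_match

a_nuc = np.array([[1, 1, 0, 0],
                  [0, 0, 2, 2],
                  [3, 0, 0, 0]])
a_cell = np.array([[5, 5, 5, 0],
                   [0, 7, 7, 7],
                   [0, 0, 0, 0]])
d_match = match_by_overlap(a_nuc, a_cell)
```

Nucleus 1 matches cell 5 and nucleus 2 matches cell 7, while nucleus 3 has no overlapping cell and is absent from `d_match`.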
 


In [None]:
#change kernel to cellpose
import os
#os.chdir('/home/groups/graylab_share/OMERO.rdsStore/engje/Data/cmIF')
from mplex_image import segment
# Set Paths
#codedir = '/home/groups/graylab_share/OMERO.rdsStore/engje/Data/cmIF'
codedir = os.getcwd()
rootdir = f'{codedir}/Data/PipelineExample'
regdir = f'{rootdir}/RegisteredImages'
segdir = f'{rootdir}/Segmentation'
ls_sample = ['BC44290-146']

nuc_diam = 30
cell_diam = 30 

s_seg_markers = "['Ecad']"
s_type = 'cell' #'nuclei'#

print(f'Predicting {s_type}')
for s_sample in ls_sample:
    segment.segment_spawner(s_sample,segdir,regdir,nuc_diam,cell_diam,
                            s_type,s_seg_markers,s_job='short',s_match='both')

## 8   Extract features <a name="feat"></a>

[contents](#contents)

Extract intensity, shape, and location features from each AF subtracted image. 
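
For intuition, per-cell mean intensity, centroid, and area can be computed from a label image and an intensity image with `np.bincount`. This is a minimal sketch of the kind of features involved, not the implementation of `features.extract_cellpose_features`; `label_features` is a hypothetical helper and the arrays are toy data.

```python
import numpy as np

def label_features(a_label, a_img):
    """Return {label: {'mean_intensity', 'centroid', 'area'}} for labels > 0."""
    a_flat = a_label.ravel().astype(np.intp)
    a_area = np.bincount(a_flat)                                   # pixels per label
    a_total = np.bincount(a_flat, weights=a_img.ravel().astype(float))
    a_rows, a_cols = np.indices(a_label.shape)
    a_sum_r = np.bincount(a_flat, weights=a_rows.ravel())
    a_sum_c = np.bincount(a_flat, weights=a_cols.ravel())
    d_feat = {}
    for i_label in np.nonzero(a_area)[0]:
        if i_label == 0:
            continue  # skip background
        n = a_area[i_label]
        d_feat[int(i_label)] = {
            'mean_intensity': float(a_total[i_label] / n),
            'centroid': (float(a_sum_r[i_label] / n), float(a_sum_c[i_label] / n)),
            'area': int(n),
        }
    return d_feat

a_label = np.array([[1, 1, 0],
                    [0, 2, 2]])
a_img = np.array([[10, 20, 99],
                  [99, 30, 50]])
d_feat = label_features(a_label, a_img)
```

Label 1 has a mean intensity of 15 with its centroid at row 0, column 0.5; label 2 has a mean intensity of 40 with its centroid at row 1, column 1.5.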


In [None]:
from mplex_image import features

nuc_diam = 30
cell_diam = 30 
ls_seg_markers = ['Ecad']

for s_sample in ls_sample: 
    df_sample, df_thresh = features.extract_cellpose_features(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam)
    df_sample.to_csv(f'{segdir}/features_{s_sample}_MeanIntensity_Centroid_Shape.csv')
    df_thresh.to_csv(f'{segdir}/thresh_{s_sample}_ThresholdLi.csv')

In [None]:
from mplex_image import features

nuc_diam = 30
cell_diam = 30 
ls_seg_markers = ['Ecad']
ls_membrane = ['HER2','EGFR']

for s_sample in ls_sample: 
    df_sample = features.extract_bright_features(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam,ls_membrane)
    df_sample.to_csv(f'{segdir}/features_{s_sample}_BrightMeanIntensity.csv')

## 9   Filter Data <a name="filter"></a>

[contents](#contents)

Post-processing includes filtering out lost cells based on last round DAPI staining and selection of dataframe columns based on desired biomarker sub-cellular location.
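
Filtering out lost cells can be pictured as a threshold on the DAPI columns of the feature table. The sketch below keeps only cells whose DAPI intensity exceeds a manual threshold in every listed round; whether `process.filter_cellpose_df` combines the columns exactly this way is an assumption. `filter_dapi_positive`, the `CK19_cytoplasm` column, and the intensity values are hypothetical.

```python
import pandas as pd

def filter_dapi_positive(df_mi, ls_filter, man_thresh):
    """Keep rows whose intensity exceeds man_thresh in every column of ls_filter."""
    b_keep = (df_mi.loc[:, ls_filter] > man_thresh).all(axis=1)
    return df_mi[b_keep]

df_mi = pd.DataFrame(
    {'DAPI2_nuclei': [1500, 1200, 900],
     'DAPI8Q_nuclei': [1400, 300, 700],    # cell2 lost its DAPI signal by the last round
     'CK19_cytoplasm': [50, 800, 20]},
    index=['cell1', 'cell2', 'cell3'])

df_kept = filter_dapi_positive(df_mi, ['DAPI8Q_nuclei', 'DAPI2_nuclei'], man_thresh=600)
```

With a threshold of 600, `cell2` is dropped because its last-round DAPI value is 300, and the other two cells are kept.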



In [None]:
from mplex_image import process, features
#parameters
nuc_diam = 30
cell_diam = 30 
ls_seg_markers = ['Ecad']
s_thresh='Ecad'
ls_membrane = []
ls_marker_cyto = ['CK14','CK5','CK17','CK19','CK7','CK5R','Ecad','HER2']
ls_custom = []
ls_filter = ['DAPI8Q_nuclei','DAPI2_nuclei']
ls_shrunk = ['Vim_perinuc3']
man_thresh = 600

#filtering
for s_sample in ls_sample: 
    os.chdir(segdir)
    #replace nas, select segmentation region and filter cells negative for dapi
    df_mi_full,df_img_all = process.filter_cellpose_df(s_sample,segdir,qcdir,s_thresh,ls_membrane,ls_marker_cyto,
     ls_custom,ls_filter,ls_shrunk,man_thresh)
    break  # stops after filtering the first sample; the lines below only run if this break is removed
    #Expand nuclei without matching cell labels for cells touching analysis
    se_neg = df_mi_full[df_mi_full.slide == s_sample].loc[:,f'{s_thresh}_negative']
    labels,combine,dd_result = features.combine_labels(s_sample, segdir, subdir, ls_seg_markers, nuc_diam, cell_diam, se_neg)
    process.marker_table(df_img_all,qcdir)