## REDSEA python version 0.0.1

Translated from Yunhao Bai's MATLAB code by Bokai Zhu.

Some minor differences from Yunhao's MATLAB code (subject to change in future versions):

1. Does not filter for positive nuclear identity (cells), because that part of the code is in "mibisegmentByDeepProbWithPerm3.m". It can easily be added by the user.

2. Does not produce the sanity plot, since that belongs outside the compensation function. Optional; may be added later.

3. Does not produce an FCS file at the end. Instead it produces the four FCS tables as matrices (pandas DataFrames), which are easier to use later.

Edited by Ben Caiello so that it works with current DeepCell segmentation masks / the steinbock pipeline. The algorithm is not identical, as I do not think that is possible given the differences between the masks of old and new DeepCell (the presence or absence of zero-valued boundaries changes the effective pixel width of the boundary measurement step).

Dataset I had been using for CD3 / CD20 compensation: https://zenodo.org/records/8023452

Passed through steinbock to derive the .tiffs and masks, which were then fed into this script.

The directory structure required by the script as written is the one naturally produced by steinbock, plus the REDSEA massDS.csv file:

A master directory with two folders, \img and \masks, containing the original images and the DeepCell-generated masks respectively as .tiffs with matching file names, and one .csv file (massDS.csv).

The massDS file is the same structure as the main REDSEA package: it must contain a 'Label' column with your channel labels / numbers.
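
The layout described above can be checked before running anything. The following is a minimal sketch and not part of the original script: `check_redsea_inputs` is a hypothetical helper name, and it only assumes the `img` / `masks` / `massDS.csv` names and the 'Label' column given above.

```python
import csv
from pathlib import Path

def check_redsea_inputs(main_path, file_name):
    """Return a list of problems found in the expected folder layout (empty if none)."""
    main = Path(main_path)
    problems = []
    # the image and its mask must share one file name in their respective folders
    for folder in ("img", "masks"):
        tiff = main / folder / f"{file_name}.tiff"
        if not tiff.is_file():
            problems.append(f"missing {tiff}")
    mass_ds = main / "massDS.csv"
    if not mass_ds.is_file():
        problems.append(f"missing {mass_ds}")
    else:
        # only the header row is inspected here
        with open(mass_ds, newline="") as handle:
            header = next(csv.reader(handle), [])
        if "Label" not in header:
            problems.append("massDS.csv has no 'Label' column")
    return problems
```

Calling `check_redsea_inputs(mainPath, file_name)` and printing the result before the file reader runs would surface a misnamed mask or a missing column early.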

The script has not been thoroughly tested, but should produce outputs and seems to be doing what it is supposed to with the limited testing so far.

In [1]:
# Package Imports
import PIL
from PIL import Image, ImageSequence, ImageOps
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import skimage 
import skimage.measure
import skimage.morphology
import glob
from scipy.io import loadmat
import time

import tifffile
import os

In [2]:
## Functions
# helper function 1
def ismember(a, b):
    bind = {}
    for i, elt in enumerate(b):
        if elt not in bind:
            bind[elt] = i
    return [bind.get(itm, None) for itm in a]  # None can be replaced by any other "not in b" value

# helper function 2

def printProgressBar (iteration, total, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█', printEnd = "\r"):
    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filledLength = int(length * iteration // total)
    bar = fill * filledLength + '-' * (length - filledLength)
    print(f'\r{prefix} |{bar}| {percent}% {suffix}', end = printEnd)
    # Print New Line on Complete
    if iteration == total: 
        print()
        
def file_reader(mainPath,file_name):
    massDS_path = os.path.join(mainPath, 'massDS.csv') # csv file location (required)
    pathTiff = os.path.join(mainPath, 'img', file_name + '.tiff') # tiff location, links to a single .tiff
    pathMat = os.path.join(mainPath, 'masks', file_name + '.tiff') # the corresponding mask .tiff
        
    ####### Read in of files:
    ### read in metadata csv
    massDS = pd.read_csv(massDS_path) # read the mass csv
    clusterChannels = massDS['Label'] # only get the label column
    #print(massDS.head())

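    #### flag the channels to compensate (normChannels is read from the notebook's global scope)
    normChannelsInds = ismember(normChannels,massDS['Label'])
    if None in normChannelsInds:
        # indexing with None would silently flag every channel
        raise ValueError("every entry of normChannels must appear in the 'Label' column of massDS.csv")
    channelNormIdentity = np.zeros((len(massDS['Label']),1))
    # make a flag for compensation
    for ind in normChannelsInds:
        channelNormIdentity[ind] = 1 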
    
    #### this part reads in the multichannel tiff file
    # read the image once and stack the selected channels into a 'countsNoNoise' matrix
    image = tifffile.imread(pathTiff) # indexed channel-first below: image[channel] is one 2D plane
    array_list=[image[channel] for channel in clusterChannels]
    countsNoNoise=np.stack(array_list,axis=2) # rows x cols x channels count matrix
    
    ###### Read in segmentation .tiff
    # Define the boundary region
    #### this code is a direct translation of REDSEA MATLAB v1.0
    Segmentation = tifffile.imread(pathMat).astype('int')
    cellNum = np.max(Segmentation) # how many labels
    # +1 so the background (label 0) also gets a region; stats[i] then corresponds to label i,
    # which assumes every label from 0 to cellNum is present in the mask
    stats = skimage.measure.regionprops(Segmentation+1)
    channelNum = len(clusterChannels) # how many channels
    
    ### make empty container matrices
    data = np.zeros((cellNum + 1,channelNum))
    dataScaleSize = np.zeros((cellNum + 1,channelNum))
    cellSizes = np.zeros((cellNum + 1,1))
    
    # extract the counts from the whole-cell region of each individual cell
    
    for i in range(cellNum + 1): # for each cell (label)
        label_counts=[countsNoNoise[coord[0],coord[1],:] for coord in stats[i].coords] # all channel count for this cell
        data[i,0:channelNum] = np.sum(label_counts, axis=0) #  sum the counts for this cell
        dataScaleSize[i,0:channelNum] = np.sum(label_counts, axis=0) / stats[i].area # scaled by size
        cellSizes[i] = stats[i].area # cell sizes

    return clusterChannels, channelNormIdentity, countsNoNoise, Segmentation, cellNum, channelNum, data, cellSizes, dataScaleSize
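
The per-cell summation at the end of `file_reader` can also be written without `regionprops` and without a Python loop over pixels. This is a hedged sketch of an alternative, not the method used above: `per_cell_sums` is a hypothetical name, and it assumes an integer label mask plus a rows x cols x channels count array, as produced by the reader.

```python
import numpy as np

def per_cell_sums(labels, counts):
    """Sum counts (rows x cols x channels) over every label of an integer mask.

    Row k of the returned sums belongs to label k; row 0 is the background.
    """
    n_labels = int(labels.max()) + 1
    flat_labels = labels.ravel()
    flat_counts = counts.reshape(-1, counts.shape[-1])
    sums = np.zeros((n_labels, counts.shape[-1]))
    np.add.at(sums, flat_labels, flat_counts)  # unbuffered accumulation per label
    sizes = np.bincount(flat_labels, minlength=n_labels)  # pixels per label
    return sums, sizes

# toy mask with a background pixel and two cells; channel 1 carries twice the signal
labels = np.array([[0, 1, 1],
                   [2, 2, 1]])
counts = np.ones((2, 3, 2))
counts[..., 1] = 2.0
sums, sizes = per_cell_sums(labels, counts)
```

Because rows are addressed by label value, this version does not depend on the labels being contiguous. A label that is absent from the mask simply gets a zero row and a size of 0, so dividing by size would need a guard in that case.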

In [3]:
######## Edit these for your run!
# file locations
mainPath = 'C:\\Users\\caiello\\Desktop\\RedSEA practice\\practice' # main folder must contain \img and \masks folders (matching steinbock output) and a massDS.csv file (matching the REDSEA documentation)
file_name = 'RNANeg_Tonsil_003' #do not include the .tiff file type!
pathResults = os.path.join(mainPath, 'intensities') # output folder (created below if missing); named 'intensities' to match the steinbock output
os.makedirs(pathResults, exist_ok=True)

# parameters for compensation (change as desired)
REDSEAChecker = 1 # 1 means subtract + reinforce
elementShape = 2 # 1 == square, 2 == diamond (star)
elementSize = 2 # square or diamond (star) extension size, in pixels

#  select which channel to normalize
normChannels = [13,18]
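
To see what `elementShape` and `elementSize` select, the two footprints can be rebuilt with numpy alone. This is an illustrative sketch: `diamond_footprint` and `square_footprint` are hypothetical names, and the diamond is assumed to mark every pixel within a Manhattan distance of `elementSize` from the centre, which is my understanding of `skimage.morphology.diamond`.

```python
import numpy as np

def diamond_footprint(radius):
    """Pixels within Manhattan distance `radius` of the centre."""
    offsets = np.arange(-radius, radius + 1)
    dist = np.abs(offsets)[:, None] + np.abs(offsets)[None, :]
    return (dist <= radius).astype(np.uint8)

def square_footprint(radius):
    """Full (2 * radius + 1) x (2 * radius + 1) square."""
    return np.ones((2 * radius + 1, 2 * radius + 1), dtype=np.uint8)

print(diamond_footprint(2))
```

With `elementSize = 2` the diamond covers 13 pixels and the square covers 25, so the square setting classifies more pixels as boundary pixels.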

In [4]:
## File reader
clusterChannels, channelNormIdentity, countsNoNoise, Segmentation, cellNum, channelNum, data, cellSizes, dataScaleSize = file_reader(mainPath, file_name)

In [94]:
# this block is for computing cell cell matrix
[rowNum, colNum] = Segmentation.shape
cellPairMap = np.zeros(((cellNum + 1),(cellNum + 1))) # cell-cell shared perimeter matrix container

# start looping the mask and produce the cell-cell contact matrix
for i in range(rowNum): # every row of the mask
    if i == 0:
        a = 0
        c = 2
    elif i == rowNum - 1: # last row
        a = 1
        c = 1
    else:
        a = 1
        c = 2
    for j in range(colNum):
        if j == 0:
            b = 0
            d = 2
        elif j == colNum - 1: # last column
            b = 1
            d = 1
        else:
            b = 1
            d = 2
        tempMatrix = Segmentation[i-a:i+c,j-b:j+d] # the 3x3 window, centered on the point i,j
        #print(tempMatrix)
        tempFactors = np.unique(tempMatrix).astype('int') #unique
        #print(tempFactors)
        centerpoint_value = Segmentation[i,j]
        #print(centerpoint_value)
        for k in tempFactors:
            if k != centerpoint_value: # only add to the cellPairMap for the centerpoint pixel -- this prevents multiplicate counting
                #print("trigger")
                cellPairMap[centerpoint_value,k] = cellPairMap[centerpoint_value,k] + 1  
    
# converting the cell-cell map to boundary fractions: each row is divided by its full sum, which includes contact with background (label 0);
# the background row and column are dropped afterwards
cellPairNorm = np.zeros(((cellNum+1),(cellNum+1)))
for i in np.arange(0,len(cellPairMap)):
    if np.sum(cellPairMap[i]) > 0:
        cellPairNorm[i] = - (cellPairMap[i] / np.sum(cellPairMap[i]))
cellPairNorm = cellPairNorm[1:,1:]
cellPairNorm = cellPairNorm + REDSEAChecker*np.identity(cellNum ) 
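
A toy mask makes the contact counting and normalization easier to follow. The sketch below repeats the same logic on a 3 x 4 mask with two cells and a background strip. `contact_counts` is a hypothetical name, the mask is invented for illustration, and `REDSEAChecker` is taken as 1.

```python
import numpy as np

def contact_counts(mask):
    """For every pixel, count each other label seen in its 3x3 window."""
    n = int(mask.max()) + 1
    pair_map = np.zeros((n, n))
    rows, cols = mask.shape
    for i in range(rows):
        for j in range(cols):
            # slicing clips the window at the image edge
            window = mask[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            centre = mask[i, j]
            for k in np.unique(window):
                if k != centre:
                    pair_map[centre, k] += 1
    return pair_map

mask = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [0, 0, 0, 0]])
pair_map = contact_counts(mask)

# divide each row by its total (background contact included), then drop background
row_sums = pair_map.sum(axis=1, keepdims=True)
fractions = np.divide(pair_map, row_sums, out=np.zeros_like(pair_map), where=row_sums > 0)
pair_norm = np.identity(pair_map.shape[0] - 1) - fractions[1:, 1:]
print(pair_norm)
```

Each cell here has two boundary pixels facing the other cell and two facing the background, so the off-diagonal entries come out as -0.5 and the diagonal as 1.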




In [178]:
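# converting the cell-cell map to fractions of the total cell boundary (this overwrites cellPairNorm from the previous cell)
# row and column 0 of cellPairMap are the background, so they are skipped when filling the cellNum x cellNum matrix
cellPairNorm = np.zeros(((cellNum),(cellNum)))
for i in np.arange(1,len(cellPairMap)):
    if np.sum(cellPairMap[i]) > 0:
        cellPairNorm[i-1] = - (cellPairMap[i,1:] / np.sum(cellPairMap[i]))
cellPairNorm = cellPairNorm + REDSEAChecker*np.identity(cellNum ) 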

In [179]:
# now start calculating the signal from pixels along the boundary of each cell
MIBIdataNearEdge1 = np.zeros((cellNum+1,channelNum))

# start the boundary region selection and count extraction

##### A List of Items
items = list(range(cellNum))
l = len(items)
printProgressBar(0, l, prefix = 'Progress:', suffix = 'Complete', length = 50) # progress bar
#####

######pre-calculated shape
if elementShape==1: # square
    square=skimage.morphology.square(2*elementSize+1)
    square_loc=np.where(square==1)
elif elementShape==2: # diamond
    diam=skimage.morphology.diamond(elementSize) # create diamond shape based on elementSize
    diam_loc=np.where(diam==1)
else:
    raise ValueError("elementShape value not recognized (use 1 for square or 2 for diamond).")
############

for i in range(1, cellNum + 1): # cell labels run from 1 to cellNum; label 0 is background
    [tempRow,tempCol] = np.where(Segmentation==i)
    # pixels come back ordered by row, not column; this does not affect the result
    for j in range(len(tempRow)):
        label_in_shape=[] # stays empty if the pixel is too close to the image edge
        # make sure the shape does not extend outside the image
        if (elementSize-1<tempRow[j]) and (tempRow[j]<rowNum-elementSize-2) and (elementSize-1<tempCol[j]) and (tempCol[j]<colNum-elementSize-2):
            ini_point = [tempRow[j]-elementSize,tempCol[j]-elementSize] # corrected top-left point
        
            if elementShape==1: # square
                square_loc_ini_x=[item + ini_point[0] for item in square_loc[0]]
                square_loc_ini_y=[item + ini_point[1] for item in square_loc[1]]
                
                label_in_shape=[Segmentation[square_loc_ini_x[k],square_loc_ini_y[k]] for k in range(len(square_loc_ini_x))]
                
            elif elementShape==2: # diamond
                diam_loc_ini_x=[item + ini_point[0] for item in diam_loc[0]]
                diam_loc_ini_y=[item + ini_point[1] for item in diam_loc[1]]
                # shape offsets are now relative to the top-left point
            
                label_in_shape=[Segmentation[diam_loc_ini_x[k],diam_loc_ini_y[k]] for k in range(len(diam_loc_ini_x))]
        is_border_px = len(np.unique(label_in_shape)) # number of distinct labels under the shape
        
        if is_border_px > 1:
            MIBIdataNearEdge1[i,:] = MIBIdataNearEdge1[i,:] + countsNoNoise[tempRow[j],tempCol[j],:]
    
    # Update Progress Bar
    printProgressBar(i, l, prefix = 'Progress:', suffix = 'Complete', length = 50)

Progress: |██████████████████████████████████████████████████| 100.0% Complete
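
The boundary test in the loop above (does the shape around a pixel cover more than one label?) can also be expressed with shifted copies of the whole mask. This is a sketch of that idea for the diamond shape only, under one stated difference: the image is padded by repeating its edge values, so pixels near the image border are still evaluated, whereas the loop above skips them. `border_pixels` is a hypothetical name.

```python
import numpy as np

def border_pixels(labels, radius=2):
    """Mark pixels whose diamond neighbourhood contains more than one label."""
    padded = np.pad(labels, radius, mode="edge")  # repeat edge values outward
    rows, cols = labels.shape
    is_border = np.zeros(labels.shape, dtype=bool)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if abs(dy) + abs(dx) > radius:
                continue  # offset lies outside the diamond
            shifted = padded[radius + dy:radius + dy + rows,
                             radius + dx:radius + dx + cols]
            is_border |= shifted != labels  # a different label sits at this offset
    return is_border

# two cells side by side, three columns each
labels = np.array([[1, 1, 1, 2, 2, 2]] * 4)
print(border_pixels(labels, radius=1)[0])
print(border_pixels(labels, radius=2)[0])
```

With `radius=1` only the two columns touching the interface are marked; with `radius=2` the band widens to four columns, which is the sense in which `elementSize` sets the effective boundary width.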


In [180]:
## some final formatting

MIBIdataNorm2 = np.transpose(np.dot(np.transpose(MIBIdataNearEdge1[1:,:]),cellPairNorm))
#this is boundary signal subtracted by cell neighbor boundary
MIBIdataNorm2 = MIBIdataNorm2 + data[1:,:] # reinforce onto the whole cell signal (original signal)

MIBIdataNorm2[MIBIdataNorm2<0] = 0 # clear out the negative ones
# flip the channelNormIdentity for calculation
rev_channelNormIdentity=np.ones_like(channelNormIdentity)-channelNormIdentity
# composite the normalized channels with non-normalized channels
# MIBIdataNorm2 is the matrix to return
MIBIdataNorm2 = data[1:,:] * np.transpose(np.tile(rev_channelNormIdentity,(1,cellNum))) + MIBIdataNorm2 * np.transpose(np.tile(channelNormIdentity,(1,cellNum)))

# the function should return 4 variables
# matrix 
dataCells = data[1:,:]
dataScaleSizeCells = dataScaleSize[1:,:]
dataCompenCells = MIBIdataNorm2
dataCompenScaleSizeCells = MIBIdataNorm2 / cellSizes[1:,:]
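
The arithmetic of the final step is easier to follow on a toy case with two cells and two channels. All numbers below are invented for illustration and the variable names are hypothetical stand-ins for `cellPairNorm`, `MIBIdataNearEdge1[1:,:]`, `data[1:,:]` and `channelNormIdentity`.

```python
import numpy as np

pair_norm = np.array([[1.0, -0.5],     # cell 1 shares 50% of its boundary with cell 2
                      [-0.25, 1.0]])   # cell 2 shares 25% of its boundary with cell 1
edge = np.array([[10.0, 1.0],          # boundary-pixel signal, cells x channels
                 [4.0, 1.0]])
whole = np.array([[20.0, 5.0],         # whole-cell signal, cells x channels
                  [0.5, 5.0]])
compensate = np.array([1.0, 0.0])      # only channel 0 is flagged for compensation

# each cell keeps its own boundary signal and loses a share of each neighbour's
corrected = pair_norm.T @ edge + whole
corrected = np.clip(corrected, 0, None)  # negative results are set to 0
# flagged channels take the corrected value, the others keep the original
result = whole * (1 - compensate) + corrected * compensate
print(result)
```

For channel 0, cell 1 ends at 20 + 10 - 0.25 * 4 = 29, while cell 2 ends at 0.5 + 4 - 0.5 * 10 = -0.5, which is clipped to 0. Channel 1 is returned unchanged because it is not flagged.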

In [181]:
pd.DataFrame(MIBIdataNorm2)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,16,17,18,19,20,21,22,23,24,25
0,1.000000,78.828690,12.615225,10.586385,10.444904,4.314006,1.000000,13.590752,9.477484,7.092767,...,5.521774,326.012238,27.469957,1.000000,2.502252,65.404167,101.510239,46.205772,349.782501,27.799742
1,2.238261,48.568752,34.356354,15.317507,10.116325,10.347881,2.508494,25.276258,3.728220,12.830654,...,2.000000,31.105236,3.788799,3.259425,3.904353,70.878876,127.610497,62.491692,324.343109,27.380899
2,0.000000,81.393707,38.840900,25.010725,15.143080,12.755048,2.351359,44.119675,10.092233,13.211883,...,3.777344,59.632507,129.460630,7.681790,5.605404,151.992416,331.484283,42.435482,337.694916,45.108646
3,3.377076,50.418316,38.201962,39.593826,14.107013,20.665142,8.682616,46.688625,11.928615,13.528472,...,12.557582,829.166992,0.000000,4.591694,7.306455,154.755600,299.676147,46.029228,328.386169,51.463367
4,0.000000,14.487697,39.902687,17.820940,7.289336,15.570030,3.000000,34.761581,12.915440,15.391790,...,8.686342,93.085739,26.011152,2.664537,2.435485,117.412102,200.876099,52.715302,772.614197,34.189800
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13749,1.463400,32.101341,30.793402,24.400457,9.151209,17.819540,7.756796,52.625019,7.026352,18.249481,...,12.035808,190.909912,27.988697,2.000000,4.169919,224.819717,401.608032,36.694202,150.810715,45.278805
13750,1.000000,11.589388,19.235193,5.817363,2.617914,35.178493,2.000000,9.232656,3.614749,5.398106,...,2.876085,884.055115,13.209761,1.000000,1.101651,38.677975,60.653595,10.021107,44.378403,7.650525
13751,2.223242,12.349504,35.774395,21.579807,12.742961,17.634525,3.665629,26.570738,7.000000,18.816246,...,11.590495,918.474487,57.320562,5.050480,8.937737,488.315155,876.028442,57.053375,201.684067,44.232994
13752,1.000000,23.426706,37.320736,20.365849,6.942724,15.570030,1.665629,35.171719,15.218881,17.665155,...,7.817998,97.477692,18.159078,2.000000,2.368718,118.646683,232.555679,35.117500,183.426834,22.206812


In [195]:
# create the final matrices (4 types of them)
labelVec = [i for i in np.arange(1,cellNum + 1,1)]

# get cell sizes
cellSizesVec_flat = [item for sublist in cellSizes[1:,:] for item in sublist] # flat the list

# produce the matrices

## first dataframe
dataL = pd.DataFrame({'Object':labelVec, 'cell_size':cellSizesVec_flat})
dataCells_df=pd.DataFrame(dataCells)
dataCells_df.columns = clusterChannels
dataL_full = pd.concat((dataL,dataCells_df),axis=1)
### second
dataScaleSizeL_df=pd.DataFrame(dataScaleSizeCells)
dataScaleSizeL_df.columns = clusterChannels
dataScaleSizeL_full = pd.concat((dataL,dataScaleSizeL_df),axis=1)
### third
dataCompenL_df=pd.DataFrame(dataCompenCells)
dataCompenL_df.columns = clusterChannels
dataCompenL_full = pd.concat((dataL,dataCompenL_df),axis=1)
### fourth
dataCompenScaleSizeL_df=pd.DataFrame(dataCompenScaleSizeCells)
dataCompenScaleSizeL_df.columns = clusterChannels
dataCompenScaleSizeL_full = pd.concat((dataL,dataCompenScaleSizeL_df),axis=1)
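
The four tables above are built with the same three steps each time, so they could be produced by one small helper. This is only a sketch of that refactoring: `with_cell_info` is a hypothetical name and the toy values are invented.

```python
import numpy as np
import pandas as pd

def with_cell_info(values, labels, sizes, channel_names):
    """Prefix a cells x channels matrix with 'Object' and 'cell_size' columns."""
    info = pd.DataFrame({"Object": labels, "cell_size": sizes})
    table = pd.DataFrame(values, columns=list(channel_names))
    return pd.concat((info, table), axis=1)

values = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
table = with_cell_info(values, labels=[1, 2], sizes=[16.0, 14.0], channel_names=[13, 18])
print(table)
```

In the notebook this would be called once per matrix, for example with `dataCompenCells`, `labelVec`, `cellSizesVec_flat` and `clusterChannels`.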

In [196]:
dataScaleSizeL_full.head() # original counts extracted from the tiff files, scaled by cell size

Unnamed: 0,Object,cell_size,0,1,2,3,4,5,6,7,...,16,17,18,19,20,21,22,23,24,25
0,1,16.0,0.0625,4.926793,0.788452,0.661649,0.652807,0.269625,0.0625,0.849422,...,0.345111,20.375765,8.192933,0.0625,0.156391,4.08776,6.34439,2.887861,21.861406,1.737484
1,2,14.0,0.159876,3.469197,2.454025,1.094108,0.722595,0.739134,0.179178,1.805447,...,0.142857,2.221802,4.049692,0.232816,0.278882,5.062777,9.115035,4.463692,23.167364,1.955778
2,3,25.0,0.0,3.255748,1.553636,1.000429,0.605723,0.510202,0.094054,1.764787,...,0.151094,2.3853,9.33205,0.307272,0.224216,6.079697,13.259372,1.697419,13.507796,1.804346
3,4,30.0,0.112569,1.680611,1.273399,1.319794,0.470234,0.688838,0.289421,1.556288,...,0.418586,27.638899,2.014158,0.153056,0.243548,5.15852,9.989205,1.534308,10.946206,1.715446
4,5,22.0,0.0,0.658532,1.813758,0.810043,0.331333,0.707729,0.136364,1.580072,...,0.394834,4.23117,1.833733,0.121115,0.110704,5.336914,9.130732,2.39615,35.118828,1.554082


In [197]:
dataCompenScaleSizeL_full.head() # redsea compensated counts, but scaled by cell size

Unnamed: 0,Object,cell_size,0,1,2,3,4,5,6,7,...,16,17,18,19,20,21,22,23,24,25
0,1,16.0,0.0625,4.926793,0.788452,0.661649,0.652807,0.269625,0.0625,0.849422,...,0.345111,20.375765,1.716872,0.0625,0.156391,4.08776,6.34439,2.887861,21.861406,1.737484
1,2,14.0,0.159876,3.469197,2.454025,1.094108,0.722595,0.739134,0.179178,1.805447,...,0.142857,2.221803,0.270629,0.232816,0.278882,5.062777,9.115035,4.463692,23.167365,1.955779
2,3,25.0,0.0,3.255748,1.553636,1.000429,0.605723,0.510202,0.094054,1.764787,...,0.151094,2.3853,5.178425,0.307272,0.224216,6.079697,13.259371,1.697419,13.507797,1.804346
3,4,30.0,0.112569,1.680611,1.273399,1.319794,0.470234,0.688838,0.289421,1.556288,...,0.418586,27.6389,0.0,0.153056,0.243548,5.15852,9.989205,1.534308,10.946206,1.715446
4,5,22.0,0.0,0.658532,1.813759,0.810043,0.331333,0.707729,0.136364,1.580072,...,0.394834,4.23117,1.182325,0.121115,0.110704,5.336914,9.130732,2.39615,35.118827,1.554082


In [199]:
###output scaled files
dataScaleSizeL_full.to_csv(os.path.join(pathResults, file_name + "REDSEA_pre_compensation.csv"), index = False)
dataCompenScaleSizeL_full.to_csv(os.path.join(pathResults, file_name + ".csv"), index = False)