# Comparison of Standard IF to Cyclic IF <a name="contents"></a>

**Question:** How do the signal-to-background ratio (SBR) and staining specificity compare in standard IF vs. cyclic IF?

**Samples:**

[Single vs Cyclic](#svc)
- Tissue ID 44290: HER2+/ER+ breast cancer. Section 112 was stained with a 5-round cyclic IF protocol; sections 113 to 116 were stained with the same antibodies in a standard IF protocol.
- Tissue ID 44294: Adjacent normal breast from the patient above. Section 116 is cyclic; sections 117 to 120 are standard IF.

**Method**: For each stain, pixel intensity was manually thresholded to separate positive from negative pixels in the Extended1_single_vs_cyclic.ipynb notebook. We also automatically estimated dynamic range using the 5th and 98th percentiles of single-cell intensity in the tissue, plotted the dynamic range, and compared SBR with the estimated dynamic range.

[Tissue Loss](#tissue)
- Biomax TMA 808L2: multiple tissue tumor and normal, for tissue loss analysis
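
The percentile-based dynamic range estimate described under Method can be sketched as follows. This is a minimal illustration on made-up intensities, not the code used to produce the figures; the function name `estimate_dynamic_range` and its parameters are hypothetical.

```python
import numpy as np

def estimate_dynamic_range(intensity, low_pct=5, high_pct=98):
    """Return (low, high, high - low) for the given percentiles of single-cell intensity."""
    values = np.asarray(intensity, dtype=float)
    low, high = np.percentile(values, [low_pct, high_pct])
    return low, high, high - low

# toy single-cell mean intensities (made up): the integers 0..100
cells = np.arange(101)
low, high, dyn_range = estimate_dynamic_range(cells)
print(low, high, dyn_range)
```

Using percentiles rather than the minimum and maximum makes the estimate less sensitive to a handful of very dim or very bright cells.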



In [None]:
#load libraries
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import sys
import numpy as np
import os
from scipy.stats import pearsonr
import itertools
from scipy import stats
import seaborn as sns
matplotlib.rcParams.update({'font.size': 16})

## Notes 44290 (HER2+ Breast Cancer)
The single stains have a much worse edge effect.
Define 2 ROIs with similar composition, avoiding edge regions. ROIs are given as (x,y,w,h).

44290-112: (cyclic)
- (3200,5500,1500,1500) ROI1 large tumor nest upper left
- (5000,8800,1500,1800) ROI2 large tumor nests lower right
- Only two regions in 44290 are suitable, because the autofluorescent regions differ between the single and cyclic stains.

44290-113 (single PCNA, HER2, ER, CD45)
- (800,3050,1500,1500)
- (2650,6400,1500,1800)

44290-114 (single pHH3, CK14, CD44, CK5)
- (900,2900,1500,1500)
- (2650,6250,1500,1800)

44290-115 (single Vim, CK7, PD1, Lamin A/C)
- (900,2950,1500,1500)
- (2850,6200,1500,1800)

44290-116 (single aSMA, CD68, Ki67, Ecad)
- (850,3000,1500,1500)
- (2650,6250,1500,1800)

## Notes 44294 (Normal Breast)
The single stains on normal tissue don't have such a bad edge effect, although they do have autofluorescence in the DAPI-negative areas (i.e., the middle of the duct). We can use the whole segmented images, as segmentation based on DAPI will ignore most of the worst autofluorescent areas.

44294-116 was not cropped before segmentation, while the other tissues were.
Therefore, I will use an ROI to make 44294-116 the same size as the other tissues.

44294-116:
- (7200,3580,12725,12365) (scene001)
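
As a rough sketch of how an (x,y,w,h) ROI from the notes above selects cells, the helper below keeps cells whose centroid falls inside the rectangle. The helper name `cells_in_roi` and the toy centroids are hypothetical; the `DAPI_X` and `DAPI_Y` column names follow the feature tables used later in this notebook.

```python
import pandas as pd

def cells_in_roi(df_cells, roi, x_col='DAPI_X', y_col='DAPI_Y'):
    """Keep cells whose centroid lies strictly inside roi = (x, y, w, h)."""
    x, y, w, h = roi
    in_x = (df_cells[x_col] > x) & (df_cells[x_col] < x + w)
    in_y = (df_cells[y_col] > y) & (df_cells[y_col] < y + h)
    return df_cells.loc[in_x & in_y]

# toy centroids (made up), filtered with ROI1 of 44290-112
df_toy = pd.DataFrame({'DAPI_X': [3300, 3100, 4600, 4800],
                       'DAPI_Y': [5600, 5600, 6900, 6000]},
                      index=['cell1', 'cell2', 'cell3', 'cell4'])
df_roi1 = cells_in_roi(df_toy, (3200, 5500, 1500, 1500))
print(df_roi1.index.tolist())
```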

## Load Data <a name="svc"></a>

For single versus cyclic

[contents](#contents)

In [None]:
#set location of files
#os.chdir('/home/groups/graylab_share/OMERO.rdsStore/engje/Data/cycIF_ValidationStudies/cycIF_Validation')
codedir = os.getcwd()

In [None]:
#import dataframes with mean intensity/centroid from segmentation and feature extraction
ls_slide =  ['44290-112', '44290-113', '44290-114', '44290-115', '44290-116', 
             '44294-116', '44294-117', '44294-118', '44294-119', '44294-120']
df_mi = pd.DataFrame()
for s_slide in ls_slide:
    df_mi = pd.concat([df_mi, pd.read_csv(f'{codedir}/Data/features_{s_slide}_singlevscyclic.csv',index_col=0)])
#add the slide/scene information 
df_mi['scene'] = [item.split('_')[1] for item in df_mi.index]
df_mi['slide'] = [item.split('_')[0] for item in df_mi.index]

In [None]:
#load threshold, SNR data
rootdir = '/home/groups/graylab_share/OMERO.rdsStore/engje/Data/cycIF_ValidationStudies/cycIF_Validation'
df_t = pd.read_csv(f'{rootdir}/Metadata/44290/SNR_single_vs_cyclic.csv',index_col=0)

#load metadata from feature extraction
df_m = pd.read_csv(f'{rootdir}/Metadata/44290/metadata_single_vs_cyclic.csv',index_col=0)
k=1
df_sp = pd.DataFrame()
for s_thresh in ['minimum','minimum90','minimum110']: #
    filename = f'{rootdir}/Data/single_vs_cyclic_SBR_SP{k}_{s_thresh}.csv'
    if os.path.exists(filename):
        df = pd.read_csv(filename,index_col='filename')
        df['threshold_level'] = s_thresh
        df_sp = pd.concat([df_sp, df])
#alternative regions
df_alt = pd.DataFrame()
for s_type in ['tum','norm']:
    df = pd.read_csv(f'{rootdir}/Data/single_vs_cyclic_SBR_SP1_minimum_alt_{s_type}.csv', index_col='filename')
    df['tum_norm'] = s_type
    df_alt = pd.concat([df_alt, df])

In [None]:
#import dataframes with mean intensity/centroid from segmentation and feature extraction
#df_mi =pd.read_csv(f'{codedir}/BigData/features_singlevscyclic.csv',index_col=0)
#df_mi['slide'] = [item.split('_')[0] for item in df_mi.index]
#for s_slide in sorted(set(df_mi.slide)):
#    df_mi[df_mi.slide==s_slide].to_csv(f'{codedir}/Data/features_{s_slide}_singlevscyclic.csv')

In [None]:
#annotation

df_t['experiment'] = [item.split('_')[0] for item in df_t.condition]
df_t['experimenttype'] = [item.split('_')[2] for item in df_t.condition]
df_t = df_t.sort_values(['color','rounds'])

#calculate actual signal (i.e., measured signal minus background)
df_t['meanpos'] = df_t.meanpos - df_t.meanneg

#add info
print(len(df_sp))
print(len(df_t))
df_sp = df_sp.merge(df_t.loc[:,['experiment', 'experimenttype','exposure', 'refexp','minimum']],left_index=True,right_index=True)
df_sp = df_sp.reset_index().rename({'level_0':'filename'},axis=1).drop(['Unnamed: 0','index'],axis=1)
print(len(df_sp))
df_alt = df_alt.merge(df_t.loc[:,['experiment', 'experimenttype','exposure', 'refexp','minimum']],left_index=True,right_index=True)
df_alt = df_alt.reset_index().rename({'level_0':'filename'},axis=1).drop(['Unnamed: 0','index'],axis=1)

## Superpixel SBR


The authors talk about “quantification of the signal” using a threshold, but this is not quantification of the signal; it is a count of cells considered “positive” that does not take into consideration the full range of signal values.

- Compare SBR and estimated dynamic range.

How generalizable are thresholds to other regions of the images?

- Use superpixels to split the image into regions.

- Test another ROI in normal breast.
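
To illustrate the idea of checking one threshold across image regions, here is a minimal sketch that computes SBR (mean foreground over mean background) per region. It assumes a regular grid of tiles as a simple stand-in for superpixels, which is not necessarily how the regions in this study were generated; the function names and the toy image are hypothetical.

```python
import numpy as np

def sbr(pixels, threshold):
    """Mean of pixels above threshold divided by mean of pixels at or below it."""
    pixels = np.asarray(pixels, dtype=float)
    fg = pixels[pixels > threshold]
    bg = pixels[pixels <= threshold]
    if fg.size == 0 or bg.size == 0:
        return np.nan  # undefined when a region has no foreground or no background
    return fg.mean() / bg.mean()

def sbr_per_tile(image, threshold, n_rows=2, n_cols=2):
    """Apply one threshold to each tile of a regular grid (stand-in for superpixels)."""
    tiles = [tile
             for band in np.array_split(image, n_rows, axis=0)
             for tile in np.array_split(band, n_cols, axis=1)]
    return [sbr(tile, threshold) for tile in tiles]

# toy 4x4 image (made up): background 10, one bright pixel in three of the four tiles
image = np.full((4, 4), 10.0)
image[0, 0] = image[0, 2] = image[2, 0] = 100.0
tile_sbr = sbr_per_tile(image, threshold=50)
print(tile_sbr)
```

If a threshold generalizes well, the per-region SBR values should be similar wherever the marker is present; a region with no positive pixels has no defined SBR and is returned as NaN here.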

In [None]:
#calculate SBR
for s_measure in ['mean_intensity','quartiles-0','quartiles-1','quartiles-2','quartiles-3','quartiles-4']:
    df_sp.loc[:,f'{s_measure}_SBR'] = df_sp.loc[:,f'{s_measure}_fg']/df_sp.loc[:,'mean_intensity_bg']
df_sp['dynamic_range_SBR'] = df_sp.loc[:,'quartiles-4_SBR'] - df_sp.loc[:,'quartiles-0_SBR']
df_sp['dynamic_range'] = df_sp.loc[:,'quartiles-4_fg'] - df_sp.loc[:,'quartiles-0_fg'] #
df_sp['tissue_label'] = df_sp.experimenttype + '_' + df_sp.label.astype('str')

In [None]:
df_sp.columns
ls_order_sp = ['LaminAC','CK14','ER','CD45','aSMA','PD1','CD68',
               'PCNA', 'Her2','CK5', 'CD44','Vim','Ki67',  'Ecad',
             'CK7',  
            ]

In [None]:
#plot
d_measure = {'mean_intensity_SBR':'SBR',
    'dynamic_range':'Dynamic Range',
    'mean_intensity_fg':'Mean Intensity',
            'mean_intensity_bg':'Background Intensity'
            }
d_result = {}
d_plot = {}
for tu_level in [('minimum','minimum')]:#
    for s_measure, s_name in d_measure.items():   
        df_result = pd.DataFrame()
        df_plot = pd.DataFrame()
        for s_marker in df_sp.marker.unique():
            #s_marker = 'CK7'
            df_marker = df_sp[df_sp.marker==s_marker].sort_values(by='experimenttype')
            if len(df_marker) != 2: #QC: flag markers that do not have exactly two rows
                print(s_marker)
            df = df_marker[df_marker.experiment=='cyclic'].merge(df_marker[df_marker.experiment=='single'],on='tissue_label',suffixes=('_cyclic','_single'))
            d_rename = dict(zip([i for i in range(len(df))],[item.split('_')[0] for item in df.tissue_label]))
            se_sing = df.loc[df.loc[:,'threshold_level_single']==tu_level[0],f'{s_measure}_single'].rename(d_rename)
            se_cyc = df.loc[df.loc[:,'threshold_level_cyclic']==tu_level[1],f'{s_measure}_cyclic'].rename(d_rename)

            df_plot[s_marker] = (se_sing/se_cyc)
            se_sing.name = f'{s_marker}_single'
            se_cyc.name = f'{s_marker}_cyclic'
            df_result = pd.concat([df_result, se_sing.to_frame().T])
            df_result = pd.concat([df_result, se_cyc.to_frame().T])
            #break
        df_plot.drop(['CD20','CD8','CD4','CK19','pH3'],axis=1,inplace=True)
        #barplot
        s_title = f'Relative {s_name}: Superpixel {k},{tu_level[0].replace("minimum","")}'.replace('Superpixel 1,','Std./CyCIF')
        s_ylabel = f"Std./CycIF Intensity"
        s_figname =f'{rootdir}/Figures/{s_measure}_Ratio_Single-Cyclic_0-3_manual_suppix{k}_{tu_level[0]}.png'
        fig, ax = plt.subplots(figsize=(7,4),dpi=300)
        sns.barplot(data=df_plot,ax=ax,palette='muted',order=ls_order_sp,ci='sd')
        sns.stripplot(data=df_plot,ax=ax,palette='dark',order=ls_order_sp)
        ax.set_ylabel(s_ylabel)
        ax.set_xlabel("Marker")
        ax.set_ylim(0,3.5)
        ax.axhline(1,color='black')
        labels=ax.get_xticklabels()
        ax.set_xticklabels(labels, rotation=90)
        ax.set_title(s_title)
        plt.tight_layout()
        fig.savefig(s_figname)
        if s_measure == 'mean_intensity_SBR':
            ax.set_ylim(0,2.4)
            ax.set_title(s_title) #,fontweight='bold',pad=10
            fig.savefig(s_figname.replace('.png','_zoom.png'))
        d_result.update({s_measure:df_result})
        d_plot.update({s_measure:df_plot})
        #break
        

In [None]:
#alternative region
ls_col = ['CD44', 'CD45', 'CK14', 'CK5', 'CK7', 'ER', 'Ecad','LaminAC', 'aSMA','Vim',] #good markers in norm breast
for s_measure in ['mean_intensity','quartiles-0','quartiles-1','quartiles-2','quartiles-3','quartiles-4']:
    df_alt.loc[:,f'{s_measure}_SBR'] = df_alt.loc[:,f'{s_measure}_fg']/df_alt.loc[:,'mean_intensity_bg']
df_alt['dynamic_range_SBR'] = df_alt.loc[:,'quartiles-4_SBR'] - df_alt.loc[:,'quartiles-0_SBR']
df_alt['dynamic_range'] = df_alt.loc[:,'quartiles-4_fg'] - df_alt.loc[:,'quartiles-0_fg'] #
df_alt['tissue_label'] = df_alt.experimenttype + '_' + df_alt.label.astype('str')

d_marker = {'normal':['CD44', 'CD45', 'CD68', 'CK14', 'CK5', 'CK7', 'ER', 'Ecad', 'Ki67',
       'LaminAC', 'PCNA',  'Vim', 'aSMA'], #'PD1',
            'tumor':['CD44', 'CD45', 'CD68',  'CK7',  'Ecad',  'Ki67', #'Her2',
       'LaminAC', 'PCNA', 'PD1', 'Vim','aSMA' ]#'ER', #pd1 has some lamAC bleed through
           }
for s_type in ['normal','tumor']:
    df_type = df_alt[df_alt.experimenttype == s_type]
    ls_col = d_marker[s_type]
    for s_measure in ['mean_intensity_fg','mean_intensity_bg']: #'dynamic_range','mean_intensity_SBR',
        print(s_measure)
        df_plot_alt = pd.DataFrame()
        for s_marker in ls_col: #ls_marker:
            df_marker = df_type[df_type.marker==s_marker]
            df = df_marker[df_marker.experiment=='cyclic'].merge(df_marker[df_marker.experiment=='single'],on='tissue_label',suffixes=('_cyclic','_single'))
            d_rename = dict(zip([i for i in range(len(df))],[item.split('_')[0] for item in df.tissue_label]))
            se_sing = df.loc[:,f'{s_measure}_single'].rename(d_rename)
            se_cyc = df.loc[:,f'{s_measure}_cyclic'].rename(d_rename)
            df_plot_alt[s_marker] = (se_sing/se_cyc)
        df_plot = d_plot[s_measure]        
        df=pd.DataFrame()
        df['ROI1'] = df_plot.loc[s_type,ls_col]
        df['ROI2'] = df_plot_alt.loc[s_type,ls_col]
        fig,ax=plt.subplots(figsize=(6,4),dpi=300)#
        sns.scatterplot(x='ROI1', y='ROI2', data=df,hue=df.index,palette='tab20', ax=ax)
        ax.legend(ncol=2,bbox_to_anchor=(1,1.05),fontsize=12,title='Marker')
        #ax.set_ylim(0,2.4)
        #ax.set_xlim(0,2.4)
        r, pvalue = pearsonr(x=df.ROI1, y=df.ROI2)
        ax.set_title(f'{d_measure[s_measure]} {s_type}\n R = {r:.02}\n p = {pvalue:.02}')
        plt.tight_layout()
        fig.savefig(f'./Figures/Scatterplot_single_vs_cyclic_alt_{s_measure}_{s_type}.png')
        print(r)
        print(pvalue)
        #break
    #break

In [None]:
for s_measure, df_result in d_result.items(): 
    df_result.rename({0:'tumor',1:'normal'},axis=1,inplace=True)
    try:
        df_result.drop(['pH3_single','pH3_cyclic'],inplace=True)
    except KeyError:
        #pH3 rows are absent for this measure; nothing to drop
        pass
    df = df_result.unstack().reset_index().dropna().rename({'level_0':'tissue','level_1':'marker_type',0:'SBR'},axis=1)
    df['type'] = [item.split('_')[1] for item in df.marker_type]
    df['marker'] = [item.split('_')[0] for item in df.marker_type]
    df['marker_tissue'] = df.marker + '_' + df.tissue
    #plot
    df_plot = df[df.type=='single'].loc[:,['marker_tissue','SBR']].merge(df[df.type=='cyclic'].loc[:,['marker_tissue','SBR']],on='marker_tissue',suffixes=('_single','_cyclic'))
    df_plot['marker_tissue'] = df_plot.marker_tissue.replace({'CK5_tumor':'CK5_normal'})
    df_plot['marker'] = [item.split('_')[0] for item in df_plot.marker_tissue]
    df_plot['tissue'] = [item.split('_')[1] for item in df_plot.marker_tissue]
    df_plot.sort_values('marker_tissue',inplace=True)
    df_plot.marker = df_plot.marker.replace({'Her2':'HER2'})
    fig,ax=plt.subplots(figsize=(5.5,4),dpi=300)#
    sns.scatterplot(x='SBR_cyclic', y='SBR_single', data=df_plot,hue='marker',palette='tab20', ax=ax)#,s=15
    ax.legend(ncol=2,bbox_to_anchor=(1,1.05),fontsize=12,title='Marker')
    r, pvalue = pearsonr(x=df_plot.SBR_cyclic, y=df_plot.SBR_single)
    ax.set_ylabel('Standard IF')
    ax.set_xlabel('CyCIF')
    ax.set_ylim(0,22)
    ax.set_xlim(0,22)
    ax.set_title(f'{d_measure[s_measure]}\n R = {r:.02}\n p = {pvalue:.02}')
    plt.tight_layout()
    fig.savefig(f'./Figures/Scatterplot_single_vs_cyclic_{s_measure}.png')
    break

# Percent Positive

Thresholds were set on the images, using the pattern of staining to set an intensity value above which pixels were considered positive for a stain.

Here, the same thresholds are applied to the single-cell mean intensity values. If a cell has a mean intensity above the pixel threshold, it is considered positive. The percentage of positive cells over all segmented cells is calculated for each marker.
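
A minimal sketch of this calculation on made-up numbers is shown below. The function name `fraction_positive` is hypothetical; the threshold rescaling, `threshold / (refexp / exposure)`, mirrors the exposure correction applied in the cells that follow.

```python
import numpy as np

def fraction_positive(cell_means, pixel_threshold, refexp, exposure):
    """Fraction of cells whose mean intensity exceeds the rescaled pixel threshold."""
    # rescale the image-level threshold to the exposure-normalized feature scale
    cell_threshold = pixel_threshold / (refexp / exposure)
    cell_means = np.asarray(cell_means, dtype=float)
    return (cell_means > cell_threshold).sum() / len(cell_means)

# toy values (made up): threshold 800 at refexp/exposure = 2 becomes 400
frac = fraction_positive([100, 300, 500, 700], pixel_threshold=800,
                         refexp=100, exposure=50)
print(frac)
```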

In [None]:

#cropping used for images (different from the features)
#new 20191209
d_crop_image = {'44290-112':(5000,8800,1300,1800),
 '44290-113':(5886,8111,1300,1800),
 '44290-114':(5010,9242,1300,1800),
 '44290-115':(5975,10490,1300,1800),
 '44290-116':(6336,9174,1300,1800),
 '44294-116':(9547,5459,1300,1800),
 '44294-117':(12180,6092, 1300,1800),
 '44294-118':(10901, 6582, 1300,1800),
 '44294-119':(9853,5557, 1300,1800),
 '44294-120':(10250,5510, 1300,1800),
 }
d_crop_feat = {'44290-112':(5000,8800,1300,1800),
 '44290-113':(2650,6400,1300,1800),
 '44290-114':(2650,6250,1300,1800),
 '44290-115':(2850,6200,1300,1800),
 '44290-116':(2650,6250,1300,1800),
 '44294-116':(7200,3580,12725,12365),
 '44294-117':(0,0,0,0),
 '44294-118':(0,0,0,0),
 '44294-119':(0,0,0,0),
 '44294-120':(0,0,0,0),
 }
#define rois (see notes above about ROIs to exclude autofluorescence)
df_rois = pd.DataFrame()
for s_tissue, t_coord in d_crop_feat.items():
    #print(s_tissue)
    if t_coord == (0,0,0,0):
        df_scene = df_mi[df_mi.slide==s_tissue]
    else:
        x_min = t_coord[0]
        x_max = t_coord[0] + t_coord[2]
        y_min = t_coord[1]
        y_max = t_coord[1] + t_coord[3]
        df_scene = df_mi[df_mi.slide==s_tissue]
        df_scene = df_scene.loc[(x_max > df_scene.DAPI_X) & (df_scene.DAPI_X > x_min)]
        df_scene = df_scene.loc[(y_max > df_scene.DAPI_Y) & (df_scene.DAPI_Y > y_min)]
    df_rois = pd.concat([df_rois, df_scene])

In [None]:
#calculate positive cells based on thresholding the mean intensity dataframe

#empty dataframe
df_pos = pd.DataFrame()

#for each sample
for s_index in df_t.index:
    s_exp = df_t.loc[s_index,'experiment']
    s_tissue = df_t.loc[s_index,'scene']
    s_marker = df_t.loc[s_index,'marker']
    s_type = df_t.loc[s_index,'experimenttype']
    df_m_slide = df_m[(df_m.slide==s_tissue) & (df_m.marker == s_marker)]
    if len(df_m_slide) !=1:
        print(df_m_slide)
    #when the features were extracted we had some exposure normalization ... need to undo
    i_thresh = df_t.loc[s_index,'minimum']/(df_m_slide.loc[:,'refexp']/df_m_slide.loc[:,'exposure'])
    s_marker_loc = df_rois.columns[pd.Series([item.split('_')[0]==s_marker for item in df_rois.columns])][0]
    df_pos.loc[s_marker,f'{s_exp}_{s_type}'] = (df_rois[df_rois.slide==s_tissue].loc[:,s_marker_loc] > i_thresh.iloc[0]).sum()
   

In [None]:
#generate dataframe for plotting
#tumor
df_percent = pd.DataFrame()
df_percent['tumor'] = (df_pos.dropna(axis=0).single_tumor/df_pos.dropna(axis=0).cyclic_tumor).fillna(1)
#normal
df_percent['normal'] = (df_pos.dropna(axis=0).single_normal/df_pos.dropna(axis=0).cyclic_normal).fillna(1)



df_tum = pd.DataFrame(df_percent.unstack().tumor, columns=['value'])
df_norm = pd.DataFrame(df_percent.unstack().normal, columns=['value'])
df_tum['type'] = 'tumor'
df_norm['type'] = 'normal'
df_tum['marker'] = df_tum.index.tolist()
df_norm['marker'] = df_norm.index.tolist()
df_longer = pd.concat([df_tum, df_norm])
df_longer = df_longer.drop('pH3')

ls_order = df_tum.index

#stats
print(df_longer.groupby('marker').mean(numeric_only=True).mean())
print(df_longer.groupby('marker').mean(numeric_only=True).std())


In [None]:
#barplot

s_title = 'Relative % Pos. of Standard versus CycIF'
s_ylabel = "Std./CycIF % Pos."
s_figname =f'{codedir}/Figures/Percent_Positive_Ratio_Single-Cyclic.png'
fig, ax = plt.subplots(figsize=(7,4),dpi=300)
sns.barplot(data=df_longer, x='marker', y='value', palette = 'muted',ci='sd')
sns.stripplot(data=df_longer, x='marker', y='value', palette = 'dark')
ax.set_ylabel(s_ylabel)

ax.set_xlabel("Marker")
ax.axhline(1,color='black')

labels=ax.get_xticklabels()
ax.set_xticklabels(labels, rotation=90)
ax.set_title(s_title)

plt.tight_layout()
fig.savefig(s_figname)


In [None]:
# df = df_result.unstack().reset_index().dropna().rename({'level_0':'tissue','level_1':'marker_type',0:'SBR'},axis=1)
df = df_pos.unstack().reset_index().dropna().rename({'level_0':'type_tissue','level_1':'marker',0:'Positive'},axis=1)
df = df[df.marker!='pH3']   

In [None]:
df['type'] = [item.split('_')[0] for item in df.type_tissue]
df['tissue'] = [item.split('_')[1] for item in df.type_tissue]
df['marker_tissue'] = df.marker + '_' + df.tissue
df_plot = df[df.type=='single'].loc[:,['marker_tissue','Positive']].merge(df[df.type=='cyclic'].loc[:,['marker_tissue','Positive']],on='marker_tissue',suffixes=('_single','_cyclic'))
df_plot['marker_tissue'] = df_plot.marker_tissue.replace({'CK5_tumor':'CK5_normal'})  
df_plot['marker'] = [item.split('_')[0] for item in df_plot.marker_tissue]
df_plot['tissue'] = [item.split('_')[1] for item in df_plot.marker_tissue]
df_plot.sort_values('marker_tissue',inplace=True)

In [None]:
fig,ax=plt.subplots(figsize=(3,4),dpi=300)
sns.scatterplot(x='Positive_cyclic', y='Positive_single', data=df_plot,hue='marker',palette='muted', ax=ax)
ax.legend(ncol=2,bbox_to_anchor=(1,1.05),fontsize=10,title='Marker')
ax.get_legend().remove()
r, pvalue = pearsonr(x=df_plot.Positive_cyclic, y=df_plot.Positive_single)
ax.set_ylabel('Standard IF')
ax.set_xlabel('CyCIF')
ax.ticklabel_format(axis='both',style='sci',scilimits=(0,0))
ax.set_title(f'Number Positive \n R = {r:.02}\n p = {pvalue:.02}',pad=20)
plt.tight_layout()
fig.savefig(f'./Figures/Scatterplot_single_vs_cyclic_Positive.png')

In [None]:
df

In [None]:
sns.histplot(df.Positive)

In [None]:
#calculate percent positive cells based on thresholding the mean intensity dataframe

#empty dataframe
df_pos = pd.DataFrame()

#for each sample
for s_index in df_t.index:
    s_exp = df_t.loc[s_index,'experiment']
    s_tissue = df_t.loc[s_index,'scene']
    s_marker = df_t.loc[s_index,'marker']
    s_type = df_t.loc[s_index,'experimenttype']
    df_m_slide = df_m[(df_m.slide==s_tissue) & (df_m.marker == s_marker)]
    if len(df_m_slide) !=1:
        print(df_m_slide)
    #when the features were extracted we had some exposure normalization ... need to undo
    i_thresh = df_t.loc[s_index,'minimum']/(df_m_slide.loc[:,'refexp']/df_m_slide.loc[:,'exposure'])
    s_marker_loc = df_rois.columns[pd.Series([item.split('_')[0]==s_marker for item in df_rois.columns])][0]
    df_pos.loc[s_marker,f'{s_exp}_{s_type}'] = (df_rois[df_rois.slide==s_tissue].loc[:,s_marker_loc] > i_thresh.iloc[0]).sum()/len(df_rois[df_rois.slide==s_tissue])
    #break


In [None]:
# df = df_result.unstack().reset_index().dropna().rename({'level_0':'tissue','level_1':'marker_type',0:'SBR'},axis=1)
df = df_pos.unstack().reset_index().dropna().rename({'level_0':'type_tissue','level_1':'marker',0:'Positive'},axis=1)
df = df[df.marker!='pH3']   
df['type'] = [item.split('_')[0] for item in df.type_tissue]
df['tissue'] = [item.split('_')[1] for item in df.type_tissue]
df['marker_tissue'] = df.marker + '_' + df.tissue
df_plot = df[df.type=='single'].loc[:,['marker_tissue','Positive']].merge(df[df.type=='cyclic'].loc[:,['marker_tissue','Positive']],on='marker_tissue',suffixes=('_single','_cyclic'))
df_plot['marker_tissue'] = df_plot.marker_tissue.replace({'CK5_tumor':'CK5_normal'})  
df_plot['marker'] = [item.split('_')[0] for item in df_plot.marker_tissue]
df_plot['tissue'] = [item.split('_')[1] for item in df_plot.marker_tissue]
df_plot.sort_values('marker_tissue',inplace=True)

In [None]:
fig,ax=plt.subplots(figsize=(3.2,4),dpi=300)
sns.scatterplot(x='Positive_cyclic', y='Positive_single', data=df_plot,hue='marker',palette='muted', ax=ax)
ax.legend(ncol=2,bbox_to_anchor=(1,1.05),fontsize=10,title='Marker')
ax.get_legend().remove()
r, pvalue = pearsonr(x=df_plot.Positive_cyclic, y=df_plot.Positive_single)
ax.set_ylabel('Standard IF')
ax.set_xlabel('CyCIF')
#ax.ticklabel_format(axis='both',style='sci',scilimits=(0,0))
ax.set_title(f'Fraction Positive \n R = {r:.02}\n p = {pvalue:.02}',pad=20)
ax.set_xlim(0,1)
ax.set_ylim(0,1)
plt.tight_layout()
#fig.savefig(f'./Figures/Scatterplot_single_vs_cyclic_Positive.png')

In [None]:
sns.histplot(df.Positive.rename('Fraction Positive'))

# Tissue loss <a name="tissue"></a>

[contents](#contents)

Quantify tissue loss across 10 rounds of quenching in normal and tumor tissue, using the Biomax FDA808l-2 tissue array: multiple tumor (24 organs) and normal (6 organs).
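
The retention measure used below can be sketched on a toy table: count the cells still detected in each round, then normalize to the maximum count for that slide. The data are made up and only three rounds are shown; the `DAPI1`, `DAPI2`, ... boolean column naming follows the shortened names used later in this section.

```python
import pandas as pd

# toy per-cell table (made up): True means the nucleus was still detected in that round
df_toy = pd.DataFrame({
    'slide': ['A', 'A', 'A', 'A', 'B', 'B'],
    'DAPI1': [True, True, True, True, True, True],
    'DAPI2': [True, True, True, False, True, True],
    'DAPI3': [True, True, False, False, True, False],
})
ls_round = ['DAPI1', 'DAPI2', 'DAPI3']

# count retained cells per slide and round, then normalize to each slide's maximum
df_count = df_toy.groupby('slide')[ls_round].sum()
df_fraction = df_count.div(df_count.max(axis=1), axis=0)
print(df_fraction)
```

Normalizing to the per-slide maximum rather than to round 1 keeps fractions at or below 1 even when a later round detects slightly more nuclei than the first.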

In [None]:
#load tissue loss data

s_slides = '808L2'# 'HER2A'#
s_in = f'{codedir}/Data/Quench/features_{s_slides}_TissueLoss.csv'
df = pd.read_csv(s_in,index_col=0,low_memory=False)

In [None]:
#add scene
df['scene'] = [item.split('_')[1] for item in df.slide_scene_new]
#tissue completely lost
ls_na = ['DAPI10_nuclei_thresh','DAPI7_nuclei_thresh','DAPI8_nuclei_thresh','DAPI9_nuclei_thresh']
df.loc[:,ls_na] = df.loc[:,ls_na].fillna(False)
for s_col in ['nuclei_area', 'nuclei_eccentricity']:
    df[f'{s_col}_Q'] = pd.qcut(df.loc[:,s_col],4,labels=["Q1", "Q2", "Q3","Q4"])

In [None]:
#load annotation
df_a = pd.read_csv(f'{codedir}/Data/FDA808l-2 specs.csv')
#grade
df_a['Grade'] = df_a.Grade.replace('-',np.nan)
#stage
d_replace = {'-':np.nan, 'IA':'I', 'III':"III", 'IIA':"II", 'IIB':'II', 'IIIA':"III", 'IB':"I",
       ' IIB G2':"II", 'IIIB':"III", 'IVB':'IV', 'IIA G3':"II", 'IIB G3 ':"II", 'IIB G2':"II"}
df_a['Stage'] = df_a.Stage.replace(d_replace)
df_a.loc[:,'Pathology'] = df_a.loc[:,'Pathology diagnosis']
df_a.loc[:,'Age_Q'] = pd.qcut(df_a.Age,4,labels=["Q1", "Q2", "Q3","Q4"])
df_a.loc[~(df_a.Source.str.contains(r'\*')),'Source'] = 'autopsy'
df_a.loc[df_a.Source.str.contains(r'\*'),'Source'] = 'resection'

In [None]:
#shorten names
d_rename = dict(zip(df.columns[df.dtypes=='bool'],[item.split('_')[0] for item in df.columns[df.dtypes=='bool']]))
df.rename(d_rename,axis=1,inplace=True)
ls_order = [ 'DAPI1', 'DAPI2', 'DAPI3', 'DAPI4',
       'DAPI5', 'DAPI6', 'DAPI7', 'DAPI8', 'DAPI9','DAPI10'] #'DAPI-1', 'DAPI0',

In [None]:
#calculate fraction per scene, round
ls_drop = ['scene006','scene012','scene013','scene016', 'scene072'] #scene6 lost in 55, 72 mis registered in 53, others few cells
df_all = pd.DataFrame()
fig, ax = plt.subplots(figsize=(8,4),dpi=300)
for s_scene in sorted(set(df.scene.unique()) - set(ls_drop)):
    df_sum = df[df.scene==s_scene].loc[:,ls_order+['slide']].groupby('slide').sum()
    se_max = df_sum.max(axis=1) #df_sum.loc[:,'DAPI1'] #
    df_plot = (df_sum.T/se_max) #normalize
    #QC
    if len (df_plot.std().loc[df_plot.std() > .1]) > 1:
        print(s_scene)
        print(df_plot.std().loc[df_plot.std() > .2].index)
    if sum(df_plot.loc['DAPI3'] - df_plot.loc['DAPI2'] > .1) > 0:
        print(s_scene)
    df_long = pd.DataFrame(df_plot.unstack()).reset_index().rename({'level_1':'Round',0:'Fraction_Cells'},axis=1)
    sns.lineplot(x='Round',y='Fraction_Cells',data=df_long,ax=ax)
    df_long['scene'] = s_scene
    df_all = pd.concat([df_all, df_long])
    #break
ax.set_ylim(0,1)
ax.set_xticks(ls_order)
ax.set_xticklabels(ls_order, rotation=90)
ax.set_title('Tissue Retention per Round, per Core')
plt.tight_layout()
fig.savefig(f'./Figures/Tissue_Loss_per_Scene_all.png')    

In [None]:
#all 
fig, ax = plt.subplots(figsize=(5,3),dpi=300)
#sns.stripplot(x='Round',y='Fraction_Cells',data=df_all.reset_index(),ax=ax,color='C0',s=1,alpha=0.3,jitter=.2)
sns.lineplot(x='Round',y='Fraction_Cells',data=df_all.reset_index(),ax=ax) #,estimator='scene'
ax.set_ylim(.8,1)
ax.set_xticks(ls_order)
ax.set_xticklabels([item.replace('DAPI','') for item in ls_order])
ax.set_ylabel('Fraction Cells')
ax.set_title('Tissue Retention by Round') #,fontweight='bold'
plt.tight_layout()
fig.savefig(f'./Figures/Tissue_Loss_per_Round.png')    

In [None]:
#cats
d_result = {}
for s_type in ['Type','Grade','Stage','Sex','Organ','Pathology','Age_Q']: #
    fig, ax = plt.subplots(figsize=(6,3),dpi=300)
    df_all['Cat'] = df_all.scene.map(dict(zip(df_a.Scene,df_a.loc[:,s_type])))
    sns.lineplot(x='Round',y='Fraction_Cells',hue='Cat',data=df_all.reset_index(),ax=ax) #,estimator='scene'
    ax.set_xticks(ls_order)
    ax.set_xticklabels([item.replace('DAPI','') for item in ls_order])
    ax.set_ylabel('Fraction Cells')
    ax.set_ylim(.75,1)
    ax.set_title(f'Tissue Retention by {s_type}',pad=10) #fontweight='bold',
    ax.legend(bbox_to_anchor=(.95,.95))
    plt.tight_layout()
    fig.savefig(f'./Figures/Tissue_Loss_per_Round_{s_type}.png')    
    #test
    df_last = df_all[df_all.Round=='DAPI10'].copy()
    df_last.index = df_all[df_all.Round=='DAPI10'].slide + '_' + df_all[df_all.Round=='DAPI10'].scene
    df_first = df_all[df_all.Round=='DAPI1'].copy()
    df_first.index = df_all[df_all.Round=='DAPI1'].slide + '_' + df_all[df_all.Round=='DAPI1'].scene
    df_first['Remaining'] = df_last.loc[:,'Fraction_Cells']/df_first.loc[:,'Fraction_Cells']
    figsize = (3.5,3)
    if s_type == 'Type':
        figsize = (4.8,3)
    fig,ax=plt.subplots(figsize=figsize,dpi=300)
    sns.boxplot(x='Cat',y='Remaining',data=df_first,order = df_first.Cat.dropna().sort_values().unique(),fliersize=0,ax=ax)
    sns.stripplot(x='Cat',y='Remaining',data=df_first,order = df_first.Cat.dropna().sort_values().unique(),palette='dark',ax=ax)
    #Kruskal-Wallis test (nonparametric alternative to one-way ANOVA)
    lls_f = [df_first.Remaining[df_first.Cat == item] for item in df_first.Cat.dropna().unique()]
    #statistic,pvalue = stats.f_oneway(*lls_f)
    statistic,pvalue = stats.kruskal(*lls_f)
    ax.set_ylabel('Fraction Cells \n After 10 Rounds')
    ax.set_title(f'{s_type} \n p = {pvalue:.02}') #pad=10,,fontweight='bold' Ten Round Retention by 
    ax.set_xlabel(s_type)
    plt.tight_layout()
    fig.savefig(f'./Figures/Tissue_Loss_10round_{s_type}.png') 
    d_result.update({s_type:df_first})
    #break

In [None]:
#surgical versus autopsy
s_type = 'Source'
df_first = d_result['Type'][d_result['Type'].Cat=='Normal'].copy()
df_first['Cat'] = df_first.scene.map(dict(zip(df_a.Scene,df_a.Source)))
#test
df_first['Remaining'] = df_last.loc[:,'Fraction_Cells']/df_first.loc[:,'Fraction_Cells']
fig,ax=plt.subplots(figsize=(3.6,3),dpi=300)
sns.boxplot(x='Cat',y='Remaining',data=df_first,order = df_first.Cat.dropna().sort_values().unique(),fliersize=0,ax=ax)
sns.stripplot(x='Cat',y='Remaining',data=df_first,order = df_first.Cat.dropna().sort_values().unique(),palette='dark',ax=ax)
#Mann-Whitney U test (two groups)
lls_f = [df_first.Remaining[df_first.Cat == item] for item in df_first.Cat.dropna().unique()]
statistic,pvalue = stats.mannwhitneyu(*lls_f)
#statistic,pvalue = stats.f_oneway(*lls_f)
ax.set_ylabel('Fraction Cells \n After 10 Rounds')
ax.set_title(f'{s_type} \n p = {pvalue:.02}') #pad=10,,fontweight='bold' Ten Round Retention by 
ax.set_xlabel(s_type)
plt.tight_layout()
fig.savefig(f'./Figures/Tissue_Loss_10round_{s_type}.png') 

In [None]:
#trends between normal and malignant
for s_type in ['Sex','Organ','Pathology','Age_Q','Grade','Type','Stage']:#,
    df_first = d_result[s_type]
    df_first['Type'] = df_first.scene.map(dict(zip(df_a.Scene,df_a.Type)))
    for s_tum in ['Normal','Malignant']:
        df_plot = df_first[df_first.Type==s_tum]
        if len(df_plot.dropna().Cat.unique()) > 1:
            width = 6
            if s_tum=='Malignant':
                if s_type == 'Pathology':
                    width = 8
            fig,ax=plt.subplots(figsize=(width,len(df_plot.Cat.unique())*.3+1.5))
            order = df_plot.groupby('Cat')['Remaining'].mean().sort_values().index
            sns.stripplot(y='Cat',x='Remaining',data=df_plot,order=order,palette='muted',ax=ax)
            sns.pointplot(y='Cat',x='Remaining',data=df_plot,order=order,ax=ax,
                  palette="dark",markers="d", scale=1, ci=95)
            #Kruskal-Wallis test (nonparametric alternative to one-way ANOVA)
            lls_f = [df_plot.Remaining[df_plot.Cat == item] for item in df_plot.Cat.dropna().unique()]
            #statistic,pvalue = stats.f_oneway(*lls_f)
            statistic,pvalue = stats.kruskal(*lls_f)
            ax.set_title(f'{s_tum} Tissue \n p = {pvalue:.02}')
            ax.set_ylabel(s_type)
            ax.set_xlim(0,1.1)
            plt.tight_layout()
            fig.savefig(f'./Figures/Tissue_Loss_10round_{s_type}_{s_tum}.png') 
            #break

In [None]:
#calculate fraction per scene, round, cell size or shape
for s_col in ['nuclei_area', 'nuclei_eccentricity']:
    df[f'{s_col}_Q'] = pd.qcut(df.loc[:,s_col],4,labels=["Q1", "Q2", "Q3","Q4"])
ls_drop = ['scene006','scene012','scene013','scene016', 'scene072'] #scene6 lost in 55, 72 mis registered in 53, others few cells
df_cell = pd.DataFrame()

for s_scene in sorted(set(df.scene.unique()) - set(ls_drop)):
    #print(s_scene)
    for s_cell in ['nuclei_area_Q','nuclei_eccentricity_Q']:
        #print(s_cell)
        df_sum = df[df.scene==s_scene].loc[:,ls_order+['slide',s_cell]].groupby(['slide',s_cell]).sum()
        df_max = pd.DataFrame(df_sum.reset_index().groupby(['slide',s_cell]).max().max(axis=1))
        for s_order in ls_order:
            df_max.loc[:,s_order] = df_max.loc[:,0]
        df_max.drop(0,axis=1,inplace=True)
        df_plot = (df_sum/df_max) #normalize
        df_long = df_plot.unstack().unstack().reset_index().rename({'level_0':'Round',0:'Fraction_Cells',s_cell:'Quartile'},axis=1)
        df_long['scene'] = s_scene
        df_long['cell'] = s_cell
        df_cell = pd.concat([df_cell, df_long])


In [None]:
#cats
d_result_cell = {}
for s_type in ['nuclei_area_Q','nuclei_eccentricity_Q']: #
    fig, ax = plt.subplots(figsize=(4.5,3),dpi=300)
    #df_cell['Cat'] = df_all.scene.map(dict(zip(df_a.Scene,df_a.loc[:,s_type])))
    sns.lineplot(x='Round',y='Fraction_Cells',hue='Quartile',data=df_cell[df_cell.cell==s_type].reset_index(),ax=ax) #
    ax.set_xticks(ls_order)
    ax.set_xticklabels([item.replace('DAPI','') for item in ls_order])
    ax.set_ylabel('Fraction Cells')
    ax.set_ylim(.8,1)
    ax.set_title(f'Tissue Retention by {s_type.replace("_"," ").replace("Q","")}',pad=10) #fontweight='bold',
    ax.legend(bbox_to_anchor=(.95,.95))
    plt.tight_layout()
    fig.savefig(f'./Figures/Tissue_Loss_per_Round_{s_type}.png')    

    #test
    df_cel = df_cell[df_cell.cell==s_type]
    df_last = df_cel[df_cel.Round=='DAPI10'].copy()
    df_last.index = df_cel[df_cel.Round=='DAPI10'].slide + '_' + df_cel[df_cel.Round=='DAPI10'].scene
    df_first = df_cel[df_cel.Round=='DAPI1'].copy()
    df_first.index = df_cel[df_cel.Round=='DAPI1'].slide + '_' + df_cel[df_cel.Round=='DAPI1'].scene
    df_first['Remaining'] = df_last.loc[:,'Fraction_Cells']/df_first.loc[:,'Fraction_Cells']
    fig,ax=plt.subplots(figsize=(4,3),dpi=200)
    sns.boxplot(x='Quartile',y='Remaining',data=df_first,order = df_first.Quartile.dropna().sort_values().unique(),fliersize=0,ax=ax)
    sns.stripplot(x='Quartile',y='Remaining',data=df_first,order = df_first.Quartile.dropna().sort_values().unique(),palette='dark',ax=ax)
    #Kruskal-Wallis test (nonparametric alternative to one-way ANOVA)
    lls_f = [df_first.Remaining[df_first.Quartile == item] for item in df_first.Quartile.dropna().unique()]
    #statistic,pvalue = stats.f_oneway(*lls_f)
    statistic,pvalue = stats.kruskal(*lls_f)
    ax.set_ylabel('Fraction Cells \n After 10 Rounds')
    #ax.set_title(f'Ten Round Retention by {s_type} \n p = {pvalue:.02}',fontweight='bold') #pad=10,
    ax.set_title(f'{s_type.replace("_"," ").replace("Q","").replace("nuc"," Nuc")}\n p = {pvalue:.02}')#Ten Round Retention \n by,pad=10 ,fontweight='bold'
    plt.tight_layout()
    fig.savefig(f'./Figures/Tissue_Loss_10round_{s_type}.png') 
    d_result_cell.update({s_type:df_first})
    #break

In [None]:
#trends between normal and malignant
for s_type in ['nuclei_area_Q','nuclei_eccentricity_Q']:#,
    df_first = d_result_cell[s_type]
    df_first['Type'] = df_first.scene.map(dict(zip(df_a.Scene,df_a.Type)))
    for s_tum in ['Normal','Malignant']:
        df_plot = df_first[df_first.Type==s_tum]
        if len(df_plot.dropna().Quartile.unique()) > 1:
            fig,ax=plt.subplots(figsize=(5,len(df_plot.Quartile.unique())*.3+1.5),dpi=300)
            order = df_plot.groupby('Quartile')['Remaining'].mean().sort_values().index
            sns.stripplot(y='Quartile',x='Remaining',data=df_plot,order=order,palette='muted',ax=ax)
            sns.pointplot(y='Quartile',x='Remaining',data=df_plot,order=order,ax=ax,
                  palette="dark",markers="d", scale=1, ci=95)
            #Kruskal-Wallis test (nonparametric alternative to one-way ANOVA)
            lls_f = [df_plot.Remaining[df_plot.Quartile == item] for item in df_plot.Quartile.dropna().unique()]
            #statistic,pvalue = stats.f_oneway(*lls_f)
            statistic,pvalue = stats.kruskal(*lls_f)
            ax.set_title(f'{s_tum} Tissue \n p = {pvalue:.02}')
            ax.set_ylabel(s_type)
            ax.set_xlim(0,1.1)
            plt.tight_layout()
            fig.savefig(f'./Figures/Tissue_Loss_10round_{s_type}_{s_tum}.png') 
            #break

In [None]:
df_first