We expect T-cell and NK-cell populations to be affected by the STAT4 mutation, so we focus on those subpopulations.

Here, we add the Patient 1 vs Control context comparison that was not originally conducted in [Notebook 02](./02_DE_ContextComparisons.ipynb). We use the same thresholds as for Patient 2 vs Control in Notebook 02.

In [1]:
suppressPackageStartupMessages({
    suppressWarnings({
        library(Seurat, quietly = T)
        library(openxlsx, quietly = T)
        library(ggpubr, quietly = T)
        library(plyr, quietly = T)
        library(dplyr, quietly = T)
    })
})

data_path = '/data3/hratch/STAT4_v2/'

In [2]:
pbmc.integrated<-readRDS(paste0(data_path, 'processed/pbmc_integrated.RDS'))
md<-pbmc.integrated@meta.data

Specify the cell types and context comparisons to test for:

In [3]:
cell.types<-c('Naive CD8+ T cells', 'CD8+ NKT-like cells', 'Natural killer  cells', 
              'Naive CD4+ T cells', 'Effector CD4+ T cells', 'Memory CD4+ T cells')
comparisons<-list(disease.effect = c('Patient.1', 'Control'))

We anticipate a general upregulation of genes in Patient 2 vs the Control, since the STAT4 mutation is gain-of-function.

Since we are testing differences within the same cell type across contexts, we employ DE tests that can control for technical effects. Latent variables that account for technical effects have been [shown](https://www.biorxiv.org/content/10.1101/2022.03.15.484475v1) to be effective for DE across contexts. We will first use MAST with the CDR (cellular detection rate), which has been [shown](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5) to be an effective latent variable for technical effects.

Note that we do expect downregulation of genes in Patient 1 relative to Patient 2, since this is the treatment effect.

# CDR

First, we calculate the CDR from the LogNormalized expression matrix:

In [4]:
freq<-function(expr){
    nonzero.counts<-rowSums(expr !=0 ) # get # of nonzero cells per gene
    return(nonzero.counts/dim(expr)[[2]])
}

In [5]:
expr = pbmc.integrated@assays$RNA@data # log-normalized matrix
expr<-expr[which(freq(expr)>0),] # remove genes not detected in any cell

In [6]:
thresh = 0 # calculate CDR on non-zero frequency (NOTE: code will need to be changed if setting higher thresh)
cdr<-unlist(unname(scale(colSums(expr!=thresh))[, 1])) # calculate CDR as in MAST tutorial (https://www.bioconductor.org/packages/release/bioc/vignettes/MAST/inst/doc/MAITAnalysis.html)

In [7]:
pbmc.integrated@meta.data[['cellular.detection.rate']]<-cdr # add cdr to object
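To make the CDR step concrete, here is a minimal sketch on a toy matrix, assuming genes in rows and cells in columns as in the Seurat `data` slot. The matrix `toy.expr` and its values are made up for illustration, and only base R is used.

```r
# Toy log-normalized matrix: 4 genes (rows) x 3 cells (columns); values are illustrative only
toy.expr <- matrix(c(0,   1.2, 0,
                     2.1, 0,   0,
                     0.5, 0.7, 0,
                     0,   0,   0),   # gene4 is never detected
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0('gene', 1:4), paste0('cell', 1:3)))

# Fraction of cells in which each gene is detected (same idea as freq() above)
gene.freq <- rowSums(toy.expr != 0) / ncol(toy.expr)
toy.expr <- toy.expr[gene.freq > 0, , drop = FALSE] # drop genes detected in no cell

# CDR: number of detected genes per cell, centered and scaled across cells
n.detected <- colSums(toy.expr != 0)
toy.cdr <- scale(n.detected)[, 1]
toy.cdr
```

Because the values are centered and scaled, cells with fewer detected genes than average get a negative CDR and cells with more get a positive one.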

# MAST

In [8]:
MAST.de<-function(cell.type, context.treat, context.base, latent.vars, min.pct, lfc.thresh){
    # subset by cell name so the function uses its own `cell.type` argument, not a global `ct`
    md<-pbmc.integrated@meta.data
    cells.use<-rownames(md)[which(md$Cell.Type == cell.type)]
    pbmc.subset<-subset(x = pbmc.integrated, cells = cells.use)
    Idents(pbmc.subset)<-'orig.ident' # contexts to compare
    
    suppressWarnings({
        suppressMessages({
            de.res<-FindMarkers(object = pbmc.subset, 
                                ident.1 = context.treat, ident.2 = context.base,
                                assay = 'RNA', only.pos = F, 
                                slot = 'data', test.use = 'MAST', 
                                latent.vars = latent.vars,
                                min.pct = min.pct, 
                                logfc.threshold = lfc.thresh)
        })
    })
    
    names(de.res)[names(de.res) == 'p_val_adj'] <- 'bonferroni.adjusted' # rename to specify correction type
    # also compute B-H, which is less stringent than the native Seurat Bonferroni
    de.res[['BH.adjusted']]<-p.adjust(p = de.res$p_val, method = "BH") 
    de.res[['gene']]<-rownames(de.res)
    de.res[['Cell.Type']]<-cell.type
    de.res[['Comparison']]<-paste0(context.treat, '_vs_', context.base)
    
    return(de.res)
}

Since we expect fewer differences between Patient 1 and Patient 2 than between either patient and the Control, patient vs patient comparisons use less stringent thresholds for the minimum fraction of cells expressing a gene (`min.pct`) and for the log fold-change (`lfc.thresh`). The Patient 1 vs Control comparison run here uses the more stringent Control thresholds.

In [9]:
# MAST.de.res<-list()
# for (comparison in comparisons){
#     for (ct in cell.types){
#         context.treat<-comparison[[1]]
#         context.base<-comparison[[2]]
#         cond.name<-paste0(ct, '_', paste0(comparison, collapse = 'vs'))
#         if (context.base == 'Control'){
#             min.pct = 0.1
#             lfc.thresh = 0.9
#         }else{# less stringent for patient comparison bc fewer differences
#             min.pct = 0.05 
#             lfc.thresh = 0.5
#         }
#         MAST.de.res[[cond.name]]<-MAST.de(ct, context.treat, context.base, 
#                                       latent.vars = 'cellular.detection.rate', 
#                                          min.pct = min.pct, lfc.thresh = lfc.thresh)
#     }
# }
# saveRDS(MAST.de.res, paste0(data_path, 'processed/additional_MAST_condition-specific_DE.RDS'))
MAST.de.res<-readRDS(paste0(data_path, 'processed/additional_MAST_condition-specific_DE.RDS'))
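The threshold logic in the loop above can be isolated into a small helper, which keeps the choice of `min.pct` and `lfc.thresh` in one place. The sketch below is one way to do that; `pick.thresholds` is a hypothetical name that does not appear in the notebooks, and the values are the ones used in the loop.

```r
# Hypothetical helper: choose min.pct and log fold-change thresholds from the baseline context
pick.thresholds <- function(context.base){
    if (context.base == 'Control'){
        list(min.pct = 0.1, lfc.thresh = 0.9)
    } else { # less stringent for patient vs patient comparisons
        list(min.pct = 0.05, lfc.thresh = 0.5)
    }
}

comparisons <- list(disease.effect = c('Patient.1', 'Control'))
for (comparison in comparisons){
    th <- pick.thresholds(comparison[[2]]) # second element is the baseline context
    cat(paste(comparison, collapse = '_vs_'), ': min.pct =', th$min.pct,
        ', lfc.thresh =', th$lfc.thresh, '\n')
}
```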

In [10]:
de.res<-do.call("rbind", MAST.de.res)
de.res<-de.res[de.res$BH.adjusted <= 0.1,]

print('# of DE genes with BH <= 0.1 and |LFC| >= 0.9:')
table(de.res$Cell.Type, de.res$Comparison)

[1] "# of DE genes with BH <= 0.1 and |LFC| >= 0.9:"


                       
                        Patient.1_vs_Control
  CD8+ NKT-like cells                    239
  Effector CD4+ T cells                  183
  Memory CD4+ T cells                    119
  Naive CD4+ T cells                     201
  Naive CD8+ T cells                     201
  Natural killer  cells                  181
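The counts above are based on Benjamini-Hochberg (BH) adjusted p-values, which the `MAST.de` function computes alongside Seurat's Bonferroni-adjusted values. As a small illustration of why BH is the less stringent of the two, the sketch below applies base R's `p.adjust` to a made-up vector of p-values; the numbers are illustrative and not taken from the dataset.

```r
# Made-up raw p-values for five genes (illustrative only)
p.raw <- c(0.001, 0.008, 0.02, 0.03, 0.30)

p.bonf <- p.adjust(p.raw, method = 'bonferroni') # controls the family-wise error rate
p.bh   <- p.adjust(p.raw, method = 'BH')         # controls the false discovery rate

print(data.frame(p.raw, p.bonf, p.bh))

# number of genes passing a 0.05 cutoff under each correction
n.bonf <- sum(p.bonf <= 0.05)
n.bh   <- sum(p.bh <= 0.05)
c(bonferroni = n.bonf, BH = n.bh)
```

For the same raw p-values, the BH-adjusted value of a gene is never larger than its Bonferroni-adjusted value, so a BH cutoff retains at least as many genes.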

Format the results and apply a more stringent BH threshold (0.01) to the Patient 1 vs Control comparison:

In [11]:
de.res<-do.call("rbind", MAST.de.res)

# BH threshold separately on each comparison
de.res.control<-de.res[de.res$Comparison == 'Patient.1_vs_Control', ]

de.res.control<-de.res.control[de.res.control$BH.adjusted <= 0.01,]
de.res<-de.res.control
de.res<-de.res[with(de.res, order(Cell.Type, -abs(avg_log2FC), BH.adjusted)), ]

print('# of DE genes with BH <= 0.01 and |LFC| >= 0.9:')
table(de.res$Cell.Type, de.res$Comparison)

[1] "# of DE genes with BH <= 0.01 and |LFC| >= 0.9:"


                       
                        Patient.1_vs_Control
  CD8+ NKT-like cells                    209
  Effector CD4+ T cells                  182
  Memory CD4+ T cells                     47
  Naive CD4+ T cells                     199
  Naive CD8+ T cells                     142
  Natural killer  cells                   97
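As a minimal sketch of the filtering and ordering step above, the toy data frame below mimics the columns used (`Cell.Type`, `avg_log2FC`, `BH.adjusted`). The gene names, cell-type labels, and values are invented for illustration.

```r
toy.de <- data.frame(
    gene = c('g1', 'g2', 'g3', 'g4', 'g5'),
    Cell.Type = c('B', 'A', 'A', 'B', 'A'),
    avg_log2FC = c(1.5, -2.0, 1.0, -0.9, 3.0),
    BH.adjusted = c(0.001, 0.005, 0.009, 0.2, 0.05),
    stringsAsFactors = FALSE
)

# keep genes passing the BH cutoff
toy.de <- toy.de[toy.de$BH.adjusted <= 0.01, ]
# sort by cell type, then by decreasing absolute fold change, then by adjusted p-value
toy.de <- toy.de[with(toy.de, order(Cell.Type, -abs(avg_log2FC), BH.adjusted)), ]
toy.de$gene
```

Sorting on `-abs(avg_log2FC)` places strongly down-regulated genes next to strongly up-regulated ones, so the top of each cell type's list holds the largest effects in either direction.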

In [12]:
de.res<-do.call("rbind", MAST.de.res)

# BH threshold separately on each comparison
de.res.control<-de.res[de.res$Comparison == 'Patient.1_vs_Control', ]

de.res.control<-de.res.control[de.res.control$BH.adjusted <= 0.01,]
de.res<-de.res.control
de.res<-de.res[with(de.res, order(Cell.Type, -abs(avg_log2FC), BH.adjusted)), ]

print('# of DE genes prior to filtering:')
table(de.res$Cell.Type, de.res$Comparison)

[1] "# of DE genes prior to filtering:"


                       
                        Patient.1_vs_Control
  CD8+ NKT-like cells                    209
  Effector CD4+ T cells                  182
  Memory CD4+ T cells                     47
  Naive CD4+ T cells                     199
  Naive CD8+ T cells                     142
  Natural killer  cells                   97

Save to an Excel file:

In [13]:
# save to excel
counter<-1
context_comparisons_workbook<-createWorkbook()
for (comparison in unique(de.res$Comparison)){
    for (cell.type in  unique(de.res$Cell.Type)){
        de.res.cl<-de.res[(de.res$Comparison == comparison) & (de.res$Cell.Type == cell.type), ]
        if (dim(de.res.cl)[[1]] > 0){rownames(de.res.cl)<-1:dim(de.res.cl)[[1]]}
        
        addWorksheet(context_comparisons_workbook, paste0(counter))
        writeData(context_comparisons_workbook, sheet = paste0(counter), x = de.res.cl)
        counter<-counter+1
    }
}
saveWorkbook(context_comparisons_workbook, overwrite = T, 
                 paste0(data_path, 'processed/', 'additional_MAST_condition-specific_DE.xlsx'))
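The sheets above are named by a running counter, so the comparison and cell type are recoverable only from the columns inside each sheet. If descriptive sheet names are preferred, one option is to build them from the cell type and comparison. The sketch below uses base R only; `make.sheet.name` is a hypothetical helper, and it assumes Excel's limits of 31 characters per sheet name and no `\ / ? * [ ] :` characters.

```r
# Hypothetical helper: build an Excel-safe sheet name from a comparison and cell type
make.sheet.name <- function(comparison, cell.type, max.chars = 31){
    nm <- paste0(cell.type, '_', comparison)
    # replace characters that Excel does not allow in sheet names
    for (ch in c('\\', '/', '?', '*', '[', ']', ':')){
        nm <- gsub(ch, '-', nm, fixed = TRUE)
    }
    substr(nm, 1, max.chars) # truncate to the sheet-name length limit
}

sheet.names <- c(
    make.sheet.name('Patient.1_vs_Control', 'CD8+ NKT-like cells'),
    make.sheet.name('Patient.1_vs_Control', 'Naive CD4+ T cells')
)
# truncation can create duplicates, so check before writing
stopifnot(!any(duplicated(sheet.names)))
sheet.names
```

The resulting strings could be passed as the sheet name in place of `paste0(counter)`. With several comparisons, truncation may cut off the part of the name that distinguishes them, so the duplicate check is worth keeping.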

In [16]:
library(MAST, quietly = T)
sessionInfo()

R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS/LAPACK: /home/hratch/miniconda3/envs/STAT4/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] MAST_1.25.1                 SingleCellExperiment_1.16.0
 [3] SummarizedExperiment_1.24.0 Biobase_2.54.0             
 [5] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
 [7] IRanges_2.28.0              S4Vectors_0.32.4           
 [9] BiocGenerics_0.40.0         MatrixGenerics