## 04a HSC differential analysis MAST full

Celltype:

**HSCs**


Run this model:

`zlmCond_all <- zlm(formula = ~condition + Female + n_genes, sca=sca)`

Comparisons:

all cells
- male vs. female across all cells, controlling for chemical treatment and n_genes
- chemical treatments controlling for sex and n_genes

each cluster
- male vs. female within each cluster, controlling for chemical treatment and n_genes
- chemical treatments controlling for sex and n_genes


Run with this Docker image:

docker run --rm -d --name test_eva -p 8883:8888 -e JUPYTER_ENABLE_LAB=YES -v /Users/efast/Documents/:/home/jovyan/work r_scanpy:vs4


In [1]:
%reset

Once deleted, variables cannot be recovered. Proceed (y/[n])?  y


In [2]:
import scanpy as sc
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib import colors
import seaborn as sb
from gprofiler import GProfiler

import rpy2.rinterface_lib.callbacks
import logging

from rpy2.robjects import pandas2ri
import anndata2ri

In [3]:
# Ignore R warning messages
#Note: this can be commented out to get more verbose R output
rpy2.rinterface_lib.callbacks.logger.setLevel(logging.ERROR)

# Automatically convert rpy2 outputs to pandas dataframes
pandas2ri.activate()
anndata2ri.activate()
%load_ext rpy2.ipython

plt.rcParams['figure.figsize']=(8,8) #rescale figures
sc.settings.verbosity = 3
#sc.set_figure_params(dpi=200, dpi_save=300)
sc.logging.print_versions()

scanpy==1.4.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.17.3 scipy==1.3.0 pandas==0.25.3 scikit-learn==0.22.2.post1 statsmodels==0.10.0 python-igraph==0.7.1 louvain==0.6.1


In [4]:
%%R
# Load libraries from correct lib Paths for my environment - ignore this!
.libPaths(.libPaths()[c(3,2,1)])

# Load all the R libraries we will be using in the notebook
library(scran)
library(ggplot2)
library(plyr)
library(MAST)

In [5]:
# load data

adata = sc.read('./sc_objects/LT_preprocessed.h5ad', cache = True)

In [6]:
#Create new Anndata object for use in MAST with non-batch corrected data as before
adata_raw = adata.copy()
adata_raw.X = adata.raw.X
adata_raw.obs['n_genes'] = (adata_raw.X > 0).sum(1) # recompute number of genes expressed per cell
adata = None

In [7]:
adata_raw.obs.head()

Unnamed: 0,sample,n_counts,log_counts,n_genes,percent_mito,Female,Female_cat,Female_str,sex_sample,batch,rXist,leiden,umap_density_sample
AAACCCACACAGAGCA,ct,7698.0,8.948846,2663,0.049227,False,False,False,ct_false,batch1,0.078504,2,0.769843
AAACCCAGTATCGTGT,ct,8031.0,8.991189,2538,0.054656,False,False,False,ct_false,batch1,0.078504,1,0.97859
AAACCCAGTCTGTCAA,ct,9978.0,9.208138,3203,0.05021,True,True,True,ct_true,batch1,3.259153,0,0.493924
AAACCCAGTGAACTAA,ct,8042.0,8.992682,2777,0.061288,True,True,True,ct_true,batch1,3.019387,1,0.920127
AAACCCATCCAATCTT,ct,17477.0,9.769098,4695,0.052159,True,True,True,ct_true,batch1,0.525301,2,0.931675


### Run MAST on all cells - Select genes expressed in >5% of cells (no adaptive thresholding)

In [8]:
%%R -i adata_raw

#Convert SingleCellExperiment to SingleCellAssay type as required by MAST
sca <- SceToSingleCellAssay(adata_raw, class = "SingleCellAssay")

#Scale Gene detection rate
colData(sca)$n_genes = scale(colData(sca)$n_genes)

# filter genes based on hard cutoff (have to be expressed in at least 5% of all cells)
freq_expressed <- 0.05
expressed_genes <- freq(sca) > freq_expressed
sca <- sca[expressed_genes,]

#rename the sample to condition and make the ct the control
cond<-factor(colData(sca)$sample)
cond<-relevel(cond,"ct")
colData(sca)$condition<-cond

#Create data subsets for the Leiden clusters 0-5 (0 = activated, 1 = quiescent, 2 = metabolism)
sca_0 <- subset(sca, with(colData(sca), leiden=='0'))
sca_1 <- subset(sca, with(colData(sca), leiden=='1'))
sca_2<- subset(sca, with(colData(sca), leiden=='2'))
sca_3<- subset(sca, with(colData(sca), leiden=='3'))
sca_4<- subset(sca, with(colData(sca), leiden=='4'))
sca_5<- subset(sca, with(colData(sca), leiden=='5'))

#Filter out non-expressed genes in the subsets
print("Dimensions before subsetting:")
print(dim(sca_0))
print(dim(sca_1))
print(dim(sca_2))
print(dim(sca_3))
print(dim(sca_4))
print(dim(sca_5))
print("")

sca_0_filt = sca_0[rowSums(assay(sca_0)) != 0, ]
sca_1_filt = sca_1[rowSums(assay(sca_1)) != 0, ]
sca_2_filt = sca_2[rowSums(assay(sca_2)) != 0, ]
sca_3_filt = sca_3[rowSums(assay(sca_3)) != 0, ]
sca_4_filt = sca_4[rowSums(assay(sca_4)) != 0, ]
sca_5_filt = sca_5[rowSums(assay(sca_5)) != 0, ]

print("Dimensions after subsetting:")
print(dim(sca_0_filt))
print(dim(sca_1_filt))
print(dim(sca_2_filt))
print(dim(sca_3_filt))
print(dim(sca_4_filt))
print(dim(sca_5_filt))

[1] "Dimensions before subsetting:"
[1] 9414 5097
[1] 9414 4942
[1] 9414 4449
[1] 9414  435
[1] 9414  262
[1] 9414  170
[1] ""
[1] "Dimensions after subsetting:"
[1] 9414 5097
[1] 9414 4942
[1] 9414 4449
[1] 9414  435
[1] 9414  262
[1] 9413  170
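
The two preprocessing steps in the cell above (z-scaling `n_genes` and keeping only genes detected in more than 5% of cells) can be illustrated on toy data. This is a minimal pure-Python sketch of the idea, not MAST's implementation: the toy matrix, the gene names, and the helper `detection_rate` are made up for illustration, and the five `n_genes` values are simply the ones shown in the `adata_raw.obs.head()` output.

```python
from statistics import mean, stdev

# Toy expression matrix (hypothetical values): one list of 40 cells per gene.
expr = {
    "geneA": [2.0] * 30 + [0.0] * 10,  # detected in 75% of cells
    "geneB": [1.0] * 1 + [0.0] * 39,   # detected in 2.5% of cells
    "geneC": [3.0] * 3 + [0.0] * 37,   # detected in 7.5% of cells
    "geneD": [0.0] * 40,               # never detected
}

def detection_rate(values):
    """Fraction of cells in which the gene has a non-zero value."""
    return sum(v > 0 for v in values) / len(values)

freq_expressed = 0.05
kept_genes = [g for g, v in expr.items() if detection_rate(v) > freq_expressed]
print(kept_genes)

# z-scale a covariate the way R's scale() does by default:
# subtract the mean, divide by the sample standard deviation (n - 1).
n_genes = [2663, 2538, 3203, 2777, 4695]
center, spread = mean(n_genes), stdev(n_genes)
n_genes_scaled = [(x - center) / spread for x in n_genes]
print([round(x, 3) for x in n_genes_scaled])
```

Scaling the covariate leaves the fit equivalent but puts the `n_genes` coefficient on a per-standard-deviation scale, which makes it easier to compare across subsets.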


#### everything

background:  
`zlmCond_all <- zlm(formula = ~condition + Female + n_genes, sca=sca) # this runs the model`

The formula has the measurement variable (gene expression) on the left-hand side (LHS, implicit here) and predictors present in colData on the right-hand side (RHS).
It models gene expression as a function of condition, sex and n_genes. Cluster is not a term in the formula; cluster-specific effects come from fitting the same model on each cluster subset.

Questions I can ask:
- sex differences, controlling for treatment
- sex differences within clusters (not necessary to analyze all the clusters)
- overall gene expression changes with treatment

In [9]:
%%R 
#Define & run hurdle model 
zlmCond_all <- zlm(formula = ~condition + Female + n_genes, sca=sca) # this runs the model
summaryCond_all <- summary(zlmCond_all, doLRT=TRUE) # extracts the data, gives datatable with summary of fit, doLRT=TRUE extracts likelihood ratio test p-value
summaryDt_all <- summaryCond_all$datatable # reformats into a table

In [10]:
%%R
head(summaryDt_all)

       primerid component        contrast  Pr..Chisq.        ci.hi        ci.lo
1 0610009B22Rik         C      FemaleTRUE 0.965502715  0.009874287 -0.010320197
2 0610009B22Rik         C   conditionGCSF 0.022256004  0.030826080  0.002381708
3 0610009B22Rik         C conditiondmPGE2 0.259041938  0.014006607 -0.052105266
4 0610009B22Rik         C   conditionindo 0.174228770  0.019174089 -0.003466548
5 0610009B22Rik         C    conditionpIC 0.001923491 -0.013984948 -0.061834194
6 0610009B22Rik         C         n_genes 0.000000000 -0.134651440 -0.144165284
           coef            z
1 -0.0002229548  -0.04327749
2  0.0166038940   2.28818797
3 -0.0190493296  -1.12947943
4  0.0078537701   1.35977679
5 -0.0379095710  -3.10564536
6 -0.1394083622 -57.43953212
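
The `coef`, `ci.lo`, `ci.hi` and `z` columns are related. The short check below uses the `conditionGCSF` row printed above and assumes the interval is a symmetric 95% normal-approximation (Wald) interval. That is an assumption about how the interval was built, not something stated in this notebook. Under that assumption the standard error can be recovered from the interval width, and `coef / std_err` should reproduce `z` up to the rounding of the printed values.

```python
from statistics import NormalDist

# Values copied from the conditionGCSF row (component C) of the summary above.
coef = 0.0166038940
ci_lo = 0.002381708
ci_hi = 0.030826080
z_reported = 2.28818797

# Assumption: symmetric 95% Wald interval, i.e. coef +/- crit * std_err.
crit = NormalDist().inv_cdf(0.975)       # about 1.96
std_err = (ci_hi - ci_lo) / (2 * crit)   # half-width divided by the critical value
z_recomputed = coef / std_err
midpoint = (ci_hi + ci_lo) / 2           # should equal coef for a symmetric interval

print(f"std_err      = {std_err:.6f}")
print(f"z recomputed = {z_recomputed:.4f} (reported {z_reported:.4f})")
print(f"CI midpoint  = {midpoint:.10f} (coef {coef:.10f})")
```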


In [11]:
%%R -o female_all -o GCSF_all -o dmPGE2_all -o indo_all -o pIC_all

# reformat for female
result_all_Female <- merge(summaryDt_all[contrast=='FemaleTRUE' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='FemaleTRUE' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_Female[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
female_all = result_all_Female[result_all_Female$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
female_all = female_all[order(female_all$FDR),] # sorts the table


# reformat for GCSF
result_all_GCSF <- merge(summaryDt_all[contrast=='conditionGCSF' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionGCSF' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_GCSF[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
GCSF_all = result_all_GCSF[result_all_GCSF$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
GCSF_all = GCSF_all[order(GCSF_all$FDR),] # sorts the table


# reformat for dmPGE2
result_all_dmPGE2 <- merge(summaryDt_all[contrast=='conditiondmPGE2' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditiondmPGE2' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_dmPGE2[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
dmPGE2_all = result_all_dmPGE2[result_all_dmPGE2$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
dmPGE2_all = dmPGE2_all[order(dmPGE2_all$FDR),] # sorts the table


# reformat for indo
result_all_indo <- merge(summaryDt_all[contrast=='conditionindo' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionindo' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_indo[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
indo_all = result_all_indo[result_all_indo$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
indo_all = indo_all[order(indo_all$FDR),] # sorts the table

# reformat for pIC
result_all_pIC <- merge(summaryDt_all[contrast=='conditionpIC' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionpIC' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_pIC[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
pIC_all = result_all_pIC[result_all_pIC$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
pIC_all = pIC_all[order(pIC_all$FDR),] # sorts the table
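
Each block in the cell above follows the same recipe: take the hurdle (`H`) p-values and the `logFC` coefficients for one contrast, join them on `primerid`, adjust the p-values, keep rows with FDR < 0.01, and sort by FDR. The sketch below walks through that recipe in plain Python. The gene names and numbers are hypothetical, and `bh_adjust` is an illustrative re-implementation of the Benjamini-Hochberg procedure (the method R's `p.adjust` uses for `'fdr'`), not the R code itself.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap at 1.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-gene results for one contrast.
hurdle_p = {"gene_a": 1e-30, "gene_b": 1e-25, "gene_c": 0.5, "gene_d": 0.2}
log_fc = {"gene_a": 2.1, "gene_b": -1.7, "gene_c": 0.01, "gene_d": 0.05}

# Inner join on the gene identifier, as merge(..., by='primerid') does.
genes = sorted(hurdle_p.keys() & log_fc.keys())
fdr = bh_adjust([hurdle_p[g] for g in genes])
merged = [
    {"primerid": g, "p": hurdle_p[g], "coef": log_fc[g], "FDR": q}
    for g, q in zip(genes, fdr)
]

# Keep significant genes and sort them by FDR.
significant = sorted((r for r in merged if r["FDR"] < 0.01), key=lambda r: r["FDR"])
for row in significant:
    print(row)
```

Because the adjustment depends on how many genes are tested, it is applied to all genes of a contrast before filtering, which is also the order used in the cell above.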

In [12]:
%%R -o MAST_raw_all

MAST_raw_all <- summaryDt_all

In [13]:
# save files as .csvs

MAST_raw_all.to_csv('./write/LT_MAST_raw_all.csv')
female_all.to_csv('./write/LT_MAST_female_all.csv')
GCSF_all.to_csv('./write/LT_MAST_GCSF_all.csv')
pIC_all.to_csv('./write/LT_MAST_pIC_all.csv')
dmPGE2_all.to_csv('./write/LT_MAST_dmPGE2_all.csv')
indo_all.to_csv('./write/LT_MAST_indo_all.csv')
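
The same six tables are written for every subset, so the output paths can be generated from one pattern rather than typed by hand. This is a small sketch under that assumption. `mast_output_paths` is a hypothetical helper, not something defined in this notebook, and it only reproduces the `LT_MAST_<table>_<subset>.csv` pattern used in the cell above.

```python
def mast_output_paths(subset, out_dir="./write", prefix="LT_MAST"):
    """Map each result table name to its CSV path for one subset ('all', '0', '1', ...)."""
    tables = ["raw", "female", "GCSF", "pIC", "dmPGE2", "indo"]
    return {table: f"{out_dir}/{prefix}_{table}_{subset}.csv" for table in tables}

paths = mast_output_paths("all")
for table, path in paths.items():
    print(f"{table:>7} -> {path}")
```

In the notebook the tables would then be written with calls such as `female_all.to_csv(paths["female"])`.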

#### cluster 0

In [14]:
%%R
# list all variables 
ls()

 [1] "adata_raw"         "cond"              "dmPGE2_all"       
 [4] "expressed_genes"   "female_all"        "freq_expressed"   
 [7] "GCSF_all"          "indo_all"          "MAST_raw_all"     
[10] "pIC_all"           "result_all_dmPGE2" "result_all_Female"
[13] "result_all_GCSF"   "result_all_indo"   "result_all_pIC"   
[16] "sca"               "sca_0"             "sca_0_filt"       
[19] "sca_1"             "sca_1_filt"        "sca_2"            
[22] "sca_2_filt"        "sca_3"             "sca_3_filt"       
[25] "sca_4"             "sca_4_filt"        "sca_5"            
[28] "sca_5_filt"        "summaryCond_all"   "summaryDt_all"    
[31] "zlmCond_all"      


In [15]:
%%R
# remove previous variables

rm(zlmCond_all)
rm(summaryDt_all)
rm(summaryCond_all)
rm(MAST_raw_all)

In [16]:
%%R 
#Define & run hurdle model 
zlmCond_all <- zlm(formula = ~condition + Female + n_genes, sca=sca_0) # this runs the model
summaryCond_all <- summary(zlmCond_all, doLRT=TRUE) # extracts the data, gives datatable with summary of fit, doLRT=TRUE extracts likelihood ratio test p-value
summaryDt_all <- summaryCond_all$datatable # reformats into a table

In [17]:
%%R
head(summaryDt_all)

       primerid component        contrast    Pr..Chisq.       ci.hi
1 0610009B22Rik         C      FemaleTRUE  9.080832e-01  0.01497169
2 0610009B22Rik         C   conditionGCSF  3.637252e-02  0.04330333
3 0610009B22Rik         C conditiondmPGE2  2.037933e-01  0.10723323
4 0610009B22Rik         C   conditionindo  2.367880e-01  0.02855857
5 0610009B22Rik         C    conditionpIC  6.441267e-01  0.02865934
6 0610009B22Rik         C         n_genes 1.872675e-150 -0.11266419
         ci.lo          coef           z
1 -0.016847068 -0.0009376885  -0.1155190
2  0.001445013  0.0223741739   2.0952859
3 -0.022833151  0.0422000371   1.2718207
4 -0.007049160  0.0107547043   1.1839471
5 -0.046347547 -0.0088441031  -0.4622008
6 -0.128900364 -0.1207822792 -29.1606846


In [18]:
%%R -o female_all -o GCSF_all -o dmPGE2_all -o indo_all -o pIC_all

# reformat for female
result_all_Female <- merge(summaryDt_all[contrast=='FemaleTRUE' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='FemaleTRUE' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_Female[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
female_all = result_all_Female[result_all_Female$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
female_all = female_all[order(female_all$FDR),] # sorts the table


# reformat for GCSF
result_all_GCSF <- merge(summaryDt_all[contrast=='conditionGCSF' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionGCSF' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_GCSF[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
GCSF_all = result_all_GCSF[result_all_GCSF$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
GCSF_all = GCSF_all[order(GCSF_all$FDR),] # sorts the table


# reformat for dmPGE2
result_all_dmPGE2 <- merge(summaryDt_all[contrast=='conditiondmPGE2' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditiondmPGE2' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_dmPGE2[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
dmPGE2_all = result_all_dmPGE2[result_all_dmPGE2$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
dmPGE2_all = dmPGE2_all[order(dmPGE2_all$FDR),] # sorts the table


# reformat for indo
result_all_indo <- merge(summaryDt_all[contrast=='conditionindo' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionindo' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_indo[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
indo_all = result_all_indo[result_all_indo$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
indo_all = indo_all[order(indo_all$FDR),] # sorts the table

# reformat for pIC
result_all_pIC <- merge(summaryDt_all[contrast=='conditionpIC' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionpIC' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_pIC[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
pIC_all = result_all_pIC[result_all_pIC$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
pIC_all = pIC_all[order(pIC_all$FDR),] # sorts the table

In [19]:
%%R -o MAST_raw_all

MAST_raw_all <- summaryDt_all

In [21]:
# save files as .csvs

MAST_raw_all.to_csv('./write/LT_MAST_raw_0.csv')
female_all.to_csv('./write/LT_MAST_female_0.csv')
GCSF_all.to_csv('./write/LT_MAST_GCSF_0.csv')
pIC_all.to_csv('./write/LT_MAST_pIC_0.csv')
dmPGE2_all.to_csv('./write/LT_MAST_dmPGE2_0.csv')
indo_all.to_csv('./write/LT_MAST_indo_0.csv')

#### cluster 1

In [22]:
%%R
# list all variables 
ls()

 [1] "adata_raw"         "cond"              "dmPGE2_all"       
 [4] "expressed_genes"   "female_all"        "freq_expressed"   
 [7] "GCSF_all"          "indo_all"          "MAST_raw_all"     
[10] "pIC_all"           "result_all_dmPGE2" "result_all_Female"
[13] "result_all_GCSF"   "result_all_indo"   "result_all_pIC"   
[16] "sca"               "sca_0"             "sca_0_filt"       
[19] "sca_1"             "sca_1_filt"        "sca_2"            
[22] "sca_2_filt"        "sca_3"             "sca_3_filt"       
[25] "sca_4"             "sca_4_filt"        "sca_5"            
[28] "sca_5_filt"        "summaryCond_all"   "summaryDt_all"    
[31] "zlmCond_all"      


In [23]:
%%R
# remove previous variables

rm(zlmCond_all)
rm(summaryDt_all)
rm(summaryCond_all)
rm(MAST_raw_all)

In [24]:
%%R 
#Define & run hurdle model 
zlmCond_all <- zlm(formula = ~condition + Female + n_genes, sca=sca_1) # this runs the model
summaryCond_all <- summary(zlmCond_all, doLRT=TRUE) # extracts the data, gives datatable with summary of fit, doLRT=TRUE extracts likelihood ratio test p-value
summaryDt_all <- summaryCond_all$datatable # reformats into a table

In [25]:
%%R
head(summaryDt_all)

       primerid component        contrast    Pr..Chisq.       ci.hi
1 0610009B22Rik         C      FemaleTRUE  3.775929e-01  0.01027582
2 0610009B22Rik         C   conditionGCSF  1.421362e-01  0.04698239
3 0610009B22Rik         C conditiondmPGE2  2.551261e-01  0.07639270
4 0610009B22Rik         C   conditionindo  2.821035e-02  0.04399397
5 0610009B22Rik         C    conditionpIC  5.530358e-01  0.06729348
6 0610009B22Rik         C         n_genes 3.750762e-117 -0.16649073
         ci.lo         coef           z
1 -0.027148822 -0.008436504  -0.8836555
2 -0.006703486  0.020139451   1.4705022
3 -0.288725821 -0.106166559  -1.1398087
4  0.002533674  0.023263823   2.1995141
5 -0.035989371  0.015652052   0.5940475
6 -0.193747030 -0.180118881 -25.9042163


In [26]:
%%R -o female_all -o GCSF_all -o dmPGE2_all -o indo_all -o pIC_all

# reformat for female
result_all_Female <- merge(summaryDt_all[contrast=='FemaleTRUE' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='FemaleTRUE' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_Female[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
female_all = result_all_Female[result_all_Female$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
female_all = female_all[order(female_all$FDR),] # sorts the table


# reformat for GCSF
result_all_GCSF <- merge(summaryDt_all[contrast=='conditionGCSF' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionGCSF' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_GCSF[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
GCSF_all = result_all_GCSF[result_all_GCSF$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
GCSF_all = GCSF_all[order(GCSF_all$FDR),] # sorts the table


# reformat for dmPGE2
result_all_dmPGE2 <- merge(summaryDt_all[contrast=='conditiondmPGE2' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditiondmPGE2' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_dmPGE2[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
dmPGE2_all = result_all_dmPGE2[result_all_dmPGE2$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
dmPGE2_all = dmPGE2_all[order(dmPGE2_all$FDR),] # sorts the table


# reformat for indo
result_all_indo <- merge(summaryDt_all[contrast=='conditionindo' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionindo' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_indo[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
indo_all = result_all_indo[result_all_indo$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
indo_all = indo_all[order(indo_all$FDR),] # sorts the table

# reformat for pIC
result_all_pIC <- merge(summaryDt_all[contrast=='conditionpIC' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionpIC' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_pIC[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
pIC_all = result_all_pIC[result_all_pIC$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
pIC_all = pIC_all[order(pIC_all$FDR),] # sorts the table

In [27]:
%%R -o MAST_raw_all

MAST_raw_all <- summaryDt_all

In [28]:
# save files as .csvs

MAST_raw_all.to_csv('./write/LT_MAST_raw_1.csv')
female_all.to_csv('./write/LT_MAST_female_1.csv')
GCSF_all.to_csv('./write/LT_MAST_GCSF_1.csv')
pIC_all.to_csv('./write/LT_MAST_pIC_1.csv')
dmPGE2_all.to_csv('./write/LT_MAST_dmPGE2_1.csv')
indo_all.to_csv('./write/LT_MAST_indo_1.csv')

#### cluster 2

In [29]:
%%R
# list all variables 
ls()

 [1] "adata_raw"         "cond"              "dmPGE2_all"       
 [4] "expressed_genes"   "female_all"        "freq_expressed"   
 [7] "GCSF_all"          "indo_all"          "MAST_raw_all"     
[10] "pIC_all"           "result_all_dmPGE2" "result_all_Female"
[13] "result_all_GCSF"   "result_all_indo"   "result_all_pIC"   
[16] "sca"               "sca_0"             "sca_0_filt"       
[19] "sca_1"             "sca_1_filt"        "sca_2"            
[22] "sca_2_filt"        "sca_3"             "sca_3_filt"       
[25] "sca_4"             "sca_4_filt"        "sca_5"            
[28] "sca_5_filt"        "summaryCond_all"   "summaryDt_all"    
[31] "zlmCond_all"      


In [30]:
%%R
# remove previous variables

rm(zlmCond_all)
rm(summaryDt_all)
rm(summaryCond_all)
rm(MAST_raw_all)

In [31]:
%%R 
#Define & run hurdle model 
zlmCond_all <- zlm(formula = ~condition + Female + n_genes, sca=sca_2) # this runs the model
summaryCond_all <- summary(zlmCond_all, doLRT=TRUE) # extracts the data, gives datatable with summary of fit, doLRT=TRUE extracts likelihood ratio test p-value
summaryDt_all <- summaryCond_all$datatable # reformats into a table

In [32]:
%%R
head(summaryDt_all)

       primerid component        contrast    Pr..Chisq.       ci.hi
1 0610009B22Rik         C      FemaleTRUE  2.165310e-01  0.03151910
2 0610009B22Rik         C   conditionGCSF  4.952166e-01  0.03842257
3 0610009B22Rik         C conditiondmPGE2  8.794907e-01  0.06021201
4 0610009B22Rik         C   conditionindo  9.037395e-01  0.01986890
5 0610009B22Rik         C    conditionpIC  9.160002e-01  0.06635253
6 0610009B22Rik         C         n_genes 1.865121e-137 -0.13609065
         ci.lo         coef           z
1 -0.007105417  0.012206842   1.2388489
2 -0.018552293  0.009935138   0.6835475
3 -0.070331703 -0.005059846  -0.1519356
4 -0.022488040 -0.001309571  -0.1211944
5 -0.073916929 -0.003782197  -0.1056961
6 -0.155765854 -0.145928254 -29.0735670


In [33]:
%%R -o female_all -o GCSF_all -o dmPGE2_all -o indo_all -o pIC_all

# reformat for female
result_all_Female <- merge(summaryDt_all[contrast=='FemaleTRUE' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='FemaleTRUE' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_Female[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
female_all = result_all_Female[result_all_Female$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
female_all = female_all[order(female_all$FDR),] # sorts the table


# reformat for GCSF
result_all_GCSF <- merge(summaryDt_all[contrast=='conditionGCSF' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionGCSF' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_GCSF[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
GCSF_all = result_all_GCSF[result_all_GCSF$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
GCSF_all = GCSF_all[order(GCSF_all$FDR),] # sorts the table


# reformat for dmPGE2
result_all_dmPGE2 <- merge(summaryDt_all[contrast=='conditiondmPGE2' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditiondmPGE2' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_dmPGE2[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
dmPGE2_all = result_all_dmPGE2[result_all_dmPGE2$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
dmPGE2_all = dmPGE2_all[order(dmPGE2_all$FDR),] # sorts the table


# reformat for indo
result_all_indo <- merge(summaryDt_all[contrast=='conditionindo' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionindo' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_indo[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
indo_all = result_all_indo[result_all_indo$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
indo_all = indo_all[order(indo_all$FDR),] # sorts the table

# reformat for pIC
result_all_pIC <- merge(summaryDt_all[contrast=='conditionpIC' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionpIC' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_pIC[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
pIC_all = result_all_pIC[result_all_pIC$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
pIC_all = pIC_all[order(pIC_all$FDR),] # sorts the table

In [34]:
%%R -o MAST_raw_all

MAST_raw_all <- summaryDt_all

In [35]:
# save files as .csvs

MAST_raw_all.to_csv('./write/LT_MAST_raw_2.csv')
female_all.to_csv('./write/LT_MAST_female_2.csv')
GCSF_all.to_csv('./write/LT_MAST_GCSF_2.csv')
pIC_all.to_csv('./write/LT_MAST_pIC_2.csv')
dmPGE2_all.to_csv('./write/LT_MAST_dmPGE2_2.csv')
indo_all.to_csv('./write/LT_MAST_indo_2.csv')

#### cluster 3

In [36]:
%%R
# list all variables 
ls()

 [1] "adata_raw"         "cond"              "dmPGE2_all"       
 [4] "expressed_genes"   "female_all"        "freq_expressed"   
 [7] "GCSF_all"          "indo_all"          "MAST_raw_all"     
[10] "pIC_all"           "result_all_dmPGE2" "result_all_Female"
[13] "result_all_GCSF"   "result_all_indo"   "result_all_pIC"   
[16] "sca"               "sca_0"             "sca_0_filt"       
[19] "sca_1"             "sca_1_filt"        "sca_2"            
[22] "sca_2_filt"        "sca_3"             "sca_3_filt"       
[25] "sca_4"             "sca_4_filt"        "sca_5"            
[28] "sca_5_filt"        "summaryCond_all"   "summaryDt_all"    
[31] "zlmCond_all"      


In [37]:
%%R
# remove previous variables

rm(zlmCond_all)
rm(summaryDt_all)
rm(summaryCond_all)
rm(MAST_raw_all)

In [38]:
%%R 
#Define & run hurdle model 
zlmCond_all <- zlm(formula = ~condition + Female + n_genes, sca=sca_3) # this runs the model
summaryCond_all <- summary(zlmCond_all, doLRT=TRUE) # extracts the data, gives datatable with summary of fit, doLRT=TRUE extracts likelihood ratio test p-value
summaryDt_all <- summaryCond_all$datatable # reformats into a table

In [39]:
%%R
head(summaryDt_all)

       primerid component      contrast   Pr..Chisq.         ci.hi       ci.lo
1 0610009B22Rik         C    FemaleTRUE 9.782273e-01  0.0629798843 -0.06122788
2 0610009B22Rik         C conditionGCSF 1.428487e-01  0.2785634676 -0.03756231
3 0610009B22Rik         C conditionindo 7.546705e-01  0.0941983636 -0.13049953
4 0610009B22Rik         C  conditionpIC 5.368692e-02 -0.0008930019 -0.20522687
5 0610009B22Rik         C       n_genes 2.985646e-08 -0.0793501086 -0.15286381
6 0610009B22Rik         C   (Intercept)           NA  0.9329993246  0.74695159
           coef           z
1  0.0008760017  0.02764613
2  0.1205005793  1.49419511
3 -0.0181505857 -0.31664287
4 -0.1030599337 -1.97709528
5 -0.1161069582 -6.19110339
6  0.8399754578 17.69784146


In [40]:
%%R -o female_all -o GCSF_all -o dmPGE2_all -o indo_all -o pIC_all

# reformat for female
result_all_Female <- merge(summaryDt_all[contrast=='FemaleTRUE' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='FemaleTRUE' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_Female[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
female_all = result_all_Female[result_all_Female$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
female_all = female_all[order(female_all$FDR),] # sorts the table


# reformat for GCSF
result_all_GCSF <- merge(summaryDt_all[contrast=='conditionGCSF' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionGCSF' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_GCSF[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
GCSF_all = result_all_GCSF[result_all_GCSF$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
GCSF_all = GCSF_all[order(GCSF_all$FDR),] # sorts the table


# reformat for dmPGE2
result_all_dmPGE2 <- merge(summaryDt_all[contrast=='conditiondmPGE2' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditiondmPGE2' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_dmPGE2[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
dmPGE2_all = result_all_dmPGE2[result_all_dmPGE2$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
dmPGE2_all = dmPGE2_all[order(dmPGE2_all$FDR),] # sorts the table


# reformat for indo
result_all_indo <- merge(summaryDt_all[contrast=='conditionindo' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionindo' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_indo[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # add an FDR column (Benjamini-Hochberg adjustment via p.adjust)
indo_all = result_all_indo[result_all_indo$FDR<0.01,, drop=F] # keep only rows with FDR < 0.01
indo_all = indo_all[order(indo_all$FDR),] # sorts the table

# reformat for pIC
result_all_pIC <- merge(summaryDt_all[contrast=='conditionpIC' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionpIC' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_pIC[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
pIC_all = result_all_pIC[result_all_pIC$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
pIC_all = pIC_all[order(pIC_all$FDR),] # sorts the table

In [41]:
%%R -o MAST_raw_all

MAST_raw_all <- summaryDt_all

In [42]:
# save files as .csvs

MAST_raw_all.to_csv('./write/LT_MAST_raw_3.csv')
female_all.to_csv('./write/LT_MAST_female_3.csv')
GCSF_all.to_csv('./write/LT_MAST_GCSF_3.csv')
pIC_all.to_csv('./write/LT_MAST_pIC_3.csv')
dmPGE2_all.to_csv('./write/LT_MAST_dmPGE2_3.csv')
indo_all.to_csv('./write/LT_MAST_indo_3.csv')
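
The save cell is repeated for every cluster, with only the numeric suffix changing. A minimal sketch of building those file names in one place follows. `mast_output_paths` and `TABLE_NAMES` are hypothetical names, not part of the notebook, and the helper only constructs paths that follow the `./write/LT_MAST_<name>_<cluster>.csv` pattern used in the save cells.

```python
# Hypothetical helper (not in the original notebook): builds the CSV paths
# that the save cells write, following ./write/LT_MAST_<name>_<cluster>.csv.
TABLE_NAMES = ["raw", "female", "GCSF", "pIC", "dmPGE2", "indo"]


def mast_output_paths(cluster, out_dir="./write", prefix="LT_MAST"):
    """Return a {table name: CSV path} mapping for one cluster."""
    return {
        name: f"{out_dir}/{prefix}_{name}_{cluster}.csv"
        for name in TABLE_NAMES
    }


paths = mast_output_paths(3)
print(paths["female"])
```

With the result tables held in a dict keyed by the same names (an assumption; the notebook keeps them as separate variables), each save would reduce to `tables[name].to_csv(paths[name])`.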

#### cluster 4

In [43]:
%%R
# list all variables 
ls()

 [1] "adata_raw"         "cond"              "dmPGE2_all"       
 [4] "expressed_genes"   "female_all"        "freq_expressed"   
 [7] "GCSF_all"          "indo_all"          "MAST_raw_all"     
[10] "pIC_all"           "result_all_dmPGE2" "result_all_Female"
[13] "result_all_GCSF"   "result_all_indo"   "result_all_pIC"   
[16] "sca"               "sca_0"             "sca_0_filt"       
[19] "sca_1"             "sca_1_filt"        "sca_2"            
[22] "sca_2_filt"        "sca_3"             "sca_3_filt"       
[25] "sca_4"             "sca_4_filt"        "sca_5"            
[28] "sca_5_filt"        "summaryCond_all"   "summaryDt_all"    
[31] "zlmCond_all"      


In [44]:
%%R
# remove previous variables

rm(zlmCond_all)
rm(summaryDt_all)
rm(summaryCond_all)
rm(MAST_raw_all)

In [45]:
%%R 
#Define & run hurdle model 
zlmCond_all <- zlm(formula = ~condition + Female + n_genes, sca=sca_4) # fit the hurdle model on cluster 4
summaryCond_all <- summary(zlmCond_all, doLRT=TRUE) # summarize the fit; doLRT=TRUE adds likelihood ratio test p-values
summaryDt_all <- summaryCond_all$datatable # extract the summary as a data.table

In [46]:
%%R
head(summaryDt_all)

       primerid component        contrast Pr..Chisq.       ci.hi      ci.lo
1 0610009B22Rik         C      FemaleTRUE 0.21430193  0.03596871 -0.1702462
2 0610009B22Rik         C conditiondmPGE2 1.00000000          NA         NA
3 0610009B22Rik         C         n_genes 0.02531992 -0.01195137 -0.1346908
4 0610009B22Rik         C     (Intercept)         NA  0.84075455  0.7046515
5 0610009B22Rik         D      FemaleTRUE 0.89964666  0.69450794 -0.7864429
6 0610009B22Rik         D conditiondmPGE2 0.72654579  4.55773843 -3.4646578
         coef          z
1 -0.06713873 -1.2762366
2          NA         NA
3 -0.07332109 -2.3416546
4  0.77270302 22.2547534
5 -0.04596746 -0.1216713
6  0.54654032  0.2670522


In [47]:
%%R -o female_all -o GCSF_all -o dmPGE2_all -o indo_all -o pIC_all

# reformat for female
result_all_Female <- merge(summaryDt_all[contrast=='FemaleTRUE' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='FemaleTRUE' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_Female[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
female_all = result_all_Female[result_all_Female$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
female_all = female_all[order(female_all$FDR),] # sorts the table


# reformat for GCSF
result_all_GCSF <- merge(summaryDt_all[contrast=='conditionGCSF' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionGCSF' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_GCSF[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
GCSF_all = result_all_GCSF[result_all_GCSF$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
GCSF_all = GCSF_all[order(GCSF_all$FDR),] # sorts the table


# reformat for dmPGE2
result_all_dmPGE2 <- merge(summaryDt_all[contrast=='conditiondmPGE2' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditiondmPGE2' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_dmPGE2[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
dmPGE2_all = result_all_dmPGE2[result_all_dmPGE2$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
dmPGE2_all = dmPGE2_all[order(dmPGE2_all$FDR),] # sorts the table


# reformat for indo
result_all_indo <- merge(summaryDt_all[contrast=='conditionindo' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionindo' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_indo[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
indo_all = result_all_indo[result_all_indo$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
indo_all = indo_all[order(indo_all$FDR),] # sorts the table

# reformat for pIC
result_all_pIC <- merge(summaryDt_all[contrast=='conditionpIC' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionpIC' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_pIC[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
pIC_all = result_all_pIC[result_all_pIC$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
pIC_all = pIC_all[order(pIC_all$FDR),] # sorts the table
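
The `FDR` column above comes from R's `p.adjust` with method `'fdr'`, which is an alias for the Benjamini-Hochberg procedure. As an illustration of what that adjustment computes, here is a minimal pure-Python sketch. The function name `benjamini_hochberg` and the example p-values are made up for this illustration, and the sketch assumes there are no missing values in the input.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative sketch only; assumes every entry is a valid p-value.
    """
    n = len(pvalues)
    # Indices of the p-values, ordered from largest to smallest p-value
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank of this p-value in ascending order
        # Scale by n / rank, then enforce monotonicity with a running minimum
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, not taken from the analysis
pvals = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(pvals)
print(fdr)
```

Filtering on `FDR < 0.01` then keeps only the genes whose adjusted value falls below that threshold.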

In [48]:
%%R -o MAST_raw_all

MAST_raw_all <- summaryDt_all

In [49]:
# save files as .csvs

MAST_raw_all.to_csv('./write/LT_MAST_raw_4.csv')
female_all.to_csv('./write/LT_MAST_female_4.csv')
GCSF_all.to_csv('./write/LT_MAST_GCSF_4.csv')
pIC_all.to_csv('./write/LT_MAST_pIC_4.csv')
dmPGE2_all.to_csv('./write/LT_MAST_dmPGE2_4.csv')
indo_all.to_csv('./write/LT_MAST_indo_4.csv')

#### cluster 5

In [50]:
%%R
# list all variables 
ls()

 [1] "adata_raw"         "cond"              "dmPGE2_all"       
 [4] "expressed_genes"   "female_all"        "freq_expressed"   
 [7] "GCSF_all"          "indo_all"          "MAST_raw_all"     
[10] "pIC_all"           "result_all_dmPGE2" "result_all_Female"
[13] "result_all_GCSF"   "result_all_indo"   "result_all_pIC"   
[16] "sca"               "sca_0"             "sca_0_filt"       
[19] "sca_1"             "sca_1_filt"        "sca_2"            
[22] "sca_2_filt"        "sca_3"             "sca_3_filt"       
[25] "sca_4"             "sca_4_filt"        "sca_5"            
[28] "sca_5_filt"        "summaryCond_all"   "summaryDt_all"    
[31] "zlmCond_all"      


In [51]:
%%R
# remove previous variables

rm(zlmCond_all)
rm(summaryDt_all)
rm(summaryCond_all)
rm(MAST_raw_all)

In [52]:
%%R 
#Define & run hurdle model 
zlmCond_all <- zlm(formula = ~condition + Female + n_genes, sca=sca_5) # fit the hurdle model on cluster 5
summaryCond_all <- summary(zlmCond_all, doLRT=TRUE) # summarize the fit; doLRT=TRUE adds likelihood ratio test p-values
summaryDt_all <- summaryCond_all$datatable # extract the summary as a data.table

In [53]:
%%R
head(summaryDt_all)

       primerid component      contrast   Pr..Chisq.       ci.hi       ci.lo
1 0610009B22Rik         C    FemaleTRUE 0.6007661955  0.06286179 -0.10831760
2 0610009B22Rik         C conditionGCSF 0.1738292142  0.20272763 -0.03665301
3 0610009B22Rik         C conditionindo 0.5373722731  0.06581315 -0.12581008
4 0610009B22Rik         C  conditionpIC 0.7547982009  0.29475417 -0.21414848
5 0610009B22Rik         C       n_genes 0.0001438733 -0.04747199 -0.14019854
6 0610009B22Rik         C   (Intercept)           NA  1.00878433  0.70975156
         coef          z
1 -0.02272791 -0.5204585
2  0.08303731  1.3597602
3 -0.02999846 -0.6136616
4  0.04030284  0.3104410
5 -0.09383526 -3.9667977
6  0.85926794 11.2638773


In [54]:
%%R -o female_all -o GCSF_all -o dmPGE2_all -o indo_all -o pIC_all

# reformat for female
result_all_Female <- merge(summaryDt_all[contrast=='FemaleTRUE' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='FemaleTRUE' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_Female[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
female_all = result_all_Female[result_all_Female$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
female_all = female_all[order(female_all$FDR),] # sorts the table


# reformat for GCSF
result_all_GCSF <- merge(summaryDt_all[contrast=='conditionGCSF' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionGCSF' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_GCSF[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
GCSF_all = result_all_GCSF[result_all_GCSF$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
GCSF_all = GCSF_all[order(GCSF_all$FDR),] # sorts the table


# reformat for dmPGE2
result_all_dmPGE2 <- merge(summaryDt_all[contrast=='conditiondmPGE2' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditiondmPGE2' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_dmPGE2[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
dmPGE2_all = result_all_dmPGE2[result_all_dmPGE2$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
dmPGE2_all = dmPGE2_all[order(dmPGE2_all$FDR),] # sorts the table


# reformat for indo
result_all_indo <- merge(summaryDt_all[contrast=='conditionindo' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionindo' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_indo[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
indo_all = result_all_indo[result_all_indo$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
indo_all = indo_all[order(indo_all$FDR),] # sorts the table

# reformat for pIC
result_all_pIC <- merge(summaryDt_all[contrast=='conditionpIC' & component=='H',.(primerid, `Pr(>Chisq)`)], #P-vals
                  summaryDt_all[contrast=='conditionpIC' & component=='logFC', .(primerid, coef)],
                  by='primerid') #logFC coefficients
#Correct for multiple testing (FDR correction) and filtering
result_all_pIC[,FDR:=p.adjust(`Pr(>Chisq)`, 'fdr')] # create column named FDR - probably that p.adjust function
pIC_all = result_all_pIC[result_all_pIC$FDR<0.01,, drop=F] # create new table where rows with FDR<0.01 are droped
pIC_all = pIC_all[order(pIC_all$FDR),] # sorts the table
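
The five reformatting blocks differ only in the contrast name, so the join step could in principle be written once and looped over. The sketch below illustrates that idea in plain Python on toy rows. `CONTRASTS`, `join_contrast`, the gene names and all numbers are invented for illustration, and the keys `p` and `coef` are simplified stand-ins for the `Pr(>Chisq)` and `coef` columns of `summaryDt_all`.

```python
# Hypothetical mapping from output table name to MAST contrast label
CONTRASTS = {
    "female": "FemaleTRUE",
    "GCSF": "conditionGCSF",
    "dmPGE2": "conditiondmPGE2",
    "indo": "conditionindo",
    "pIC": "conditionpIC",
}


def join_contrast(rows, contrast):
    """Inner-join hurdle ('H') p-values with logFC coefficients by primerid."""
    pvals = {r["primerid"]: r["p"] for r in rows
             if r["contrast"] == contrast and r["component"] == "H"}
    coefs = {r["primerid"]: r["coef"] for r in rows
             if r["contrast"] == contrast and r["component"] == "logFC"}
    # Keep only genes present in both selections, as an inner join does
    return [{"primerid": g, "p": pvals[g], "coef": coefs[g]}
            for g in sorted(pvals) if g in coefs]


# Toy rows with made-up values, standing in for summaryDt_all
rows = [
    {"primerid": "geneA", "component": "H", "contrast": "conditionGCSF", "p": 0.001, "coef": None},
    {"primerid": "geneA", "component": "logFC", "contrast": "conditionGCSF", "p": None, "coef": 0.8},
    {"primerid": "geneB", "component": "H", "contrast": "conditionGCSF", "p": 0.2, "coef": None},
    {"primerid": "geneA", "component": "H", "contrast": "FemaleTRUE", "p": 0.5, "coef": None},
]

tables = {name: join_contrast(rows, contrast) for name, contrast in CONTRASTS.items()}
print(tables["GCSF"])
```

In this sketch, a contrast with no matching rows simply yields an empty table, which is worth checking for before saving per-cluster results.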

In [55]:
%%R -o MAST_raw_all

MAST_raw_all <- summaryDt_all

In [56]:
# save files as .csvs

MAST_raw_all.to_csv('./write/LT_MAST_raw_5.csv')
female_all.to_csv('./write/LT_MAST_female_5.csv')
GCSF_all.to_csv('./write/LT_MAST_GCSF_5.csv')
pIC_all.to_csv('./write/LT_MAST_pIC_5.csv')
dmPGE2_all.to_csv('./write/LT_MAST_dmPGE2_5.csv')
indo_all.to_csv('./write/LT_MAST_indo_5.csv')

In [57]:
sc.logging.print_versions()
pd.show_versions()

scanpy==1.4.5.1 anndata==0.7.1 umap==0.3.10 numpy==1.17.3 scipy==1.3.0 pandas==0.25.3 scikit-learn==0.22.2.post1 statsmodels==0.10.0 python-igraph==0.7.1 louvain==0.6.1

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.3.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.19.76-linuxkit
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : en_US.UTF-8
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.3
numpy            : 1.17.3
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 19.3.1
setuptools       : 41.6.0.post20191101
Cython           : None
pytest           : 5.3.5
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.

In [58]:
%%R

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.7.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] parallel  stats4    tools     stats     graphics  grDevices utils    
 [8] datasets  methods   base     

other attached packages:
 [1] Matrix_1.2-17               MAST_1.12.0                
 [3] plyr_1.8.4                  ggplot2_3.2.1              
 [5] scran_1.14.1                SingleCellExperiment_1.8.0 
 [7] SummarizedExperiment_1.16.0 DelayedArray_0.12.0        
 [9] BiocParallel_1.20.0         matrixStats_