New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Running MAST with random effect for subject via FindMarkers #3712
Comments
We currently do not support this, but you may try altering our MAST wrapper to incorporate a random effect. The hurdle model is specified by the
|
@esrasefik just wondering if you've gotten this to work, and if you wouldn't mind sharing your script? I am trying to do the same with my differential expression analysis. |
@jgamache014 - I unfortunately haven't been able to. I decided to switch to using MAST directly instead of using the Seurat interface given the little support the developers provided for this issue. I don't yet have a solution I can share - still working on it. Sorry! |
@esrasefik Thanks, I'll try using MAST directly as well! |
@jgamache014 @esrasefik have any of you succeeded? If not, what have been the stumbling blocks? I'd like to have a crack at it. |
This clunky solution will add this functionality to Note that it doesn't allow using empirical Bayes estimation to regularize variance, and it's considerably slower. # We want a different behavior from MASTDETest, where the latent variable is
# used as a random effect in the model
rem_MASTDETest <- function(
data.use,
cells.1,
cells.2,
latent.vars = NULL,
verbose = TRUE,
# New option - random effect variable (should be included in latent.vars)
re.var = NULL,
...
) {
# Check for MAST
if (!PackageCheck('MAST', error = FALSE)) {
stop("Please install MAST - learn more at https://github.com/RGLab/MAST")
}
group.info <- data.frame(row.names = c(cells.1, cells.2))
latent.vars <- latent.vars %||% group.info
group.info[cells.1, "group"] <- "Group1"
group.info[cells.2, "group"] <- "Group2"
group.info[, "group"] <- factor(x = group.info[, "group"])
latent.vars.names <- c("condition", colnames(x = latent.vars))
latent.vars <- cbind(latent.vars, group.info)
latent.vars$wellKey <- rownames(x = latent.vars)
fdat <- data.frame(rownames(x = data.use))
colnames(x = fdat)[1] <- "primerid"
rownames(x = fdat) <- fdat[, 1]
sca <- MAST::FromMatrix(
exprsArray = as.matrix(x = data.use),
check_sanity = FALSE,
cData = latent.vars,
fData = fdat
)
cond <- factor(x = SummarizedExperiment::colData(sca)$group)
cond <- relevel(x = cond, ref = "Group1")
SummarizedExperiment::colData(sca)$condition <- cond
# This is the main change in the code - we want ~ ( 1 | re.var) in the formula:
# ~ condition + lat.vars + (1 | re.var)
if (!is.null(re.var)) {
if (!re.var %in% latent.vars.names) {
stop("Random effect variable should be included in latent variables!")
}
latent.vars.names <- latent.vars.names[!latent.vars.names %in% re.var]
fmla <- as.formula(
object = paste0(
" ~ ", paste(latent.vars.names, collapse = "+"), glue("+ (1|{re.var})")
)
)
# print(fmla)
# We need glmer to make this work
method <- "glmer" # trying to troubleshoot this - it can clash with the already existing method var in the function call
zlmCond <- MAST::zlm(formula = fmla, sca = sca, method = "glmer", ...)
} else {
# Original code
fmla <- as.formula(
object = paste0(" ~ ", paste(latent.vars.names, collapse = "+"))
)
zlmCond <- MAST::zlm(formula = fmla, sca = sca, ...)
}
summaryCond <- MAST::summary(object = zlmCond, doLRT = 'conditionGroup2')
summaryDt <- summaryCond$datatable
# The output format is slightly different, so we need adapt the code
if(!is.null(re.var)) {
p_val <- summaryDt[summaryDt$"component" == "H", 4]$`Pr(>Chisq)`
genes.return <- summaryDt[summaryDt$"component" == "H", 1]$primerid
} else {
p_val <- summaryDt[summaryDt[, "component"] == "H", 4]
genes.return <- summaryDt[summaryDt[, "component"] == "H", 1]
}
to.return <- data.frame(p_val, row.names = genes.return)
return(to.return)
}
# This is the zlm function with a minor change to fix a buggy variable being passed possibly due to namespace clash
my_zlm <- function (formula, sca, method = "bayesglm", silent = TRUE, ebayes = TRUE,
ebayesControl = NULL, force = FALSE, hook = NULL, parallel = TRUE,
LMlike, onlyCoef = FALSE, exprs_values = MAST::assay_idx(sca)$aidx,
...)
{
dotsdata = list(...)$data
if (!is.null(dotsdata)) {
if (!missing(sca))
stop("Cannot provide both `sca` and `data`")
sca = dotsdata
}
if (!inherits(sca, "SingleCellAssay")) {
if (inherits(sca, "data.frame")) {
if (!is.null(dotsdata)) {
return(.zlm(formula, method = method, silent = silent,
...))
}
else {
return(.zlm(formula, data = sca, method = method,
silent = silent, ...))
}
}
else {
stop("`sca` must inherit from `data.frame` or `SingleCellAssay`")
}
}
if (missing(LMlike)) {
method <- match.arg(method, MAST:::methodDict[, keyword])
method <- MAST:::methodDict[keyword == method, lmMethod]
if (!is(sca, "SingleCellAssay"))
stop("'sca' must be (or inherit) 'SingleCellAssay'")
if (!is(formula, "formula"))
stop("'formula' must be class 'formula'")
Formula <- MAST:::removeResponse(formula)
priorVar <- 1
priorDOF <- 0
if (ebayes) {
# if (!methodDict[lmMethod == method, implementsEbayes])
if (!all(MAST:::methodDict[lmMethod == method, implementsEbayes]))
stop("Method", method, " does not implement empirical bayes variance shrinkage.")
ebparm <- MAST:::ebayes(t(SummarizedExperiment:::assay(sca, exprs_values)), ebayesControl,
model.matrix(Formula, SummarizedExperiment::colData(sca)))
priorVar <- ebparm[["v"]]
priorDOF <- ebparm[["df"]]
stopifnot(all(!is.na(ebparm)))
}
obj <- MAST:::new_with_repaired_slots(classname = method, design = SummarizedExperiment::colData(sca),
formula = Formula, priorVar = priorVar, priorDOF = priorDOF,
extra = list(...))
}
else {
if (!missing(formula))
warning("Ignoring formula and using model defined in 'objLMLike'")
if (!inherits(LMlike, "LMlike"))
stop("'LMlike' must inherit from class 'LMlike'")
obj <- LMlike
}
ee <- t(SummarizedExperiment:::assay(sca, exprs_values))
genes <- colnames(ee)
ng <- length(genes)
MM <- MAST:::model.matrix(obj)
coefNames <- colnames(MM)
listEE <- setNames(seq_len(ng), genes)
obj <- MAST:::fit(obj, ee[, 1], silent = silent)
nerror <- totalerr <- 0
pb = progress::progress_bar$new(total = ng, format = " Completed [:bar] :percent with :err failures")
.fitGeneSet <- function(idx) {
hookOut <- NULL
tt <- try({
obj <- MAST:::fit(obj, response = ee[, idx], silent = silent,
quick = TRUE)
if (!is.null(hook))
hookOut <- hook(obj)
nerror <- 0
})
if (is(tt, "try-error")) {
obj@fitC <- obj@fitD <- NULL
obj@fitted <- c(C = FALSE, D = FALSE)
nerror <- nerror + 1
totalerr = totalerr + 1
if (nerror > 5 & !force) {
stop("We seem to be having a lot of problems here...are your tests specified correctly? \n If you're sure, set force=TRUE.",
tt)
}
}
pb$tick(tokens = list(err = totalerr))
if (onlyCoef)
return(cbind(C = coef(obj, "C"), D = coef(obj, "D")))
summaries <- MAST:::summarize(obj)
structure(summaries, hookOut = hookOut)
}
if (!parallel || getOption("mc.cores", 1L) == 1) {
listOfSummaries <- lapply(listEE, .fitGeneSet)
}
else {
listOfSummaries <- parallel::mclapply(listEE, .fitGeneSet,
mc.preschedule = TRUE, mc.silent = silent)
}
if (onlyCoef) {
out <- do.call(abind, c(listOfSummaries, rev.along = 0))
return(aperm(out, c(3, 1, 2)))
}
cls <- sapply(listOfSummaries, function(x) class(x))
complain <- if (force)
warning
else stop
if (mean(cls == "try-error") > 0.5)
complain("Lots of errors here..something is amiss.")
hookOut <- NULL
if (!is.null(hook))
hookOut <- lapply(listOfSummaries, attr, which = "hookOut")
message("\nDone!")
summaries <- MAST:::collectSummaries(listOfSummaries)
summaries[["LMlike"]] <- obj
summaries[["sca"]] <- sca
summaries[["priorVar"]] <- obj@priorVar
summaries[["priorDOF"]] <- obj@priorDOF
summaries[["hookOut"]] <- hookOut
summaries[["exprs_values"]] <- exprs_values
summaries[["Class"]] <- "ZlmFit"
zfit <- do.call(new, as.list(summaries))
zfit
}
# We will replace the original function with ours inside the Seurat namespace
assignInNamespace('MASTDETest', rem_MASTDETest, asNamespace("Seurat"))
# getFromNamespace("MASTDETest", "Seurat")
assignInNamespace('zlm', my_zlm, asNamespace("MAST"))
# getFromNamespace("zlm", "MAST")
# And now we can call our FindMarkers with random effects model for sample_ID
# Don't forget to include ebayes = F
markers <- FindMarkers(
sobj, ident.1 = "foo", ident.2 = "bar", only.pos = TRUE,
test.use = "MAST", verbose = F
, latent.vars = "sample_ID", re.var = "sample_ID", ebayes = F
) |
@rdalbanus thanks for sharing! I also wrote a function - it's a while back now, but here it is: |
@rdalbanus Thanks for writing this script! I'm wondering why you wrote it in a way such that the random effect variable needs to also be a latent variable? |
@andrewdchen The I only added a flag that selects one of these covariates and puts it in a |
Dear Seurat Team,
I am contacting you in regards to a question about how to use your FindMarkers function to run MAST with a random effect added for subject. I have a single cell RNAseq dataset with two genotypes (4 subjects each) and I’m trying to run a cluster specific differential expression analysis between these two genotypes to identify cell-type specific DEGs. I was able to successfully run MAST for this purpose via your FindMarkers function by using the instructions provided in your vignettes. I now would like to slightly modify my approach and add a random effect for subject to avoid a simple pseudoreplication type issue while keeping other parameters (i.e., logfc.threshold, min.pct etc.) the same as my original approach. To my knowledge, it should be possible to tweak the MAST algorithm to incorporate a random effect but I couldn’t find any instructions online on using Seurat to do this. I would appreciate any recommendations you could give me.
Here is the code I used for running MAST without a random effect (ident.1 and ident.2 indicate wild type and mutant genotypes):
FindMarkers(s_obj4, ident.1 = "0_2", ident.2 = "0_1", min.pct = 0.1, logfc.threshold = 0.1, test.use = "MAST", verbose = FALSE, assay = "RNA", slot = "data", latent.vars = "nCount_RNA", pseudocount.use = 0.001)
Thank you!
Esra
The text was updated successfully, but these errors were encountered: