Running MAST with random effect for subject via FindMarkers #3712

esrasefik · 2020-11-11T18:11:32Z

Dear Seurat Team,

I am contacting you in regards to a question about how to use your FindMarkers function to run MAST with a random effect added for subject. I have a single cell RNAseq dataset with two genotypes (4 subjects each) and I’m trying to run a cluster specific differential expression analysis between these two genotypes to identify cell-type specific DEGs. I was able to successfully run MAST for this purpose via your FindMarkers function by using the instructions provided in your vignettes. I now would like to slightly modify my approach and add a random effect for subject to avoid a simple pseudoreplication type issue while keeping other parameters (i.e., logfc.threshold, min.pct etc.) the same as my original approach. To my knowledge, it should be possible to tweak the MAST algorithm to incorporate a random effect but I couldn’t find any instructions online on using Seurat to do this. I would appreciate any recommendations you could give me.

Here is the code I used for running MAST without a random effect (ident.1 and ident.2 indicate wild type and mutant genotypes):

FindMarkers(s_obj4, ident.1 = "0_2", ident.2 = "0_1", min.pct = 0.1, logfc.threshold = 0.1, test.use = "MAST", verbose = FALSE, assay = "RNA", slot = "data", latent.vars = "nCount_RNA", pseudocount.use = 0.001)

Thank you!
Esra

The text was updated successfully, but these errors were encountered:

jaisonj708 · 2020-12-05T02:35:26Z

We currently do not support this, but you may try altering our MAST wrapper to incorporate a random effect. The hurdle model is specified by the formula parameter in the zlm function, called in MASTDETest:

zlmCond <- MAST::zlm(formula = fmla, sca = sca, ...)

jgamache014 · 2021-01-27T16:03:24Z

@esrasefik just wondering if you've gotten this to work, and if you wouldn't mind sharing your script? I am trying to do the same with my differential expression analysis.

esrasefik · 2021-01-27T17:14:05Z

@jgamache014 - I unfortunately haven't been able to. I decided to switch to using MAST directly instead of using the Seurat interface given the little support the developers provided for this issue. I don't yet have a solution I can share - still working on it. Sorry!

jgamache014 · 2021-01-27T21:38:37Z

@esrasefik Thanks, I'll try using MAST directly as well!

JonThom · 2021-02-26T13:16:18Z

@jgamache014 @esrasefik have any of you succeeded? If not, what have been the stumbling blocks? I'd like to have a crack at it.

rdalbanus · 2023-01-11T22:44:39Z

This clunky solution will add this functionality to FindMarkers by making minor changes to two internal functions in Seurat and MAST. I haven't tested this extensively, but it seems to be working as expected so far. It's helpful for exploratory analyses when you only have a seurat object. However, I'd strongly suggest going through the steps in the MAST vignette (converting Seurat to SummarizedExperiment, etc) if you are planning to use this for important analyses.

Note that it doesn't allow using empirical Bayes estimation to regularize variance, and it's considerably slower.

# We want a different behavior from MASTDETest, where the latent variable is
# used as a random effect in the model
rem_MASTDETest <- function(
  data.use,
  cells.1,
  cells.2,
  latent.vars = NULL,
  verbose = TRUE,
  # New option - random effect variable (should be included in latent.vars)
  re.var = NULL,
  ...
) {
  # Check for MAST
  if (!PackageCheck('MAST', error = FALSE)) {
    stop("Please install MAST - learn more at https://github.com/RGLab/MAST")
  }
  group.info <- data.frame(row.names = c(cells.1, cells.2))
  latent.vars <- latent.vars %||% group.info
  group.info[cells.1, "group"] <- "Group1"
  group.info[cells.2, "group"] <- "Group2"
  group.info[, "group"] <- factor(x = group.info[, "group"])
  latent.vars.names <- c("condition", colnames(x = latent.vars))
  latent.vars <- cbind(latent.vars, group.info)
  latent.vars$wellKey <- rownames(x = latent.vars)
  fdat <- data.frame(rownames(x = data.use))
  colnames(x = fdat)[1] <- "primerid"
  rownames(x = fdat) <- fdat[, 1]
  sca <- MAST::FromMatrix(
    exprsArray = as.matrix(x = data.use),
    check_sanity = FALSE,
    cData = latent.vars,
    fData = fdat
  )
  cond <- factor(x = SummarizedExperiment::colData(sca)$group)
  cond <- relevel(x = cond, ref = "Group1")
  SummarizedExperiment::colData(sca)$condition <- cond

  # This is the main change in the code - we want ~ ( 1 | re.var) in the formula:
  # ~ condition + lat.vars + (1 | re.var)
  if (!is.null(re.var)) {
    if (!re.var %in% latent.vars.names) {
      stop("Random effect variable should be included in latent variables!")
    }
    latent.vars.names <- latent.vars.names[!latent.vars.names %in% re.var]
    fmla <- as.formula(
      object = paste0(
        " ~ ", paste(latent.vars.names, collapse = "+"), glue("+ (1|{re.var})")
      )
    )
    # print(fmla)
    # We need glmer to make this work
    method <-  "glmer" # trying to troubleshoot this - it can clash with the already existing method var in the function call    
    zlmCond <- MAST::zlm(formula = fmla, sca = sca, method = "glmer", ...)
  } else {
    # Original code
    fmla <- as.formula(
      object = paste0(" ~ ", paste(latent.vars.names, collapse = "+"))
    )
    zlmCond <- MAST::zlm(formula = fmla, sca = sca, ...)
  }
  summaryCond <- MAST::summary(object = zlmCond, doLRT = 'conditionGroup2')
  summaryDt <- summaryCond$datatable
  
  # The output format is slightly different, so we need adapt the code
  if(!is.null(re.var)) {
    p_val <- summaryDt[summaryDt$"component" == "H", 4]$`Pr(>Chisq)`
    genes.return <- summaryDt[summaryDt$"component" == "H", 1]$primerid
  } else {
    p_val <- summaryDt[summaryDt[, "component"] == "H", 4]
    genes.return <- summaryDt[summaryDt[, "component"] == "H", 1]
  }
  
  to.return <- data.frame(p_val, row.names = genes.return)
  return(to.return)
}

# This is the zlm function with a minor change to fix a buggy variable being passed possibly due to namespace clash
my_zlm <- function (formula, sca, method = "bayesglm", silent = TRUE, ebayes = TRUE, 
    ebayesControl = NULL, force = FALSE, hook = NULL, parallel = TRUE, 
    LMlike, onlyCoef = FALSE, exprs_values = MAST::assay_idx(sca)$aidx, 
    ...) 
{
    dotsdata = list(...)$data
    if (!is.null(dotsdata)) {
        if (!missing(sca)) 
            stop("Cannot provide both `sca` and `data`")
        sca = dotsdata
    }
    if (!inherits(sca, "SingleCellAssay")) {
        if (inherits(sca, "data.frame")) {
            if (!is.null(dotsdata)) {
                return(.zlm(formula, method = method, silent = silent, 
                  ...))
            }
            else {
                return(.zlm(formula, data = sca, method = method, 
                  silent = silent, ...))
            }
        }
        else {
            stop("`sca` must inherit from `data.frame` or `SingleCellAssay`")
        }
    }
    if (missing(LMlike)) {
        method <- match.arg(method, MAST:::methodDict[, keyword])
        method <- MAST:::methodDict[keyword == method, lmMethod]
        if (!is(sca, "SingleCellAssay")) 
            stop("'sca' must be (or inherit) 'SingleCellAssay'")
        if (!is(formula, "formula")) 
            stop("'formula' must be class 'formula'")
        Formula <- MAST:::removeResponse(formula)
        priorVar <- 1
        priorDOF <- 0
        if (ebayes) {
            # if (!methodDict[lmMethod == method, implementsEbayes]) 
          if (!all(MAST:::methodDict[lmMethod == method, implementsEbayes]))
          stop("Method", method, " does not implement empirical bayes variance shrinkage.")
            ebparm <- MAST:::ebayes(t(SummarizedExperiment:::assay(sca, exprs_values)), ebayesControl, 
                model.matrix(Formula, SummarizedExperiment::colData(sca)))
            priorVar <- ebparm[["v"]]
            priorDOF <- ebparm[["df"]]
            stopifnot(all(!is.na(ebparm)))
        }
        obj <- MAST:::new_with_repaired_slots(classname = method, design = SummarizedExperiment::colData(sca), 
            formula = Formula, priorVar = priorVar, priorDOF = priorDOF, 
            extra = list(...))
    }
    else {
        if (!missing(formula)) 
            warning("Ignoring formula and using model defined in 'objLMLike'")
        if (!inherits(LMlike, "LMlike")) 
            stop("'LMlike' must inherit from class 'LMlike'")
        obj <- LMlike
    }
    ee <- t(SummarizedExperiment:::assay(sca, exprs_values))
    genes <- colnames(ee)
    ng <- length(genes)
    MM <- MAST:::model.matrix(obj)
    coefNames <- colnames(MM)
    listEE <- setNames(seq_len(ng), genes)
    obj <- MAST:::fit(obj, ee[, 1], silent = silent)
    nerror <- totalerr <- 0
    pb = progress::progress_bar$new(total = ng, format = " Completed [:bar] :percent with :err failures")
    .fitGeneSet <- function(idx) {
        hookOut <- NULL
        tt <- try({
            obj <- MAST:::fit(obj, response = ee[, idx], silent = silent, 
                quick = TRUE)
            if (!is.null(hook)) 
                hookOut <- hook(obj)
            nerror <- 0
        })
        if (is(tt, "try-error")) {
            obj@fitC <- obj@fitD <- NULL
            obj@fitted <- c(C = FALSE, D = FALSE)
            nerror <- nerror + 1
            totalerr = totalerr + 1
            if (nerror > 5 & !force) {
                stop("We seem to be having a lot of problems here...are your tests specified correctly?  \n If you're sure, set force=TRUE.", 
                  tt)
            }
        }
        pb$tick(tokens = list(err = totalerr))
        if (onlyCoef) 
            return(cbind(C = coef(obj, "C"), D = coef(obj, "D")))
        summaries <- MAST:::summarize(obj)
        structure(summaries, hookOut = hookOut)
    }
    if (!parallel || getOption("mc.cores", 1L) == 1) {
        listOfSummaries <- lapply(listEE, .fitGeneSet)
    }
    else {
        listOfSummaries <- parallel::mclapply(listEE, .fitGeneSet, 
            mc.preschedule = TRUE, mc.silent = silent)
    }
    if (onlyCoef) {
        out <- do.call(abind, c(listOfSummaries, rev.along = 0))
        return(aperm(out, c(3, 1, 2)))
    }
    cls <- sapply(listOfSummaries, function(x) class(x))
    complain <- if (force) 
        warning
    else stop
    if (mean(cls == "try-error") > 0.5) 
        complain("Lots of errors here..something is amiss.")
    hookOut <- NULL
    if (!is.null(hook)) 
        hookOut <- lapply(listOfSummaries, attr, which = "hookOut")
    message("\nDone!")
    summaries <- MAST:::collectSummaries(listOfSummaries)
    summaries[["LMlike"]] <- obj
    summaries[["sca"]] <- sca
    summaries[["priorVar"]] <- obj@priorVar
    summaries[["priorDOF"]] <- obj@priorDOF
    summaries[["hookOut"]] <- hookOut
    summaries[["exprs_values"]] <- exprs_values
    summaries[["Class"]] <- "ZlmFit"
    zfit <- do.call(new, as.list(summaries))
    zfit
}

# We will replace the original function with ours inside the Seurat namespace
assignInNamespace('MASTDETest', rem_MASTDETest, asNamespace("Seurat"))
# getFromNamespace("MASTDETest", "Seurat")

assignInNamespace('zlm', my_zlm, asNamespace("MAST"))
# getFromNamespace("zlm", "MAST")

# And now we can call our FindMarkers with random effects model for sample_ID
# Don't forget to include ebayes = F
markers <- FindMarkers(
    sobj, ident.1 = "foo", ident.2 = "bar", only.pos = TRUE, 
    test.use = "MAST", verbose = F
    , latent.vars = "sample_ID", re.var = "sample_ID", ebayes = F
  )

JonThom · 2023-01-12T19:24:14Z

@rdalbanus thanks for sharing!

I also wrote a function - it's a while back now, but here it is:
https://github.com/CBMR-Single-Cell-Omics-Platform/SCOPfunctions/blob/main/R/differential_expression.R

andrewdchen · 2023-11-02T21:00:21Z

This clunky solution will add this functionality to FindMarkers by making minor changes to two internal functions in Seurat and MAST. I haven't tested this extensively, but it seems to be working as expected so far. It's helpful for exploratory analyses when you only have a seurat object. However, I'd strongly suggest going through the steps in the MAST vignette (converting Seurat to SummarizedExperiment, etc) if you are planning to use this for important analyses.

Note that it doesn't allow using empirical Bayes estimation to regularize variance, and it's considerably slower.

# We want a different behavior from MASTDETest, where the latent variable is
# used as a random effect in the model
rem_MASTDETest <- function(
  data.use,
  cells.1,
  cells.2,
  latent.vars = NULL,
  verbose = TRUE,
  # New option - random effect variable (should be included in latent.vars)
  re.var = NULL,
  ...
) {
  # Check for MAST
  if (!PackageCheck('MAST', error = FALSE)) {
    stop("Please install MAST - learn more at https://github.com/RGLab/MAST")
  }
  group.info <- data.frame(row.names = c(cells.1, cells.2))
  latent.vars <- latent.vars %||% group.info
  group.info[cells.1, "group"] <- "Group1"
  group.info[cells.2, "group"] <- "Group2"
  group.info[, "group"] <- factor(x = group.info[, "group"])
  latent.vars.names <- c("condition", colnames(x = latent.vars))
  latent.vars <- cbind(latent.vars, group.info)
  latent.vars$wellKey <- rownames(x = latent.vars)
  fdat <- data.frame(rownames(x = data.use))
  colnames(x = fdat)[1] <- "primerid"
  rownames(x = fdat) <- fdat[, 1]
  sca <- MAST::FromMatrix(
    exprsArray = as.matrix(x = data.use),
    check_sanity = FALSE,
    cData = latent.vars,
    fData = fdat
  )
  cond <- factor(x = SummarizedExperiment::colData(sca)$group)
  cond <- relevel(x = cond, ref = "Group1")
  SummarizedExperiment::colData(sca)$condition <- cond

  # This is the main change in the code - we want ~ ( 1 | re.var) in the formula:
  # ~ condition + lat.vars + (1 | re.var)
  if (!is.null(re.var)) {
    if (!re.var %in% latent.vars.names) {
      stop("Random effect variable should be included in latent variables!")
    }
    latent.vars.names <- latent.vars.names[!latent.vars.names %in% re.var]
    fmla <- as.formula(
      object = paste0(
        " ~ ", paste(latent.vars.names, collapse = "+"), glue("+ (1|{re.var})")
      )
    )
    # print(fmla)
    # We need glmer to make this work
    method <-  "glmer" # trying to troubleshoot this - it can clash with the already existing method var in the function call    
    zlmCond <- MAST::zlm(formula = fmla, sca = sca, method = "glmer", ...)
  } else {
    # Original code
    fmla <- as.formula(
      object = paste0(" ~ ", paste(latent.vars.names, collapse = "+"))
    )
    zlmCond <- MAST::zlm(formula = fmla, sca = sca, ...)
  }
  summaryCond <- MAST::summary(object = zlmCond, doLRT = 'conditionGroup2')
  summaryDt <- summaryCond$datatable
  
  # The output format is slightly different, so we need adapt the code
  if(!is.null(re.var)) {
    p_val <- summaryDt[summaryDt$"component" == "H", 4]$`Pr(>Chisq)`
    genes.return <- summaryDt[summaryDt$"component" == "H", 1]$primerid
  } else {
    p_val <- summaryDt[summaryDt[, "component"] == "H", 4]
    genes.return <- summaryDt[summaryDt[, "component"] == "H", 1]
  }
  
  to.return <- data.frame(p_val, row.names = genes.return)
  return(to.return)
}

# This is the zlm function with a minor change to fix a buggy variable being passed possibly due to namespace clash
my_zlm <- function (formula, sca, method = "bayesglm", silent = TRUE, ebayes = TRUE, 
    ebayesControl = NULL, force = FALSE, hook = NULL, parallel = TRUE, 
    LMlike, onlyCoef = FALSE, exprs_values = MAST::assay_idx(sca)$aidx, 
    ...) 
{
    dotsdata = list(...)$data
    if (!is.null(dotsdata)) {
        if (!missing(sca)) 
            stop("Cannot provide both `sca` and `data`")
        sca = dotsdata
    }
    if (!inherits(sca, "SingleCellAssay")) {
        if (inherits(sca, "data.frame")) {
            if (!is.null(dotsdata)) {
                return(.zlm(formula, method = method, silent = silent, 
                  ...))
            }
            else {
                return(.zlm(formula, data = sca, method = method, 
                  silent = silent, ...))
            }
        }
        else {
            stop("`sca` must inherit from `data.frame` or `SingleCellAssay`")
        }
    }
    if (missing(LMlike)) {
        method <- match.arg(method, MAST:::methodDict[, keyword])
        method <- MAST:::methodDict[keyword == method, lmMethod]
        if (!is(sca, "SingleCellAssay")) 
            stop("'sca' must be (or inherit) 'SingleCellAssay'")
        if (!is(formula, "formula")) 
            stop("'formula' must be class 'formula'")
        Formula <- MAST:::removeResponse(formula)
        priorVar <- 1
        priorDOF <- 0
        if (ebayes) {
            # if (!methodDict[lmMethod == method, implementsEbayes]) 
          if (!all(MAST:::methodDict[lmMethod == method, implementsEbayes]))
          stop("Method", method, " does not implement empirical bayes variance shrinkage.")
            ebparm <- MAST:::ebayes(t(SummarizedExperiment:::assay(sca, exprs_values)), ebayesControl, 
                model.matrix(Formula, SummarizedExperiment::colData(sca)))
            priorVar <- ebparm[["v"]]
            priorDOF <- ebparm[["df"]]
            stopifnot(all(!is.na(ebparm)))
        }
        obj <- MAST:::new_with_repaired_slots(classname = method, design = SummarizedExperiment::colData(sca), 
            formula = Formula, priorVar = priorVar, priorDOF = priorDOF, 
            extra = list(...))
    }
    else {
        if (!missing(formula)) 
            warning("Ignoring formula and using model defined in 'objLMLike'")
        if (!inherits(LMlike, "LMlike")) 
            stop("'LMlike' must inherit from class 'LMlike'")
        obj <- LMlike
    }
    ee <- t(SummarizedExperiment:::assay(sca, exprs_values))
    genes <- colnames(ee)
    ng <- length(genes)
    MM <- MAST:::model.matrix(obj)
    coefNames <- colnames(MM)
    listEE <- setNames(seq_len(ng), genes)
    obj <- MAST:::fit(obj, ee[, 1], silent = silent)
    nerror <- totalerr <- 0
    pb = progress::progress_bar$new(total = ng, format = " Completed [:bar] :percent with :err failures")
    .fitGeneSet <- function(idx) {
        hookOut <- NULL
        tt <- try({
            obj <- MAST:::fit(obj, response = ee[, idx], silent = silent, 
                quick = TRUE)
            if (!is.null(hook)) 
                hookOut <- hook(obj)
            nerror <- 0
        })
        if (is(tt, "try-error")) {
            obj@fitC <- obj@fitD <- NULL
            obj@fitted <- c(C = FALSE, D = FALSE)
            nerror <- nerror + 1
            totalerr = totalerr + 1
            if (nerror > 5 & !force) {
                stop("We seem to be having a lot of problems here...are your tests specified correctly?  \n If you're sure, set force=TRUE.", 
                  tt)
            }
        }
        pb$tick(tokens = list(err = totalerr))
        if (onlyCoef) 
            return(cbind(C = coef(obj, "C"), D = coef(obj, "D")))
        summaries <- MAST:::summarize(obj)
        structure(summaries, hookOut = hookOut)
    }
    if (!parallel || getOption("mc.cores", 1L) == 1) {
        listOfSummaries <- lapply(listEE, .fitGeneSet)
    }
    else {
        listOfSummaries <- parallel::mclapply(listEE, .fitGeneSet, 
            mc.preschedule = TRUE, mc.silent = silent)
    }
    if (onlyCoef) {
        out <- do.call(abind, c(listOfSummaries, rev.along = 0))
        return(aperm(out, c(3, 1, 2)))
    }
    cls <- sapply(listOfSummaries, function(x) class(x))
    complain <- if (force) 
        warning
    else stop
    if (mean(cls == "try-error") > 0.5) 
        complain("Lots of errors here..something is amiss.")
    hookOut <- NULL
    if (!is.null(hook)) 
        hookOut <- lapply(listOfSummaries, attr, which = "hookOut")
    message("\nDone!")
    summaries <- MAST:::collectSummaries(listOfSummaries)
    summaries[["LMlike"]] <- obj
    summaries[["sca"]] <- sca
    summaries[["priorVar"]] <- obj@priorVar
    summaries[["priorDOF"]] <- obj@priorDOF
    summaries[["hookOut"]] <- hookOut
    summaries[["exprs_values"]] <- exprs_values
    summaries[["Class"]] <- "ZlmFit"
    zfit <- do.call(new, as.list(summaries))
    zfit
}

# We will replace the original function with ours inside the Seurat namespace
assignInNamespace('MASTDETest', rem_MASTDETest, asNamespace("Seurat"))
# getFromNamespace("MASTDETest", "Seurat")

assignInNamespace('zlm', my_zlm, asNamespace("MAST"))
# getFromNamespace("zlm", "MAST")

# And now we can call our FindMarkers with random effects model for sample_ID
# Don't forget to include ebayes = F
markers <- FindMarkers(
    sobj, ident.1 = "foo", ident.2 = "bar", only.pos = TRUE, 
    test.use = "MAST", verbose = F
    , latent.vars = "sample_ID", re.var = "sample_ID", ebayes = F
  )

@rdalbanus Thanks for writing this script! I'm wondering why you wrote it in a way such that the random effect variable needs to also be a latent variable?

rdalbanus · 2023-11-02T22:01:47Z

@andrewdchen
I only did it this way to keep code changes to a minimum - you can see that there's some internal processing of the latent.vars flag, and I didn't want risk introducing bugs.

The latent.vars flag captures all the covariates you want to include in the analyses, which get converted into a formula in the fmla <- as.formula(...) line (e.g. latent.vars = c("sex", "age") results in a model ~ sex + age in the original code).

I only added a flag that selects one of these covariates and puts it in a (1 | x ) enclosing (e.g. latent.vars = c("donor", "sex", "age"), re.var = "donor" results in ~ age + sex + (1|donor)).

esrasefik changed the title ~~Running MAST with random effects via FindMarkers~~ Running MAST with random effect for subject via FindMarkers Nov 11, 2020

jaisonj708 closed this as completed Dec 5, 2020

kdzimm mentioned this issue Mar 25, 2021

simulate() error when running Power.Rmd script kdzimm/PseudoreplicationPaper#2

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Running MAST with random effect for subject via FindMarkers #3712

Running MAST with random effect for subject via FindMarkers #3712

esrasefik commented Nov 11, 2020

jaisonj708 commented Dec 5, 2020

jgamache014 commented Jan 27, 2021

esrasefik commented Jan 27, 2021 •

edited

jgamache014 commented Jan 27, 2021

JonThom commented Feb 26, 2021

rdalbanus commented Jan 11, 2023

JonThom commented Jan 12, 2023

andrewdchen commented Nov 2, 2023

rdalbanus commented Nov 2, 2023 •

edited

Running MAST with random effect for subject via FindMarkers #3712

Running MAST with random effect for subject via FindMarkers #3712

Comments

esrasefik commented Nov 11, 2020

jaisonj708 commented Dec 5, 2020

jgamache014 commented Jan 27, 2021

esrasefik commented Jan 27, 2021 • edited

jgamache014 commented Jan 27, 2021

JonThom commented Feb 26, 2021

rdalbanus commented Jan 11, 2023

JonThom commented Jan 12, 2023

andrewdchen commented Nov 2, 2023

rdalbanus commented Nov 2, 2023 • edited

esrasefik commented Jan 27, 2021 •

edited

rdalbanus commented Nov 2, 2023 •

edited