Regularized negative binomial regression not in ScaleData and FindMarkers #268

Closed
yueqiw opened this issue Jan 16, 2018 · 2 comments

Comments

@yueqiw
Contributor

yueqiw commented Jan 16, 2018

Hi,

I'm interested in using the regularized negative binomial regression described in this paper: https://www.biorxiv.org/content/early/2017/09/13/105312

I saw that the RegressOutNBreg() and NegBinomRegDETest() functions are implemented in the Seurat source code. However, they are not exposed as options in the ScaleData() and FindMarkers() functions.

(I'm also wondering whether the regularized negbinom gives better results than linear regression or the unregularized negbinom.)

@satijalab
Collaborator

Great question! We think this is a nice method for normalizing UMI count data. While we introduce it in the preprint you cite, and do include code to run the method in Seurat, it currently runs a bit too slowly for us to recommend for most users.

You should run RegressOutNBreg() in place of ScaleData() and NegBinomRegDETest() in place of FindMarkers(). We are working on making these methods faster and will eventually incorporate them into the regular Seurat workflow. We do see a boost in performance, as the NB is the right statistical model for scRNA-seq count data, and the non-regularized version can overfit individual genes, particularly when there are <10k cells.
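
For orientation, here is a rough sketch of how those two calls might slot into a Seurat v2-era workflow. The argument names below (latent.vars, genes.regress, cells.1, cells.2) are illustrative assumptions rather than confirmed signatures; check the function definitions in your installed version.

# Sketch only -- argument names are assumptions, not the exact signatures.
library(Seurat)

# In place of ScaleData(): regularized NB regression on the UMI counts,
# regressing out sequencing depth for the variable genes.
seu <- RegressOutNBreg(
  object = seu,
  latent.vars = "nUMI",            # assumed parameter name
  genes.regress = seu@var.genes    # assumed parameter name
)

# In place of FindMarkers(): negative binomial regression DE test
# between two groups of cells.
de <- NegBinomRegDETest(
  object = seu,
  cells.1 = WhichCells(object = seu, ident = 1),
  cells.2 = WhichCells(object = seu, ident = 2),
  latent.vars = "nUMI"             # assumed parameter name
)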

The difference is not massive. However, on a philosophical level, we feel that the regularized NB model is the right way to normalize these datasets and remove sources of unwanted variation.
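
To make the regularized/unregularized distinction concrete, here is a minimal, self-contained sketch of the idea from the preprint (not Seurat's actual code): fit a negative binomial GLM per gene against sequencing depth, then smooth the fitted parameters across genes as a function of mean expression before computing Pearson residuals. All names here are illustrative.

# Minimal sketch of regularized NB regression (illustrative, not Seurat's code).
library(MASS)  # glm.nb

# counts: genes x cells UMI matrix (with rownames); log_umi: log10 total UMI per cell
regularized_nb_residuals <- function(counts, log_umi) {
  # 1) Unregularized step: independent NB GLM per gene (can overfit,
  #    especially with few cells). Convergence handling omitted.
  fits <- t(sapply(rownames(counts), function(g) {
    fit <- glm.nb(counts[g, ] ~ log_umi)
    c(coef(fit), theta = fit$theta)
  }))
  gene_mean <- log10(rowMeans(counts) + 1e-8)
  # 2) Regularization step: smooth each parameter across genes as a
  #    function of mean expression, so genes borrow strength from
  #    genes with similar abundance.
  smooth_par <- function(p) fitted(loess(p ~ gene_mean))
  beta0 <- smooth_par(fits[, 1])
  beta1 <- smooth_par(fits[, 2])
  theta <- pmax(smooth_par(fits[, "theta"]), 1e-4)
  # 3) Pearson residuals under the regularized model
  #    (NB variance: mu + mu^2 / theta).
  mu <- exp(sweep(outer(beta1, log_umi), 1, beta0, "+"))
  (counts - mu) / sqrt(mu + mu^2 / theta)
}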

Thanks for the close read of our work (and source code :)).

@yueqiw
Contributor Author

yueqiw commented Apr 1, 2018

Thanks for the detailed explanation. I used RegressOutNBreg() recently and it worked quite well for datasets of 1k–10k cells, and it doesn't take long when only regressing the highly variable genes (even faster with mclapply).
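
On the speed point, the per-gene regressions are independent, so something like the following mclapply pattern is a natural fit (a sketch under assumed inputs, not Seurat's internals):

# Sketch: parallel per-gene NB fits over the highly variable genes only.
# Assumed inputs: `umi` is a genes x cells UMI count matrix already
# subset to the HVGs; `log_umi` is log10 total counts per cell.
library(parallel)
library(MASS)

fits <- mclapply(rownames(umi), function(g) {
  glm.nb(umi[g, ] ~ log_umi)   # each gene's fit is independent
}, mc.cores = 4)               # mclapply forks; Unix/macOS only
names(fits) <- rownames(umi)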

A question I have is: why aren't the Pearson residuals of RegressOutNBreg() log1p-transformed for scaling? The Pearson residuals of a negative binomial model should still have a long tail. In another function, RegressOutResid(), the residuals of the Poisson and negative binomial models undergo a log1p transform to make them less stretched out:

if (use.umi) {
  # Shift each gene (row) so its minimum residual is 0, then apply log1p
  # to compress the long right tail of the residuals.
  data.resid <- log1p(
    x = sweep(
      x = data.resid,
      MARGIN = 1,
      STATS = apply(X = data.resid, MARGIN = 1, FUN = min),
      FUN = "-"
    )
  )
}

I'm wondering what the reason is for not applying the log1p transform in RegressOutNBreg() (and in https://github.com/ChristophH/in-lineage/blob/master/R/lib.R)? Thanks!
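
For reference, a toy comparison of the two options being discussed: raw Pearson residuals versus the min-shift + log1p compression used in RegressOutResid(). The fake residual matrix here is only for illustration.

# Toy illustration: raw residuals vs. the min-shift + log1p compression.
set.seed(1)
resid <- matrix(rnorm(2000, sd = 2)^2 - 1, nrow = 20)  # fake long-tailed residuals

# Option A: keep the Pearson residuals as-is (what RegressOutNBreg does).
resid_raw <- resid

# Option B: shift each gene (row) so its minimum is 0, then log1p-compress
# the long right tail (what RegressOutResid does when use.umi = TRUE).
resid_log <- log1p(sweep(resid, 1, apply(resid, 1, min), "-"))

# The compression mainly shrinks the largest residuals:
summary(as.vector(resid_raw))
summary(as.vector(resid_log))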

Yueqi
