Regularized negative binomial regression not in ScaleData and FindMarkers #268

Closed
yueqiw opened this issue Jan 16, 2018 · 2 comments

Comments

@yueqiw
Contributor

yueqiw commented Jan 16, 2018

Hi,

I'm interested in using the regularized negative binomial regression described in this paper: https://www.biorxiv.org/content/early/2017/09/13/105312

I saw that the RegressOutNBreg() and NegBinomRegDETest() functions are implemented in the Seurat source code. However, they are not exposed as options in the ScaleData() and FindMarkers() functions.

(I'm also wondering whether the regularized negbinom gives better results than linear regression or the unregularized negbinom.)

@satijalab
Collaborator

Great question! We think this is a nice method for normalizing UMI count data. While we introduce it in the preprint you cite, and do include code to run the method in Seurat, it currently runs a bit too slowly for us to recommend for most users.

You should run RegressOutNBreg() in place of ScaleData() and NegBinomRegDETest() in place of FindMarkers(). We are working on making these methods faster and will eventually incorporate them into the regular Seurat workflow. We do see a boost in performance, as the NB is the right statistical model for scRNA-seq count data, and the non-regularized version can overfit individual genes, particularly when there are <10k cells.
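
For orientation, here is a rough sketch of how those two calls might slot into a Seurat v2-era workflow. The argument names below (latent.vars, genes.regress, cells.1, cells.2) are illustrative assumptions rather than confirmed signatures; check the function definitions in your installed version.

# Sketch only -- argument names are assumptions, not the exact signatures.
library(Seurat)

# In place of ScaleData(): regularized NB regression on the UMI counts,
# regressing out sequencing depth for the variable genes.
seu <- RegressOutNBreg(
  object = seu,
  latent.vars = "nUMI",            # assumed parameter name
  genes.regress = seu@var.genes    # assumed parameter name
)

# In place of FindMarkers(): negative binomial regression DE test
# between two groups of cells.
de <- NegBinomRegDETest(
  object = seu,
  cells.1 = WhichCells(object = seu, ident = 1),
  cells.2 = WhichCells(object = seu, ident = 2),
  latent.vars = "nUMI"             # assumed parameter name
)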

The difference is not massive. However, on a philosophical level, we feel that the regularized NB model is the right way to normalize these datasets and remove sources of unwanted variation.
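
To make the regularized/unregularized distinction concrete, here is a minimal, self-contained sketch of the idea from the preprint (not Seurat's actual code): fit a negative binomial GLM per gene against sequencing depth, then smooth the fitted parameters across genes as a function of mean expression before computing Pearson residuals. All names here are illustrative.

# Minimal sketch of regularized NB regression (illustrative, not Seurat's code).
library(MASS)  # glm.nb

# counts: genes x cells UMI matrix (with rownames); log_umi: log10 total UMI per cell
regularized_nb_residuals <- function(counts, log_umi) {
  # 1) Unregularized step: independent NB GLM per gene (can overfit,
  #    especially with few cells). Convergence handling omitted.
  fits <- t(sapply(rownames(counts), function(g) {
    fit <- glm.nb(counts[g, ] ~ log_umi)
    c(coef(fit), theta = fit$theta)
  }))
  gene_mean <- log10(rowMeans(counts) + 1e-8)
  # 2) Regularization step: smooth each parameter across genes as a
  #    function of mean expression, so genes borrow strength from
  #    genes with similar abundance.
  smooth_par <- function(p) fitted(loess(p ~ gene_mean))
  beta0 <- smooth_par(fits[, 1])
  beta1 <- smooth_par(fits[, 2])
  theta <- pmax(smooth_par(fits[, "theta"]), 1e-4)
  # 3) Pearson residuals under the regularized model
  #    (NB variance: mu + mu^2 / theta).
  mu <- exp(sweep(outer(beta1, log_umi), 1, beta0, "+"))
  (counts - mu) / sqrt(mu + mu^2 / theta)
}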

Thanks for the close read of our work (and source code :)).

@yueqiw
Contributor Author

yueqiw commented Apr 1, 2018

Thanks for the detailed explanation. I used RegressOutNBreg() recently and it worked quite well for datasets of 1k–10k cells, and it doesn't take long when only regressing the highly variable genes (even faster with mclapply).
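
On the speed point, the per-gene regressions are independent, so something like the following mclapply pattern is a natural fit (a sketch under assumed inputs, not Seurat's internals):

# Sketch: parallel per-gene NB fits over the highly variable genes only.
# Assumed inputs: `umi` is a genes x cells UMI count matrix already
# subset to the HVGs; `log_umi` is log10 total counts per cell.
library(parallel)
library(MASS)

fits <- mclapply(rownames(umi), function(g) {
  glm.nb(umi[g, ] ~ log_umi)   # each gene's fit is independent
}, mc.cores = 4)               # mclapply forks; Unix/macOS only
names(fits) <- rownames(umi)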

A question I have is: why aren't the Pearson residuals of RegressOutNBreg() log1p-transformed for scaling? The Pearson residuals of a negative binomial model should still have a long tail. In another function, RegressOutResid(), the residuals of the Poisson and negative binomial models undergo a log1p transform to make them less stretched out:

if (use.umi) {
  # Shift each gene (row) so its minimum residual is 0, then apply log1p
  # to compress the long right tail of the residuals.
  data.resid <- log1p(
    x = sweep(
      x = data.resid,
      MARGIN = 1,
      STATS = apply(X = data.resid, MARGIN = 1, FUN = min),
      FUN = "-"
    )
  )
}

I'm wondering what the reason is for not applying the log1p transform in RegressOutNBreg() (and in https://github.com/ChristophH/in-lineage/blob/master/R/lib.R)? Thanks!
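
For reference, a toy comparison of the two options being discussed: raw Pearson residuals versus the min-shift + log1p compression used in RegressOutResid(). The fake residual matrix here is only for illustration.

# Toy illustration: raw residuals vs. the min-shift + log1p compression.
set.seed(1)
resid <- matrix(rnorm(2000, sd = 2)^2 - 1, nrow = 20)  # fake long-tailed residuals

# Option A: keep the Pearson residuals as-is (what RegressOutNBreg does).
resid_raw <- resid

# Option B: shift each gene (row) so its minimum is 0, then log1p-compress
# the long right tail (what RegressOutResid does when use.umi = TRUE).
resid_log <- log1p(sweep(resid, 1, apply(resid, 1, min), "-"))

# The compression mainly shrinks the largest residuals:
summary(as.vector(resid_raw))
summary(as.vector(resid_log))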

Yueqi
