RunVelocity() uses "data" rather than "counts" matrix #27
Comments
New to velocity analysis. Thanks for this note! I'm wondering, in terms of correctness, should we use the raw counts or some normalized values for
Thanks,
Hi!
If you look at the distribution of values of emat and nmat, you will see that they are identical to the distribution of values you have in your loom files.
Therefore, it seems pretty clear to me that RunVelocity provides raw counts to velocyto.R.
Best regards
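One quick way to run the kind of check described in this comment is to test whether a matrix contains only non-negative whole numbers, which is what raw spliced/unspliced counts look like. The sketch below uses base R only; the helper name `looks_like_raw_counts` and the toy matrices are made up for illustration, and the normalization step only mimics Seurat's default log-normalization (counts divided by the per-cell total, times 10,000, then `log1p`).

```r
# Hypothetical helper: TRUE when every entry is a non-negative whole number,
# as expected for a raw count matrix.
looks_like_raw_counts <- function(m) {
  v <- as.numeric(as.matrix(m))
  all(v >= 0) && all(v == round(v))
}

# Toy count matrix: 2 genes (rows) x 3 cells (columns).
raw <- matrix(c(0, 3, 1, 7, 0, 2), nrow = 2)

# Log-normalization in the style of Seurat's "LogNormalize" method:
# divide each cell (column) by its total, scale by 1e4, then log1p.
normalized <- log1p(sweep(raw, 2, colSums(raw), "/") * 1e4)

looks_like_raw_counts(raw)         # TRUE
looks_like_raw_counts(normalized)  # FALSE
```

Running the same check on the matrices that actually reach velocyto.R would show whether a normalization step has overwritten the default matrix in a given object.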
Hi!
I've been playing with this Seurat wrapper for a while and I found something that confuses me a little.

`velocyto.R::gene.relative.velocity.estimates()` expects the input matrices to be raw count matrices (well, at least I think so; it really looks like that, both from the docs and from the fact that it starts by adding a pseudocount and log-transforming the matrices).

Now, `RunVelocity()` uses `GetAssayData()` to pull data matrices from our "spliced" and "unspliced" assays. By default, `GetAssayData()` returns the "data" matrix (i.e. `cells@assays$spliced@data`), which is identical to the "counts" matrix (i.e. `cells@assays$spliced@counts`), but only as long as the user did not run any normalization procedure.

If the user runs some normalization, then `RunVelocity()` will use the normalized matrix as input for the velocity calculation, which I think is incorrect. Things can get even wilder in a pretty imaginable scenario where the user normalizes and scales the "spliced" assay first, say to cluster the cells, but does not touch the "unspliced" assay, and then runs `RunVelocity()`. In that case, `RunVelocity()` will feed `velocyto.R::gene.relative.velocity.estimates()` a normalized "spliced" matrix and a raw-count "unspliced" matrix.

This is not so much of a problem when users use `Seurat::SCTransform()`, like in the vignette. SCTransform creates a new assay, and the untouched old assays are then fed into `RunVelocity()`. But if one uses a different normalization procedure, the "data" matrix gets changed, which can lead to weird results, even though the untouched "counts" matrix is still there and could have been used.

This got me confused for a while, so I was thinking that using the "counts" matrix or adding a note to the documentation could prevent such confusion. Or is it actually meaningful to use the normalized matrix instead of raw counts? Thank you.