Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RunVelocity() uses "data" rather than "counts" matrix #27

Open
krejciadam opened this issue Feb 4, 2020 · 2 comments
Open

RunVelocity() uses "data" rather than "counts" matrix #27

krejciadam opened this issue Feb 4, 2020 · 2 comments

Comments

@krejciadam
Copy link

Hi!
I've been playing with this Seurat wrapper for a while and I found something that confuses me a little.

velocyto.R::gene.relative.velocity.estimates() expects the input matrices to be raw count matrices (Well, at least I think so. But It really looks like that, both from the docs and from the fact that it starts with adding pseudocount and log transforming the matrices).

Now RunVelocity() uses GetAssayData() to pull data matrices from our "spliced" and "unspliced" assays. GetAssayData() returns the "data" matrix (i.e. cells@assays$spliced@data), which is identical to the "counts" matrix (i.e. cells@assays$spliced@counts), but only as long as the user did not run any normalization procedure.

If the user runs some normalization, then RunVelocity() will use the normalized matrix as an input for velocity calculation, which I think is incorrect. Things can get even wilder in a pretty imaginable scenario where the user normalizes and scales the "spliced" assay first, say to cluster the cells, but does not touch the "unspliced" assay, and the runs RunVelocity(). In that case, RunVelocity() will feed velocyto.R::gene.relative.velocity.estimates() with normalized "spliced" matrix and raw counts "unspliced" matrix.

This is not so much of a problem when users use Seurat::SCTransform(), like in the Vignette. SCTransform creates a new assay and then the untouched old assays are fed into RunVelocity(). But if one uses some different normalization procedure, the "data" matrix gets changed, which can lead to weird results, even tough the untouched "counts" matrix is still there and could have been used.

This got me confused for a while so I was thinking that using the "counts" matrix or writing a note into the documentation could prevent such confusion. Or is it actually meaningful to use the normalized matrix instead of raw counts? Thank you.

@YichaoOU
Copy link

New to velocity analysis. Thanks for this note! I'm wondering, in terms of the correctness, should we use the raw counts or some normalized values for RunVelocity()?

Thanks,
Yichao

@ljouneau
Copy link

ljouneau commented Sep 8, 2021

Hi !
@krejciadam, if you execute line by line the code contained in RunVelocity, you will see that before calling velocyto.R::gene.relative.velocity.estimates, the function creates a list containing the data (spliced-emat/unspliced-nmat) and the argument to pass to velocyto (deltaT, ...)

names(args)
[1] "emat" "nmat"
[3] "deltaT" "steady.state.cells"
[5] "kCells" "kGenes"
[7] "old.fit" "mult"
[9] "min.nmat.smat.correlation" "min.nmat.emat.correlation"
[11] "min.nmat.emat.slope" "zero.offset"
[13] "deltaT2" "fit.quantile"
[15] "diagonal.quantiles" "show.gene"
[17] "do.par" "cell.dist"
[19] "emat.size" "nmat.size"
[21] "cell.emb" "cell.colors"
[23] "expression.gradient" "residual.gradient"
[25] "n.cores" "verbose"

If you look at the distribution of values of emat and nmat, you will see that they are identical to the distribution of values you have in your loom files

summary(args[["emat"]][,1])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 3.929 0.000 529.000
summary(args[["nmat"]][,1])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 1.268 0.000 265.000
summary(loom$exon[,1])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 2.509 0.000 529.000
summary(loom$intron[,1])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.8518 0.0000 265.0000

Therefore, it seems pretty clear for me that RunVelocity provides raw counts to velocyto.R

Best regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants