Issue32 #35

Merged
merged 2 commits into from
Oct 3, 2022

martinjzhang
Owner

  1. Remove genes with ct_mean < 0 before calling loess, addressing issue #32, "Data sparsity is a critical problem for the inference stability (ex. 10x data)" (comment)
  2. Print info for --adj-prop

Collaborator

@KangchengHou KangchengHou left a comment

ct_mean should by definition be >= 0 because counts are >= 0. Is this caused by a numerical issue? What do you think about adding a small constant like 1e-6?

@martinjzhang
Owner Author

> ct_mean should by definition be >= 0 because counts are >= 0. Is this caused by a numerical issue? What do you think about adding a small constant like 1e-6?

This issue occurs when both --adj-prop and --cov are on and some genes have extremely low expression. Because the regression includes a constant term, the transformed data transformed_X = adata.X + COV_MAT * COV_BETA + COV_GENE_MEAN has the same gene-level mean as the original data, i.e., COV_GENE_MEAN. However, when --adj-prop is on, ct_mean is a weighted average, with weights given by the reciprocal of the cell type proportions. The weighted mean of transformed_X may differ slightly from COV_GENE_MEAN, so the ct_mean of the transformed data can differ slightly from that of the original data, which is strictly non-negative. For genes with extremely low expression, this small difference can push ct_mean below zero.
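
To make the argument above concrete, here is a minimal NumPy sketch with toy data. It is not the scDRS code: the 10 cells, the two cell types, the covariate, and all variable names are made up for illustration, and the correction is written as ordinary least-squares residuals plus the gene mean, which is an assumption about the form of the transform (the sign convention of the stored COV_BETA may differ).

```python
import numpy as np

# Toy data (hypothetical): 8 cells of type 0 and 2 cells of type 1.
cell_type = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# A very lowly expressed gene: a single count in one cell.
x = np.array([1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Covariate matrix with a constant term and one binary covariate.
covariate = np.array([1.0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
cov_mat = np.column_stack([np.ones_like(covariate), covariate])

# Regress the gene on the covariates, then add back the gene-level mean.
beta, *_ = np.linalg.lstsq(cov_mat, x, rcond=None)
transformed = x - cov_mat @ beta + x.mean()

# Weights proportional to the reciprocal of the cell type proportions.
prop = np.bincount(cell_type) / cell_type.size
weights = 1.0 / prop[cell_type]
weights /= weights.sum()

unweighted_mean = transformed.mean()                # equals x.mean() because of the constant term
weighted_mean_orig = np.sum(weights * x)            # non-negative, since x >= 0
weighted_mean_transformed = np.sum(weights * transformed)  # can be negative

print(unweighted_mean, weighted_mean_orig, weighted_mean_transformed)
```

In this toy case the unweighted mean of the transformed gene stays at 0.1, while the proportion-weighted mean drops to about -0.025, because the rare cell type carries negative residuals and is upweighted.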

This issue is not pervasive. In the PBMC3K example, there are 46 genes with negative ct_mean. Their magnitudes are around 1e-4, two orders of magnitude smaller than the median ct_mean. So I think removing them should be fine.
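
For illustration only, the filtering step could look roughly like the sketch below. The helper name drop_negative_mean_genes and the toy values are hypothetical and are not taken from the merged commits; the sketch only shows masking out genes with negative ct_mean before any downstream fit.

```python
import numpy as np

def drop_negative_mean_genes(ct_mean, gene_names):
    """Keep genes whose ct_mean is >= 0 and report how many were dropped."""
    ct_mean = np.asarray(ct_mean, dtype=float)
    gene_names = np.asarray(gene_names)
    keep = ct_mean >= 0
    n_removed = int((~keep).sum())
    return ct_mean[keep], gene_names[keep], n_removed

# Toy values (hypothetical): two genes have slightly negative ct_mean.
ct_mean = np.array([0.05, -1e-4, 0.01, 0.0, -2e-4])
genes = np.array(["g1", "g2", "g3", "g4", "g5"])

kept_mean, kept_genes, n_removed = drop_negative_mean_genes(ct_mean, genes)
print(f"removed {n_removed} genes with ct_mean < 0; kept {kept_genes.tolist()}")
```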

@martinjzhang martinjzhang merged commit 05e480b into master Oct 3, 2022
@martinjzhang martinjzhang deleted the issue32 branch April 30, 2023 03:48