First of all, thank you for this nice tool.
I ran nebula for a differential expression analysis between 2 groups and realised that my top (by logFC) significant genes are mostly driven by a few outlier cells (see the violin plots below).
I think lowly expressed genes should be filtered out as in bulk RNA-seq methods, for example with the edgeR filterByExpr function, which filters genes based on a minimum count required in at least some samples and a minimum total count. Similarly, it would be useful to filter out genes that do not reach certain thresholds per sample, and maybe per group, in order to control false positive DEGs. What would you suggest here?
In addition, a paper that uses nebula filters those genes from the differential expression results afterwards (i.e., only genes that were expressed in at least 5% of cells of the compared groups were used for downstream analyses).
I am not sure whether keeping those lowly expressed genes in the analysis would have a negative effect on the statistical calculations made within nebula. Do you suggest filtering genes before the analysis (as in bulk RNA-seq methods), or is it fine to filter them after running the DE analysis with nebula?
Hi ayyildizd,
Thank you for your question.
In the current version of nebula, two filtering criteria are used to remove
lowly expressed genes. One is set through the argument "cpc" (counts per cell,
defined as the total number of counts divided by the total number of cells);
its default value is 0.005 (i.e., 0.5%). The other is set through the argument
"mincp" (the number of cells with a positive count); its default value is 5.
These defaults are the minimum values I suggest. The optimal values depend on
the data set.
Best regards,
Liang
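
For readers who want to see what the two criteria in the reply above amount to, here is a minimal sketch in plain Python. It is not nebula's own code: the function name `filter_genes`, the toy count matrix, and the use of `>=` for both comparisons are assumptions made for illustration. Only the definitions of "cpc" (total counts divided by total cells) and "mincp" (number of cells with a positive count) come from the reply.

```python
def filter_genes(counts, cpc=0.005, mincp=5):
    """Return the indices of genes that pass both thresholds.

    counts: list of per-gene lists of raw counts, one entry per cell.
    cpc:    minimum counts per cell (total counts / total cells).
    mincp:  minimum number of cells with a positive count.
    The >= comparisons are an assumption of this sketch.
    """
    kept = []
    for gene_index, row in enumerate(counts):
        counts_per_cell = sum(row) / len(row)
        n_positive_cells = sum(1 for c in row if c > 0)
        if counts_per_cell >= cpc and n_positive_cells >= mincp:
            kept.append(gene_index)
    return kept


# Hypothetical toy data: 3 genes x 1000 cells.
counts = [
    [1] * 50 + [0] * 950,   # gene 0: 50 positive cells, cpc = 0.05
    [40] * 2 + [0] * 998,   # gene 1: cpc = 0.08, but driven by 2 outlier cells
    [1] * 4 + [0] * 996,    # gene 2: cpc = 0.004, 4 positive cells
]

print(filter_genes(counts))  # prints [0]
```

Gene 1 shows why the two criteria complement each other: its counts per cell are well above the threshold, yet it is dropped because only two cells carry any counts. In nebula itself the thresholds are set through the "cpc" and "mincp" arguments, so stricter values can be tried there directly.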