Too many subpopulations? #10

Open
MingBit opened this issue May 2, 2018 · 10 comments

Comments

@MingBit

MingBit commented May 2, 2018

Hi,

Thanks again for the great work. :)
I'm testing CellRouter with our own data (two conditions at day 3). About 20 cell sub-populations were identified with K = 12, the value given by the findK function. I also tried other K values.

SC3 identifies relatively few clusters, which seems close to our expectation. So I'm wondering whether CellRouter tends to give many sub-populations, even when the input data comes from two conditions at a single time point.

Looking forward to your response.

@edroaldo
Owner

edroaldo commented May 2, 2018 via email

@MingBit
Author

MingBit commented May 3, 2018

Thanks for your explanation. :)
In terms of K values, as I understand it, a KNN graph is built from the cells and sub-populations are then detected on that graph with the Louvain community detection method. These sub-populations are then used for trajectory analysis.
In your tutorial example, K=5 was used for cell clustering and K=10 for trajectory analysis. So I'm wondering: should the K values be identical in the two parts of the analysis? Otherwise, the cell sub-populations would be different.
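
To make the dependence on K concrete, here is a minimal sketch in plain Python. It is not CellRouter's implementation: the function names and the toy 1-D data are made up for illustration, and connected components of the KNN graph are used as a crude stand-in for Louvain communities. It only shows that the same cells can split into a different number of groups when K changes.

```python
def knn_graph(points, k):
    """Return an undirected KNN graph as {index: set(neighbour indices)}."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        # Rank all other points by distance to p (ties broken by index).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (abs(points[j] - p), j),
        )
        for j in others[:k]:
            graph[i].add(j)
            graph[j].add(i)  # symmetrise: keep the edge in both directions
    return graph


def count_components(graph):
    """Count connected components with an iterative depth-first search."""
    seen = set()
    n_components = 0
    for start in graph:
        if start in seen:
            continue
        n_components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph[node] - seen)
    return n_components


# Toy 1-D "cells" (assumed data): four tight pairs forming two broader groups.
cells = [0.0, 1.0, 3.0, 4.0, 20.0, 21.0, 23.0, 24.0]
components_by_k = {k: count_components(knn_graph(cells, k)) for k in (1, 2, 4)}
print(components_by_k)  # {1: 4, 2: 2, 4: 1}
```

On this toy data a small K keeps the four pairs apart, K=2 merges them into two groups, and K=4 connects everything, so a clustering built with one K and a graph built with another need not agree.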
Thanks.

@edroaldo
Owner

edroaldo commented May 3, 2018 via email

@MingBit
Author

MingBit commented May 4, 2018

Yesss... Choosing K values is clear to me now. :D
Since I currently only have scRNA-seq datasets from a limited number of time points, I'm more interested in the cell clustering part. ^_^ As I noticed, some trajectory analysis tools (e.g. Wanderlust, Monocle2 and Scanpy) tend to use graph-based methods (e.g. KNN + Louvain clustering) for sub-population identification, while other tools, including SC3, CIDR and RaceID, use either K-means or hierarchical clustering. I've played with these packages a little and found that, for datasets with two or three conditions at one time point, graph-based methods tend to give more clusters than K-means or hierarchical clustering.

So for single-time-point datasets, I'm wondering whether it might be better not to use graph-based methods for clustering if I just want to do cell sub-population identification and differential gene expression analysis. Please correct me if I'm mistaken. :)
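
One practical difference behind this observation may be that K-means takes the number of clusters as an input, whereas in graph-based methods the number of communities falls out of the graph. The following is a minimal sketch of plain K-means (Lloyd's algorithm) on toy 1-D data. The function name, the deterministic initialisation and the data are assumptions for illustration, not what SC3, CIDR or RaceID actually do.

```python
def kmeans_1d(values, n_clusters, n_iter=20):
    """Plain Lloyd's algorithm on 1-D data; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic initialisation (an assumption for reproducibility):
    # take evenly spaced points from the sorted data.
    step = len(ordered) / n_clusters
    centroids = [ordered[int(step * i + step / 2)] for i in range(n_clusters)]

    def assign(current):
        # Label each value with the index of its nearest centroid.
        return [
            min(range(n_clusters), key=lambda c: abs(v - current[c]))
            for v in values
        ]

    for _ in range(n_iter):
        labels = assign(centroids)
        for c in range(n_clusters):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = sum(members) / len(members)
    return assign(centroids), centroids


# Toy 1-D "cells" (assumed data): four tight pairs forming two broader groups.
cells = [0.0, 1.0, 3.0, 4.0, 20.0, 21.0, 23.0, 24.0]
labels_2, centroids_2 = kmeans_1d(cells, n_clusters=2)
labels_4, centroids_4 = kmeans_1d(cells, n_clusters=4)
print(labels_2, centroids_2)  # [0, 0, 0, 0, 1, 1, 1, 1] [2.0, 22.0]
print(labels_4, centroids_4)  # [0, 0, 1, 1, 2, 2, 3, 3] [0.5, 3.5, 20.5, 23.5]
```

Here the cluster count is exactly what was requested, so comparing it with a graph-based result is partly a comparison of how the number of groups was chosen in each tool.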

Looking forward to your suggestion.

@MingBit
Author

MingBit commented May 4, 2018

Ah! .. one more question about gene markers.
So I increased K a bit to get fewer sub-populations. I learned that differential gene expression analysis in CellRouter is based on mean expression values. I tried creating feature plots with plotDRExpression() for the top differentially expressed genes of each sub-population.
I used two normalisation methods (log, z-score); unfortunately, they look relatively gradual and discrete. So it seems that the identified clusters are not optimal and that the DEG analysis is strongly affected by drop-out zeros. I'm wondering whether there is any other way to estimate K if findK() cannot give an optimal value. Or perhaps I have to go back to preprocessing (e.g. feature selection, gene imputation).
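
As a rough illustration of why drop-out zeros matter for a mean-based comparison, here is a minimal sketch in plain Python. The function names and the toy expression values are hypothetical, and this is not CellRouter's actual DEG code: it only computes a z-score, the difference in mean expression between one cluster and the rest, and the fraction of zeros in that cluster.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant gene: nothing to scale
    return [(v - mu) / sigma for v in values]


def mean_difference(expr, labels, cluster):
    """Mean expression inside `cluster` minus mean expression outside it."""
    inside = [v for v, lab in zip(expr, labels) if lab == cluster]
    outside = [v for v, lab in zip(expr, labels) if lab != cluster]
    return mean(inside) - mean(outside)


def zero_fraction(expr, labels, cluster):
    """Fraction of cells in `cluster` with zero expression (possible drop-outs)."""
    inside = [v for v, lab in zip(expr, labels) if lab == cluster]
    return sum(1 for v in inside if v == 0) / len(inside)


# Toy log-expression of one gene in eight cells (assumed data).
expr = [4.0, 0.0, 5.0, 0.0, 1.0, 0.0, 1.0, 2.0]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

diff_a = mean_difference(expr, labels, "A")
zeros_a = zero_fraction(expr, labels, "A")
print(diff_a, zeros_a)  # 1.25 0.5
```

In this toy case half of cluster A's cells are zero, which pulls its mean down from 4.5 (non-zero cells only) to 2.25, so the gene looks like a much weaker marker than its expressing cells suggest.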

Sorry for my frequent posting... >_<...

CellRouter is very interesting for us and I'll be presenting this method in our group.. :D

@edroaldo
Owner

edroaldo commented May 4, 2018 via email

@edroaldo
Owner

edroaldo commented May 4, 2018 via email

@edroaldo
Owner

edroaldo commented May 4, 2018 via email

@MingBit
Author

MingBit commented May 6, 2018

Hey edroaldo,

I'm very much looking forward to the next version of CellRouter. 👍
Hmm.. Concerning the GRN score, I'm a little confused by m_t,j versus m_i,j. In this formula:
[screenshot of the GRN score formula, 2018-05-06 13:05:35]

m_t,j is the mean correlation of predicted targets of gene i regulated along trajectory j

And it was mentioned again here:

Moreover, if its predicted target genes are also well correlated with the differentiation trajectory, it is more likely that the regulator is important (parameter m_i,j)

I'm wondering whether they are actually the same. If not, what is t in m_t,j? A time series? :D
Thank you, and looking forward to your reply.
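
For readers following along, the quoted definition (the mean correlation of a regulator's predicted targets along a trajectory) could be sketched as below. This is only an illustration under assumptions: the function names and toy data are hypothetical, Pearson correlation against trajectory position is assumed because the quoted text does not say which correlation is used, and it does not reproduce the full GRN score from the formula.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no trend
    return cov / sqrt(var_x * var_y)


def mean_target_correlation(target_profiles, position):
    """Average correlation of each target's expression with trajectory position."""
    correlations = [pearson(profile, position) for profile in target_profiles.values()]
    return sum(correlations) / len(correlations)


# Toy data (assumed): positions of four cells along one trajectory, and the
# expression of three hypothetical predicted targets of one regulator.
position = [0.0, 1.0, 2.0, 3.0]
targets = {
    "target_up_1": [1.0, 2.0, 3.0, 4.0],    # rises along the trajectory
    "target_down": [4.0, 3.0, 2.0, 1.0],    # falls along the trajectory
    "target_up_2": [2.0, 4.0, 6.0, 8.0],    # rises along the trajectory
}
m = mean_target_correlation(targets, position)
print(round(m, 3))  # 0.333
```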

@edroaldo
Owner

edroaldo commented May 8, 2018 via email

2 participants