Random picking of cells from an object #243
Comments
Hi, I guess you can randomly sample your cells from that cluster using
Hope that helps! Best, |
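A minimal sketch of the idea in base R, assuming a hypothetical vector of cell names and a matching vector of cluster labels (both invented here for illustration, not taken from any real object): draw a fixed number of names from the larger cluster with `sample()`.

```r
# Hypothetical data: one cluster label per cell name.
set.seed(42)  # fix the seed so the draw is reproducible
cell_names <- paste0("cell_", seq_len(5500))
cluster <- c(rep("A", 5000), rep("B", 500))  # cluster A is 10x larger

# Names of the cells that belong to the large cluster.
cells_a <- cell_names[cluster == "A"]

# Draw 500 cells from cluster A without replacement.
sampled_a <- sample(cells_a, size = 500, replace = FALSE)

# Combine with all of cluster B to get two groups of equal size.
balanced_cells <- c(sampled_a, cell_names[cluster == "B"])
length(balanced_cells)
```

A vector of names like `balanced_cells` could then be handed to a subsetting call that accepts cell names, such as the `cells.use` argument of `SubsetData` quoted in this thread.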
Hi Leon, Related question: can "SubsetData" not be used directly to randomly sample, say, 1000 cells from a larger object? SubsetData(object, cells.use = NULL, subset.name = NULL, ident.use = NULL, max.cells.per.ident = 1000) If this new subset is not randomly sampled, then on what criteria is it sampled? Thanks! |
Hi
This can be misleading. I would rather use the
Best, |
Hi, @del2007: What you showed as an example allows you to randomly sample a maximum of 1000 cells from each cluster whose information is stored in So if you want to randomly sample 1000 cells, independent of the clusters to which those cells belong, you can simply provide a vector of cell names to the Hope that helps! Best, |
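The distinction drawn above, a cap per cluster versus a single draw across all cells, can be sketched in base R. The cluster labels and cell names below are hypothetical, and the per-cluster loop only mimics the behaviour described for `max.cells.per.ident`; it is not the library's implementation.

```r
set.seed(1)
# Hypothetical identities: three clusters of very different sizes.
ident <- c(rep("c1", 3000), rep("c2", 1500), rep("c3", 200))
names(ident) <- paste0("cell_", seq_along(ident))

# (a) Global sample: 1000 cells regardless of cluster membership.
global_cells <- sample(names(ident), size = 1000)

# (b) Per-cluster cap: at most 1000 cells from each cluster;
#     clusters at or below the cap are kept whole.
cap <- 1000
capped_cells <- unlist(lapply(split(names(ident), ident), function(cells) {
  if (length(cells) > cap) sample(cells, cap) else cells
}), use.names = FALSE)

table(ident[global_cells])  # cluster sizes roughly proportional to the original
table(ident[capped_cells])  # no cluster exceeds the cap
```

With a single identity class, as in the case discussed later in the thread, the two approaches amount to the same thing: one random draw from all cells.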
Hello All, |
Great. Happy to hear that. However, for robustness issues, I would try to resample from |
Yep! I did it three times and got very similar clustering at the end, using a unique set of variable genes... But using a union of the variable genes might be even more robust. My analysis is helped by the fact that the larger cluster is very homogeneous, so a random sample of ~1000 cells is still very representative. In any case, I appreciate your help. |
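A rough sketch of the "resample several times and take the union" idea, under loud assumptions: the expression matrix is simulated, and ranking genes by plain variance is only a stand-in for a real variable-gene method. The helper `top_var_genes` is hypothetical.

```r
set.seed(10)
# Hypothetical expression matrix: 200 genes x 3000 cells.
expr <- matrix(rpois(200 * 3000, lambda = 5), nrow = 200,
               dimnames = list(paste0("gene_", 1:200), paste0("cell_", 1:3000)))

# Top-variance genes in one random subsample of cells
# (a placeholder for an actual variable-gene selection step).
top_var_genes <- function(mat, n_cells = 1000, n_genes = 20) {
  cols <- sample(ncol(mat), n_cells)
  v <- apply(mat[, cols], 1, var)
  names(sort(v, decreasing = TRUE))[seq_len(n_genes)]
}

# Repeat the subsampling three times, then combine the gene sets.
gene_sets <- replicate(3, top_var_genes(expr), simplify = FALSE)
union_genes <- Reduce(union, gene_sets)       # genes selected in any run
shared_genes <- Reduce(intersect, gene_sets)  # genes selected in every run
length(union_genes)
```

Comparing `shared_genes` with `union_genes` gives a quick sense of how stable the selection is across resamples.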
Hi Leon,
Thanks for the answer!
I actually did not need to randomly sample clusters but instead I wanted to randomly sample an object - for me my starting object after filtering.
This is due to having ~100k cells in my starting object so I randomly sampled 60k or 50k with the SubsetData as I mentioned to use for the downstream analysis.
For this application, using SubsetData is fine, it seems from your answers.
Again, I’d like to confirm that it samples randomly. Does it? When I check the subsetted object, it has the number of cells I asked for in max.cells.per.ident (there is only one ident in my starting object), for example 50k or 60k.
This subset also has exactly the same mean and median as the original object I’m subsetting from.
So I wanted to confirm: does SubsetData blindly sample at random? If I always end up with the same mean and median (UMI), is it truly “random” sampling?
Does it even make sense to subsample like this? I don’t have much choice: it’s either that or R crashes with so many cells. Thanks again for any help!
|
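On the mean and median question: a large random sample typically has summary statistics very close to those of the full data, though not usually exactly equal. A small base R sketch with simulated per-cell UMI counts (the log-normal distribution and its parameters are assumptions chosen purely for illustration):

```r
set.seed(123)
# Hypothetical UMI counts for 100,000 cells.
umi <- round(rlnorm(100000, meanlog = 8, sdlog = 0.5))

# Randomly keep half of the cells, without replacement.
sub <- sample(umi, size = 50000)

c(full_mean = mean(umi), sub_mean = mean(sub))
c(full_median = median(umi), sub_median = median(sub))

# Relative difference between the two means; with this many cells
# it is expected to be far below 1%.
abs(mean(sub) - mean(umi)) / mean(umi)
```

Near-identical means and medians are therefore what random sampling of 50k out of 100k cells would be expected to produce. Calling `set.seed()` with the same value before each draw makes the sample reproducible.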
Yes it does randomly sample (using the However, to avoid cases where you might have different
For your last question, I suggest you read this bioRxiv paper. Of course, your case does not exactly match theirs: they have ~1.3M cells and therefore a better chance of capturing rare cell types, and the tissues you're studying might be very different. But this is something you can test yourself. First subset your data minimally (i.e. to the point where R no longer crashes, so that you lose as few cells as possible), then gradually decrease the number of sampled cells and see whether the results remain consistent and are recapitulated with fewer cells. Best, |
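One piece of that test can be sketched in base R: how many cells of a rare type survive as the sample shrinks. The labels below are hypothetical (a rare type fixed at 1% of 100,000 cells), and counting survivors is only a proxy; a real consistency check would re-run the clustering at each size.

```r
set.seed(2024)
# Hypothetical labels: a rare type making up 1% of 100,000 cells.
cell_type <- c(rep("common", 99000), rep("rare", 1000))

# Progressively smaller random subsamples.
sizes <- c(60000, 30000, 10000, 1000)
rare_counts <- sapply(sizes, function(n) {
  sum(sample(cell_type, size = n) == "rare")
})

# The rare fraction stays near 1%, but the absolute count shrinks
# with the sample, which is what limits detection of rare clusters.
data.frame(sampled_cells = sizes,
           rare_cells = rare_counts,
           rare_fraction = rare_counts / sizes)
```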
Hello All,
I am trying to compare two clusters from two different populations from two different runs. I would like to merge the clusters together (using the MergeSeurat option) and then recluster them to find overlaps and distinctions between the clusters.
However, one of the clusters has ~10-fold more cells than the other. I am afraid that when I calculate variable genes, the cluster with more cells is going to be overrepresented. Is there a way to pick a set number of cells, at random, from the larger cluster so that I am comparing a similar number of cells? In other words, is there a way to randomly subsample my cells in an unsupervised manner?