Downsampling #1325

Closed
pagarwal14 opened this issue Apr 5, 2019 · 3 comments

Comments

@pagarwal14

Hi,
If different conditions (or technologies) contribute different numbers of cells, does that bias the integration workflow for clustering? I would imagine that if condition A has many more cells than condition B, the clustering would be biased towards the clusters/cell types in condition A. If so, are there strategies to deal with it, such as downsampling? Are there any examples of this in the workflow vignettes?
Thanks.

-Pankaj
@satijalab
Collaborator

Larger datasets also tend to contain more information, so we are not inherently concerned about imbalance. However, you can certainly downsample objects if you wish (e.g., to randomly sample 1,000 cells):

object.downsample = subset(object, cells = sample(Cells(object), 1000))
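
The one-liner above draws 1,000 cells from the whole object, so the sample keeps roughly the same condition proportions as the original. If the goal is an equal number of cells per condition, one possible approach is sketched below. This is an illustration, not a recommendation from the Seurat maintainers: `balanced_sample` is a hypothetical helper written in base R, and the metadata column name `condition` in the usage comment is an assumption about how the object is annotated.

```r
# Hypothetical helper: pick the same number of cells from every condition.
#   cells  : character vector of cell names
#   groups : vector of the same length giving each cell's condition
#   n      : cells to keep per condition (default: size of the smallest one)
balanced_sample <- function(cells, groups, n = NULL) {
  # drop = TRUE ignores unused factor levels, which would otherwise
  # create empty groups and force n to 0
  by.group <- split(cells, groups, drop = TRUE)
  if (is.null(n)) n <- min(lengths(by.group))
  picked <- lapply(by.group, function(x) {
    # never ask for more cells than the condition actually has
    x[sample.int(length(x), min(n, length(x)))]
  })
  unlist(picked, use.names = FALSE)
}

set.seed(42)  # make the random draw reproducible

# Usage with a Seurat object (assumes a metadata column named "condition"):
# keep <- balanced_sample(Cells(object), object$condition)
# object.balanced <- subset(object, cells = keep)
```

Capping the draw at each condition's size avoids the error that `sample()` raises when asked for more cells than exist, which the one-liner would hit on an object with fewer than 1,000 cells.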

@yueqiw
Contributor

yueqiw commented Apr 24, 2019

@satijalab Would you recommend running the integration on all the data and then subsampling the unbalanced datasets before clustering?

@cgalicia1014

Does anyone have any recommendations on whether it is better to subsample before integration or after?

-Carlos
