Downsampling #1325

pagarwal14 · 2019-04-05T13:59:09Z

Hi,
If there are different numbers of cells in different conditions (or technologies), are there any issues with bias in the integration workflow for clustering? I would imagine that if condition A has many more cells than condition B, the clustering would be biased towards the clusters/cell types in condition A. If this is the case, are there strategies to deal with it, such as downsampling? Are there any examples of this in the workflow examples?
Thanks.

Pankaj

satijalab · 2019-04-05T21:31:01Z

Larger datasets also tend to contain more information, so we are not inherently concerned about imbalance. However, you can certainly downsample objects if you wish (e.g., to a random sample of 1,000 cells):

object.downsample = subset(object, cells = sample(Cells(object), 1000))
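If the goal is to balance conditions rather than just shrink the object, one option is to sample the same number of cells from each condition. The following is only a sketch in base R, not a recommended Seurat workflow: the `meta` data frame, the `condition` column name, and the cell counts are hypothetical stand-ins for the object's per-cell metadata.

```r
# Hypothetical per-cell metadata: row names are cell names, and the
# `condition` column (an assumed name) labels each cell's condition.
set.seed(42)  # make the random sample reproducible
meta <- data.frame(
  condition = rep(c("A", "B"), times = c(5000, 800)),
  row.names = paste0("cell_", seq_len(5800))
)

# Target size: the smallest condition, so every condition contributes equally.
n_per_group <- min(table(meta$condition))

# Sample cell names within each condition, without replacement.
cells_by_group <- split(rownames(meta), meta$condition)
keep <- unlist(lapply(cells_by_group, sample, size = n_per_group),
               use.names = FALSE)

# Check the balance of the retained cells.
print(table(meta[keep, "condition"]))

# With a real Seurat object, `meta` would be taken from the object's
# metadata, and the result passed to subset() as in the line above:
# object.downsample <- subset(object, cells = keep)
```

Sampling down to the smallest condition discards cells from the larger ones, so whether this is worthwhile depends on how much the imbalance actually matters for the analysis.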

yueqiw · 2019-04-24T15:59:41Z

@satijalab Would you recommend doing the integration with all the data and then perhaps subsampling the unbalanced datasets for clustering?

cgalicia1014 · 2021-07-16T21:03:16Z

Does anyone have any recommendations on whether it is better to subsample before integration or after?

-Carlos

satijalab closed this as completed Apr 5, 2019

mass-a mentioned this issue May 25, 2021

Cholmod error 'problem too large' carmonalab/STACAS#8

Closed
