Unreasonably high doublets rate #69

zqun1 · 2023-02-03T12:27:54Z

Dear developers,

Thank you very much for developing this useful tool. I tried it on my dataset with the samples = 'sampleID' argument. However, I still get a doublet rate of >10%, which seems unreasonable. Could you please help?

Here is my code:

bp <- SnowParam(8, RNGseed = 1234) # fixed seed to make the results reproducible; on Unix, use MulticoreParam()
bpstart(bp)
split_D <- scDblFinder(split_D, samples = 'sampleID', BPPARAM = bp) # split_D is my SCE object
bpstop(bp)
split_D@colData$scDblFinder.class %>% table

singlet doublet 
  31037    3260

Here are the numbers of cells for each sampleID:

split_D@colData$sampleID
4210      5831      6486      2981      5037      5525      1424      2803.

I double-checked the resulting SCE object, and scDblFinder.sample matches sampleID.

According to 10X, each sample at this cell number should contain <5% doublets: https://kb.10xgenomics.com/hc/en-us/articles/360001378811-What-is-the-maximum-number-of-cells-that-can-be-profiled-
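As a quick sanity check on the numbers above, here is a minimal base-R sketch that compares the called doublet fraction with a rule-of-thumb expectation. The counts are copied from this post; the rate of about 1% doublets per 1000 recovered cells is an assumption used only for illustration, not a value stated in this thread.

```r
# Per-sample cell counts and class totals copied from the post above
n_cells   <- c(4210, 5831, 6486, 2981, 5037, 5525, 1424, 2803)
n_singlet <- 31037
n_doublet <- 3260

# Observed fraction of cells called doublets
observed_rate <- n_doublet / (n_singlet + n_doublet)

# ASSUMPTION: ~1% doublets per 1000 recovered cells (rule of thumb,
# not taken from this thread)
rate_per_1k <- 0.01
expected_per_sample <- rate_per_1k * n_cells / 1000

# Overall expected rate: expected doublets summed over samples / total cells
expected_rate <- sum(expected_per_sample * n_cells) / sum(n_cells)

print(round(100 * c(observed = observed_rate, expected = expected_rate), 1))
```

With these inputs the called fraction comes out at about 9.5% overall, against roughly 4.9% under the assumed rule of thumb, so the calls are close to twice what the recovered cell numbers alone would suggest.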

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocParallel_1.32.5         scDblFinder_1.13.7          SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
 [5] Biobase_2.58.0              GenomicRanges_1.50.2        GenomeInfoDb_1.34.6         IRanges_2.32.0             
 [9] S4Vectors_0.36.1            BiocGenerics_0.44.0         MatrixGenerics_1.10.0       matrixStats_0.63.0         
[13] future_1.31.0               dittoSeq_1.10.0             forcats_0.5.2               stringr_1.5.0              
[17] dplyr_1.0.10                purrr_1.0.1                 readr_2.1.3                 tidyr_1.2.1                
[21] tibble_3.1.8                ggplot2_3.4.0               tidyverse_1.3.2             plyr_1.8.8                 
[25] data.table_1.14.6           SeuratObject_4.1.3          Seurat_4.3.0

plger · 2023-02-03T13:30:42Z

Hi,

I assume the sampleIDs are individual 10x captures (i.e. no cell barcoding or such)?
What kind of tissue is this? Adult or developmental/trajectory-like?
Do you know how many cells were put into the machine originally?
Could you plot a distribution of the split_D$scDblFinder.score?
(FYI you should avoid using @; the colData columns can be accessed directly with split_D$whatever)

zqun1 · 2023-02-03T14:47:36Z

Thank you for the quick reply!

Yes.
They are sorted immune cells from adult mice.
I aimed for 10k cells for sequencing. For GEM generation, I loaded 10-20k cells per sample (the very first step). In the end, I only captured 1.4-6.5k cells per sample, as listed above.
See below

p1 <- hist(split_D$scDblFinder.score, plot = FALSE)
p1$density <- p1$counts/sum(p1$counts) * 100
plot(p1, freq = FALSE)

plger · 2023-02-03T15:46:14Z

Hi,
OK, this is as I thought: I'm afraid you really do have ~10% or so doublets.
The determining factor for the doublet rate is the number of cells loaded, as this influences the cell density and hence the probability that two cells are captured in the same droplet. The fact that many of these cells were, for instance, too damaged (or otherwise...) to pass cellranger's early QC (i.e. the calls of what's a cell and what's an empty droplet) doesn't influence the doublet rate. (Note that this isn't the only possible explanation for few cells / few reads in cells.)
So sorry if it's a disappointment for you, but I think scDblFinder does a nice job of finding them despite having the wrong expected doublet rate :)
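To illustrate why the loaded cell number drives the doublet rate, here is a rough sketch using a simple Poisson loading model. The function name `multiplet_fraction`, the droplet count and the capture efficiency are hypothetical placeholders chosen for illustration; they are not 10x specifications and not something scDblFinder computes.

```r
# Fraction of cell-containing droplets that hold two or more cells,
# assuming cells land in droplets independently (Poisson loading).
# ASSUMPTION: n_droplets and capture_eff are illustrative placeholders.
multiplet_fraction <- function(n_loaded, n_droplets = 1e5, capture_eff = 0.6) {
  lambda   <- n_loaded * capture_eff / n_droplets  # mean cells per droplet
  p_empty  <- exp(-lambda)                         # P(0 cells)
  p_single <- lambda * exp(-lambda)                # P(exactly 1 cell)
  (1 - p_empty - p_single) / (1 - p_empty)         # P(>= 2 cells | >= 1 cell)
}

loaded <- c(5000, 10000, 20000)
print(round(100 * multiplet_fraction(loaded), 2))
```

In this model the multiplet fraction depends only on how densely the droplets were loaded. Discarding barcodes afterwards (damaged cells, failed QC) lowers the number of recovered cells but does not change the fraction of the remaining barcodes that are doublets.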

zqun1 · 2023-02-03T20:57:02Z

Hi,
I see. So I should not use the number of cells recovered from sequencing to determine the doublet rate. And for some reason, unfortunately, my recovery rate is significantly lower than expected (as listed by 10X), right?

Computationally, scDblFinder only knows the number of cells I recovered from 10X. Therefore, the expected doublet rate (dbr) is probably determined by the recovered cell number, isn't it? How come the threshold for scDblFinder.score was chosen so that the actual doublet rate is more than 2x the expected rate? These questions may sound naive, but I am curious 😅

plger · 2023-02-04T11:49:49Z

Hi,

Yes, you have a lower recovery rate than expected. I'm really not an expert there, but in my experience this has typically been attributable to low cell viability and/or expired/contaminated reagents (e.g. the buffer), but you'd have better luck trying to understand this with wet lab people.

Yes, scDblFinder estimates the dbr from the recovered cells. However, the thresholding is not only based on this: as described in the paper, it's also based on the ability to correctly classify artificial doublets. This often has a larger influence than the expected doublet rate, and in your case rescued the thresholding.
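As a toy illustration of that last point, the sketch below picks a score threshold by trading off missed artificial doublets against deviation from the expected rate. Everything here is an assumption made for illustration: the scores are simulated, and the cost function is a deliberately simplified stand-in, not scDblFinder's actual implementation (see the paper for that).

```r
set.seed(1)

# TOY DATA (simulated, not from this thread):
# scores near 0 look like singlets, scores near 1 look like doublets.
real_scores <- c(rbeta(900, 1, 8),   # 900 real cells that are singlets
                 rbeta(100, 8, 1))   # 100 real cells that are doublets (10%)
art_scores  <- rbeta(300, 8, 1)      # artificial doublets with known labels

expected_dbr <- 0.05  # deliberately too low, as if derived from recovered cells

# Simplified cost: missed artificial doublets + deviation from the expected rate.
# ASSUMPTION: this is NOT scDblFinder's actual cost function.
cost <- function(th) {
  miss_rate <- mean(art_scores < th)    # artificial doublets scored below th
  called    <- mean(real_scores >= th)  # fraction of real cells called doublets
  miss_rate + abs(called - expected_dbr)
}

ths         <- seq(0.01, 0.99, by = 0.01)
best_th     <- ths[which.min(vapply(ths, cost, numeric(1)))]
called_rate <- mean(real_scores >= best_th)
print(c(threshold = best_th, called = called_rate))
```

In this toy setup, forcing the called fraction down to the expected 5% would mean missing many artificial doublets, so the chosen threshold ends up calling more doublets than expected. If the number of loaded cells is known, the expected rate can also be set explicitly through scDblFinder's `dbr` argument rather than left to the default.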

zqun1 · 2023-02-04T13:45:20Z

Thank you very much, plger!
You can close this issue now.

plger closed this as completed Feb 4, 2023

t-nol mentioned this issue Aug 17, 2023: Unreasonably high doublets rate #84 (closed)
