Unreasonably high doublets rate #69

Closed
zqun1 opened this issue Feb 3, 2023 · 6 comments

Comments


zqun1 commented Feb 3, 2023

Dear developers,

Thank you very much for developing this useful tool. I tried it on my dataset, using the samples = 'sampleID' argument. However, I still get a doublet rate of close to 10%, which seems unreasonably high. Could you help, please?

Here is my code:

bp <- SnowParam(8, RNGseed = 1234)  # seeded to make the results reproducible; on Unix, use MulticoreParam()
bpstart(bp)
split_D <- scDblFinder(split_D, samples = 'sampleID', BPPARAM = bp)  # split_D is my SCE object
bpstop(bp)
split_D@colData$scDblFinder.class %>% table
singlet doublet 
  31037    3260

Here are the numbers of cells for each sampleID:

split_D@colData$sampleID
4210      5831      6486      2981      5037      5525      1424      2803. 

I double checked in the resulting SCE object and the scDblFinder.sample equals the sampleID.

According to 10X, each sample at this cell number should contain <5% doublets: https://kb.10xgenomics.com/hc/en-us/articles/360001378811-What-is-the-maximum-number-of-cells-that-can-be-profiled-

sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocParallel_1.32.5         scDblFinder_1.13.7          SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
 [5] Biobase_2.58.0              GenomicRanges_1.50.2        GenomeInfoDb_1.34.6         IRanges_2.32.0             
 [9] S4Vectors_0.36.1            BiocGenerics_0.44.0         MatrixGenerics_1.10.0       matrixStats_0.63.0         
[13] future_1.31.0               dittoSeq_1.10.0             forcats_0.5.2               stringr_1.5.0              
[17] dplyr_1.0.10                purrr_1.0.1                 readr_2.1.3                 tidyr_1.2.1                
[21] tibble_3.1.8                ggplot2_3.4.0               tidyverse_1.3.2             plyr_1.8.8                 
[25] data.table_1.14.6           SeuratObject_4.1.3          Seurat_4.3.0          
Owner

plger commented Feb 3, 2023

Hi,

  • I assume the sampleIDs are individual 10x captures (i.e. no cell barcoding or such)?
  • What kind of tissue is this? adult or developmental/trajectory-like?
  • Do you know how many cells were put into the machine originally?
  • Could you plot a distribution of the split_D$scDblFinder.score?
    (FYI you should avoid using @; the colData columns can be accessed directly with split_D$whatever)

Author

zqun1 commented Feb 3, 2023

Thank you for the quick reply!

  1. Yes.
  2. They are sorted immune cells from adult mice.
  3. I aimed for 10k cells for sequencing. For GEM generation, I loaded 10-20k cells per sample (the very first step). In the end, I only captured 1.4-6.5k cells per sample, as listed above.
  4. See below
p1 <- hist(split_D$scDblFinder.score, plot = FALSE)
p1$density <- p1$counts / sum(p1$counts) * 100  # rescale to percent of cells
plot(p1, freq = FALSE)

[Image: histogram of split_D$scDblFinder.score, y-axis in percent of cells]

Thanks

Owner

plger commented Feb 3, 2023

Hi,
OK, this is as I thought: I'm afraid you really do have ~10% or so doublets.
The determining factor for the doublet rate is the number of cells loaded, as this influences the density and hence the probability that two are captured in the same droplet. The fact that many of these cells were for instance too damaged (or otherwise...) to pass cellranger's early QC (i.e. calls of what's a cell and what's an empty droplet) doesn't influence the doublet rate. (Note that this isn't the only possible explanation for few cells / few reads in cells)
So sorry if it's a disappointment for you, but I think scDblFinder does a nice job of finding them despite having the wrong expected doublet rate :)
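
The arithmetic behind the two expectations can be sketched as follows, using the numbers posted in this thread. The figure of 1% doublets per 1,000 cells is the rule of thumb that scDblFinder's documentation describes for 10x data; applying that same rule to the targeted cell number (10k per sample, as stated above) instead of the recovered number is an illustrative assumption, not something scDblFinder or 10x computes this way.

```r
# Cells recovered per sample, as reported in the issue
recovered <- c(4210, 5831, 6486, 2981, 5037, 5525, 1424, 2803)
targeted  <- 10000  # "I aimed for 10k cells" per sample

# Assumption: ~1% doublets per 1,000 cells (rule of thumb for 10x data)
rate_per_cell <- 0.01 / 1000

# Expected rate when judged by recovered cells: per sample, then pooled
dbr_recovered    <- rate_per_cell * recovered
pooled_recovered <- sum(dbr_recovered * recovered) / sum(recovered)

# Expected rate when judged by the targeted cell number instead
dbr_targeted <- rate_per_cell * targeted

# Rate actually called by scDblFinder (doublets / all cells)
observed <- 3260 / (31037 + 3260)

round(c(recovered = pooled_recovered,
        targeted  = dbr_targeted,
        observed  = observed), 3)
```

Under these assumptions, the pooled expectation based on recovered cells is about 5%, whereas the expectation based on the targeted cell number (10%) is close to the roughly 9.5% that scDblFinder called.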

Author

zqun1 commented Feb 3, 2023

Hi,
I see. So I should not use the number of cells recovered from sequencing to determine the doublet rate. And, unfortunately, for some reason my recovery rate is significantly lower than expected (as listed by 10X), right?

Computationally, scDblFinder only knows the number of cells I recovered from 10X. Therefore, the expected doublet rate (dbr) is probably determined by the recovered cell number, isn't it? How come the threshold on scDblFinder.score was chosen such that the called doublet rate is about twice the expected rate? These questions may sound naive but I am curious 😅

Owner

plger commented Feb 4, 2023

Hi,

Yes, you have a lower recovery rate than expected. I'm really not an expert there, but in my experience this has typically been attributable to low cell viability and/or expired/contaminated reagents (e.g. the buffer), but you'd have better luck trying to understand this with wet lab people.

Yes, scDblFinder estimates the dbr from the recovered cells. However, the thresholding is not only based on this: as described in the paper, it's also based on the ability to correctly classify artificial doublets. This often has a larger influence than the expected doublet rate, and in your case rescued the thresholding.
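
To illustrate how a threshold that weighs both criteria can end up calling more doublets than the expected rate, here is a toy sketch in base R. It is not scDblFinder's actual thresholding code: the scores are simulated, and the cost function (missed artificial doublets plus deviation from the expected rate) is a simplified stand-in chosen for illustration only.

```r
set.seed(1)

# Hypothetical scores: 1,000 real cells, of which 100 truly are doublets
# (10%), plus 500 artificial doublets whose identity is known
real_scores       <- c(rbeta(900, 1, 8), rbeta(100, 8, 1))
artificial_scores <- rbeta(500, 8, 1)

expected_dbr <- 0.05  # assumed expectation, too low for this toy data

# Toy cost: fraction of artificial doublets missed at this threshold,
# plus the deviation of the called rate from the expected rate
cost_at <- function(th) {
  fnr    <- mean(artificial_scores < th)
  called <- mean(real_scores >= th)
  fnr + abs(called - expected_dbr)
}

thresholds  <- seq(0.01, 0.99, by = 0.01)
costs       <- vapply(thresholds, cost_at, numeric(1))
best        <- thresholds[which.min(costs)]
called_rate <- mean(real_scores >= best)

c(threshold = best, called = called_rate, expected = expected_dbr)
```

In this toy setting, pushing the threshold high enough to match the 5% expectation would miss many artificial doublets, so the minimum-cost threshold settles lower and the called rate ends up above the expected rate.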

Author

zqun1 commented Feb 4, 2023

Thank you very much, plger!
You can close this issue now.
