Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

as.matrix conversion: memory issues #10

Closed
Daliya-K opened this issue Aug 30, 2020 · 5 comments
Closed

as.matrix conversion: memory issues #10

Daliya-K opened this issue Aug 30, 2020 · 5 comments

Comments

@Daliya-K
Copy link

Hi,
I am encountering memory problems when I try to convert my Negative ADT from sparse matrix to matrix:

neg_adt_matrix = GetAssayData(NegativeObject, assay = "ADT", slot = 'counts') %>% as.matrix()
Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

Here, because no hashing was performed, I used as negative all droplets with total UMI counts between 0 and 40 and the resulting matrix is 294 x 20189901.

Could you suggest some workaround? I could probably select a smaller number of empty droplets, but I am not sure if it will be representative sample in this case?

Sessioninfo:
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /home/daliya/anaconda3/lib/libmkl_rt.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_BE.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=de_BE.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_BE.UTF-8 LC_NAME=de_BE.UTF-8 LC_ADDRESS=de_BE.UTF-8 LC_TELEPHONE=de_BE.UTF-8
[11] LC_MEASUREMENT=de_BE.UTF-8 LC_IDENTIFICATION=de_BE.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] VennDiagram_1.6.20 futile.logger_1.4.3 dsb_0.1.0 Matrix_1.2-18 plyr_1.8.6 ggrepel_0.8.2 xlsx_0.6.3
[8] cowplot_1.0.0 SoupX_1.4.5 dplyr_1.0.0 Seurat_3.2.0 ggplot2_3.3.2

loaded via a namespace (and not attached):
[1] backports_1.1.8 igraph_1.2.5 lazyeval_0.2.2 splines_4.0.2 listenv_0.8.0 usethis_1.6.1 digest_0.6.25
[8] htmltools_0.5.0 fansi_0.4.1 magrittr_1.5 memoise_1.1.0 tensor_1.5 cluster_2.1.0 ROCR_1.0-11
[15] limma_3.44.3 remotes_2.1.1 globals_0.12.5 prettyunits_1.1.1 colorspace_1.4-1 rappdirs_0.3.1 xfun_0.15
[22] callr_3.4.3 crayon_1.3.4 jsonlite_1.7.0 textTinyR_1.1.3 spatstat_1.64-1 spatstat.data_1.4-3 survival_3.1-12
[29] zoo_1.8-8 ape_5.4 glue_1.4.1 polyclip_1.10-0 gtable_0.3.0 leiden_0.3.3 pkgbuild_1.1.0
[36] future.apply_1.6.0 abind_1.4-5 scales_1.1.1 futile.options_1.0.1 miniUI_0.1.1.1 Rcpp_1.0.5 viridisLite_0.3.0
[43] xtable_1.8-4 reticulate_1.16 rsvd_1.0.3 mclust_5.4.6 htmlwidgets_1.5.1 httr_1.4.1 RColorBrewer_1.1-2
[50] ellipsis_0.3.1 ica_1.0-2 pkgconfig_2.0.3 rJava_0.9-13 farver_2.0.3 uwot_0.1.8 deldir_0.1-28
[57] utf8_1.1.4 tidyselect_1.1.0 labeling_0.3 rlang_0.4.7 reshape2_1.4.4 later_1.1.0.1 munsell_0.5.0
[64] tools_4.0.2 cli_2.0.2 generics_0.0.2 devtools_2.3.0 ggridges_0.5.2 evaluate_0.14 stringr_1.4.0
[71] fastmap_1.0.1 yaml_2.2.1 goftest_1.2-2 processx_3.4.3 knitr_1.29 fs_1.4.2 fitdistrplus_1.1-1
[78] purrr_0.3.4 RANN_2.6.1 pbapply_1.4-2 future_1.18.0 nlme_3.1-147 mime_0.9 formatR_1.7
[85] compiler_4.0.2 rstudioapi_0.11 plotly_4.9.2.1 curl_4.3 png_0.1-7 testthat_2.3.2 spatstat.utils_1.17-0
[92] tibble_3.0.3 stringi_1.4.6 ps_1.3.3 desc_1.2.0 lattice_0.20-41 vctrs_0.3.2 pillar_1.4.6
[99] lifecycle_0.2.0 lmtest_0.9-37 RcppAnnoy_0.0.16 data.table_1.12.8 irlba_2.3.3 httpuv_1.5.4 patchwork_1.0.1.9000
[106] R6_2.4.1 promises_1.1.1 KernSmooth_2.23-17 gridExtra_2.3 sessioninfo_1.1.1 codetools_0.2-16 lambda.r_1.2.4
[113] MASS_7.3-51.6 assertthat_0.2.1 pkgload_1.1.0 xlsxjars_0.6.1 rprojroot_1.3-2 withr_2.2.0 sctransform_0.2.1
[120] mgcv_1.8-31 parallel_4.0.2 rpart_4.1-15 tidyr_1.1.0 rmarkdown_2.3 Rtsne_0.15 shiny_1.5.0
[127] base64enc_0.1-3

Thanks!
Daliya

@MattPM
Copy link
Collaborator

MattPM commented Aug 31, 2020

Hi @Daliya-K Yes you are defining too many negative drops; of those 20189901 "droplets" actually most are theoretical barcodes that are included in the cellranger output but were actually not detected and they likely have no data. You will have to apply some minimal filtering or else R is storing a huge matrix of 0s in memory. Before converting to a regular R matrix, run Matrix::colSums(log10(neg_adt_matrix)) the histogram likely has highest density by far at 0 correct? Those are barcodes in the new 10X assay but we don't have data suggesting they had any molecules of protein or mRNA during library prep so were likely not actually loaded or "seen" by the assay chemistry.

To get an unbiased background estimate you can apply some basic filtering to the raw output object. I recommend you check out the histogram image I posted in my answer to the question posted in #9 keeping in mind "region1" corresponds to droplets with no data so I define negatives with "region 2" the second peak in the log mRNA library size. Using mRNA provides some unbiased estimation because you are not thresholding with the protein library.

. I will be updating the package documentation to be more simple with some guidance on selecting background. Here is a minimal example using 10x example data: https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_protein_v3 and Seurat version 3.

library(dsb)
library(Seurat) # version 3 in example provided belo
library(tidyverse)
library(magrittr)
path_to_data = "data/10x_data/10x_pbmc5k_V3/raw_feature_bc_matrix/"
raw = Read10X(data.dir = path_to_data)

# create object with Minimal filtering (retain drops with 5 unique mRNAs detected)
s1 = CreateSeuratObject(counts = raw$`Gene Expression`,  min.cells = 10, min.features = 5)
# add some metadata 
s1$log10umi = log10(s1$nCount_RNA  + 1) 
s1$bc = rownames(s1@meta.data)

# define negative and positive drops based on an mRNA threshold to define neg and positive cells (see details below) 
hist(log10(s1$nCount_RNA  + 1), breaks = 1000)
neg_drops = s1@meta.data %>% filter(log10umi > 1.5 & log10umi < 2.79) %$% bc
positive_cells = s1@meta.data %>% filter(log10umi > 2.8 & log10umi < 4.4) %$% bc
  
# Subset protein data to create a standard R matrix of protein counts for negative droplets and cells 
neg_prot = raw$`Antibody Capture`[ ,  neg_drops ] %>% as.matrix()
positive_prot = raw$`Antibody Capture`[ , positive_cells] %>% as.matrix()

# run DSB normalization
isotypes = rownames(positive_prot)[30:32]
mtx = DSBNormalizeProtein(cell_protein_matrix = positive_prot,
                           empty_drop_matrix = neg_prot,
                           denoise.counts = TRUE, use.isotype.control = TRUE, 
                           isotype.control.name.vec = isotypes)

@MattPM
Copy link
Collaborator

MattPM commented Aug 31, 2020

And also please do let me know if the above suggestion allows you to define background that fits into memory and normalize with DSB.
Thanks, -M

@Daliya-K
Copy link
Author

Hi Matthew,

Thanks so much for the super fast and extensive reply!
I was able to apply your strategy without any further memory issues.
I just wanted to make one more remark. I've got the same error as in #6, so I applied some filtering of my ADT features (0.9 quantile cutoff). There were indeed quite some features that were very close to 0. But I noticed, that I needed to exclude those features not only from the positive adt matrix, but also from the negative one, otherwise I kept receiving the same error.
I mention this as your coment was:

update: @naity this error occurs during denoising, when you have antibodies in the cells (not the empty drops) that have close to zero counts across all cells.

Best,
Daliya

@MattPM
Copy link
Collaborator

MattPM commented Aug 31, 2020

Great! That makes sense re: filtering features the negative drops and cell containing drops need to have the same features (proteins) as rows. With big panels this will likely be an important step.

I updated the main readme for the package to make it more clear how to define the negatives without hashing data.

@Daliya-K
Copy link
Author

Daliya-K commented Sep 1, 2020

Thanks a lot for the help!

@Daliya-K Daliya-K closed this as completed Sep 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants