
Merge_Sparse_Data_All with multimodal sparse matrices #104

Closed
pankomah opened this issue May 4, 2023 · 5 comments

@pankomah

pankomah commented May 4, 2023

Hi Sam,

Fantastic package -- thanks for addressing many pain points of scRNA-seq analysis! In particular, I'm finding your Read/Write Data functions really helpful as I'm accessing data from NCBI GEO. One issue though: I just downloaded data from
and used Read10X_GEO to generate a list of lists of sparse matrices (both Gene Expression and Antibody Capture for each sample). However, when I try to use Merge_Sparse_Data_All to create a single matrix as follows:
GSE_merged <- Merge_Sparse_Data_All(matrix_list = GSE_10X)
I get the following error:
Preparing & merging matrices.
|  |   0%
Error in curr_s[, 2] + col_offset :
non-numeric argument to binary operator
In addition: Warning message:
In format(as.integer(ll)) : NAs introduced by coercion to integer range

Would appreciate any suggestions/assistance.

Thanks,
POA

@samuel-marsh
Owner

Hi POA,

Thanks so much for the kind words, and so glad the package is working well for you!!

Could you post the full code that you are running so I can try to figure out where the error is? I have an idea, but it's helpful to see the full code.

Thanks!!
Sam

@pankomah
Author

pankomah commented May 5, 2023

Hi Sam,

Thanks for the quick reply and for looking into it! The code is as follows:
list.files("~/ExternalData/GSE175453") ## A set of 9 samples with barcodes/features/count matrices for both gene expression and antibody capture

GSE175453 <- Read10X_GEO(data_dir = "~/ExternalData/GSE175453/") # This works fine, and creates a list of lists with RNA and ADT data for each of the 9 samples

GSE175453_merged <- Merge_Sparse_Data_All(matrix_list = GSE175453) # This returns the following error:
Preparing & merging matrices.
|  |   0%
Error in curr_s[, 2] + col_offset :
non-numeric argument to binary operator
In addition: Warning message:
In format(as.integer(ll)) : NAs introduced by coercion to integer range

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods
[9] base

other attached packages:
[1] viridis_0.6.3 viridisLite_0.4.2
[3] scCustomize_1.1.1 ggExtra_0.10.0
[5] ggsignif_0.6.4 pals_1.7
[7] RColorBrewer_1.1-3 readxl_1.3.1
[9] MultiK_0.1.0 sigclust_1.1.0.1
[11] writexl_1.4.2 ggsci_3.0.0
[13] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
[15] Biobase_2.50.0 GenomicRanges_1.42.0
[17] GenomeInfoDb_1.26.7 IRanges_2.24.1
[19] S4Vectors_0.28.1 BiocGenerics_0.36.1
[21] MatrixGenerics_1.2.1 matrixStats_0.63.0
[23] magrittr_2.0.3 sctransform_0.3.5
[25] clustree_0.5.0 limma_3.46.0
[27] kableExtra_1.3.4 mgsub_1.7.3
[29] ggraph_2.1.0 RCurl_1.98-1.12
[31] cowplot_1.1.1 scales_1.2.1
[33] SeuratDisk_0.0.0.9020 presto_1.0.0
[35] lubridate_1.9.2 forcats_1.0.0
[37] stringr_1.5.0 readr_2.1.4
[39] tidyr_1.3.0 tibble_3.2.1
[41] tidyverse_2.0.0 fgsea_1.25.1
[43] devtools_2.4.5 usethis_2.1.6
[45] dittoSeq_1.11.0 data.table_1.14.8
[47] Matrix_1.5-4 purrr_1.0.1
[49] patchwork_1.1.2 glue_1.6.2
[51] dplyr_1.1.2 ggplot2_3.4.2
[53] ggthemes_4.2.4 SeuratObject_4.1.3
[55] Seurat_4.3.0 rcna_0.0.99
[57] harmony_0.1.1 Rcpp_1.0.10

loaded via a namespace (and not attached):
[1] ggprism_1.0.4 scattermore_0.8 bit64_4.0.5
[4] knitr_1.42 irlba_2.3.5.1 DelayedArray_0.16.3
[7] generics_0.1.3 callr_3.7.3 RANN_2.6.1
[10] future_1.32.0 bit_4.0.4 tzdb_0.3.0
[13] spatstat.data_3.0-1 webshot_0.5.4 xml2_1.3.4
[16] httpuv_1.6.9 xfun_0.39 hms_1.1.3
[19] evaluate_0.20 promises_1.2.0.1 fansi_1.0.4
[22] igraph_1.4.2 DBI_1.1.0 htmlwidgets_1.6.2
[25] spatstat.geom_3.1-0 paletteer_1.5.0 ellipsis_0.3.2
[28] ks_1.14.0 deldir_1.0-6 vctrs_0.6.2
[31] remotes_2.4.2 ROCR_1.0-11 abind_1.4-5
[34] cachem_1.0.8 withr_2.5.0 ggforce_0.4.1
[37] progressr_0.13.0 prettyunits_1.1.1 mclust_6.0.0
[40] goftest_1.2-3 svglite_2.1.1 cluster_2.1.0
[43] lazyeval_0.2.2 crayon_1.5.2 hdf5r_1.3.8
[46] spatstat.explore_3.1-0 pkgconfig_2.0.3 tweenr_2.0.2
[49] nlme_3.1-147 vipor_0.4.5 pkgload_1.3.2
[52] rlang_1.1.1 globals_0.16.2 lifecycle_1.0.3
[55] miniUI_0.1.1.1 dichromat_2.0-0.1 ggrastr_1.0.1
[58] cellranger_1.1.0 polyclip_1.10-4 lmtest_0.9-40
[61] Nebulosa_1.0.2 zoo_1.8-12 beeswarm_0.4.0
[64] ggridges_0.5.4 GlobalOptions_0.1.2 processx_3.8.1
[67] pheatmap_1.0.12 png_0.1-8 bitops_1.0-7
[70] KernSmooth_2.23-17 shape_1.4.6 parallelly_1.35.0
[73] spatstat.random_3.1-4 memoise_2.0.1 plyr_1.8.8
[76] ica_1.0-3 zlibbioc_1.36.0 compiler_4.0.2
[79] fitdistrplus_1.1-11 snakecase_0.11.0 cli_3.6.1
[82] XVector_0.30.0 urlchecker_1.0.1 listenv_0.9.0
[85] pbapply_1.7-0 ps_1.7.5 MASS_7.3-51.6
[88] tidyselect_1.2.0 stringi_1.7.12 ggrepel_0.9.3
[91] grid_4.0.2 fastmatch_1.1-3 tools_4.0.2
[94] timechange_0.2.0 future.apply_1.10.0 circlize_0.4.15
[97] rstudioapi_0.14 janitor_2.2.0 gridExtra_2.3
[100] farver_2.1.1 Rtsne_0.16 digest_0.6.31
[103] shiny_1.7.4 pracma_2.4.2 later_1.3.1
[106] RcppAnnoy_0.0.20 httr_1.4.5 colorspace_2.1-0
[109] rvest_1.0.3 fs_1.6.2 tensor_1.5
[112] reticulate_1.28 splines_4.0.2 uwot_0.1.14
[115] rematch2_2.1.2 spatstat.utils_3.0-2 graphlayouts_1.0.0
[118] sp_1.6-0 mapproj_1.2.11 plotly_4.10.1
[121] sessioninfo_1.2.2 systemfonts_1.0.4 xtable_1.8-4
[124] jsonlite_1.8.4 tidygraph_1.2.3 R6_2.5.1
[127] profvis_0.3.8 pillar_1.9.0 htmltools_0.5.5
[130] mime_0.12 fastmap_1.1.1 BiocParallel_1.24.1
[133] codetools_0.2-16 maps_3.4.1 pkgbuild_1.4.0
[136] mvtnorm_1.1-3 utf8_1.2.3 lattice_0.20-41
[139] spatstat.sparse_3.0-1 ggbeeswarm_0.7.2 leiden_0.4.3
[142] survival_3.1-12 rmarkdown_2.21 munsell_0.5.0
[145] GenomeInfoDbData_1.2.4 reshape2_1.4.4 gtable_0.3.3

@samuel-marsh
Owner

Hi,

Sorry for my delayed response. The issue is due to the presence of both Gene Expression and ADT data for each sample. As a result, the list returned by the read function is actually a list of lists instead of a list of matrices, which is what the merge function expects.
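To make the difference between the two shapes concrete, here is a minimal sketch in R using only the Matrix package. The helper `is_multimodal_list()` and the toy matrices are hypothetical illustrations, not part of scCustomize; the modality names mirror the Gene Expression / Antibody Capture split described above.

```r
library(Matrix)

# Hypothetical helper (not part of scCustomize): TRUE when every element of
# `matrix_list` is itself a list (one matrix per modality), FALSE when the
# elements are sparse matrices. S4 matrices such as dgCMatrix are not lists.
is_multimodal_list <- function(matrix_list) {
  all(vapply(matrix_list, is.list, logical(1)))
}

# Toy 3-gene x 2-cell matrix standing in for one sample's RNA counts
rna <- sparseMatrix(
  i = c(1, 3), j = c(1, 2), x = c(5, 2), dims = c(3, 2),
  dimnames = list(paste0("gene", 1:3), c("AAACCTG-1", "AAAGATG-1"))
)
# Toy 2-antibody x 2-cell matrix standing in for the same sample's ADT counts
adt <- sparseMatrix(
  i = c(1, 2), j = c(1, 2), x = c(10, 4), dims = c(2, 2),
  dimnames = list(c("CD3", "CD19"), colnames(rna))
)

# Flat list of matrices: the shape the merge function expects
flat <- list(sample1 = rna, sample2 = rna)
# Nested list (sample -> modality -> matrix): the multimodal shape
nested <- list(
  sample1 = list(`Gene Expression` = rna, `Antibody Capture` = adt),
  sample2 = list(`Gene Expression` = rna, `Antibody Capture` = adt)
)

is_multimodal_list(flat)    # FALSE
is_multimodal_list(nested)  # TRUE
```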

I will work on a fix to detect the presence of multimodal data and return a merged matrix for each modality. I will update here when it is pushed to the develop branch.
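One way to picture "a merged matrix for each modality" is the sketch below. It is a hypothetical illustration built on Matrix's sparse `cbind()`, not the code that was later added to scCustomize, and `merge_by_modality()` is an invented name. It assumes every sample contains the same modalities and identical features in the same row order, and it prefixes each barcode with its sample name so cells stay unique.

```r
library(Matrix)

# Hypothetical sketch (not scCustomize's implementation): merge a nested list
# (sample -> modality -> sparse matrix) into one sparse matrix per modality.
merge_by_modality <- function(matrix_list, cell_id_delimiter = "_") {
  modalities <- names(matrix_list[[1]])
  merged <- lapply(modalities, function(mod) {
    mats <- lapply(names(matrix_list), function(sample) {
      m <- matrix_list[[sample]][[mod]]
      # Prefix barcodes with the sample name so cells stay unique after merging
      colnames(m) <- paste(sample, colnames(m), sep = cell_id_delimiter)
      m
    })
    # Column-binding is only valid if all samples share the same features
    same_rows <- vapply(mats, function(m) identical(rownames(m), rownames(mats[[1]])),
                        logical(1))
    if (!all(same_rows)) stop("Features differ between samples for modality: ", mod)
    do.call(cbind, mats)
  })
  names(merged) <- modalities
  merged
}

# Toy data: two samples, each with RNA and ADT matrices for the same 2 cells
rna <- sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(5, 2), dims = c(3, 2),
                    dimnames = list(paste0("gene", 1:3), c("AAACCTG-1", "AAAGATG-1")))
adt <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(10, 4), dims = c(2, 2),
                    dimnames = list(c("CD3", "CD19"), colnames(rna)))
nested <- list(
  s1 = list(`Gene Expression` = rna, `Antibody Capture` = adt),
  s2 = list(`Gene Expression` = rna, `Antibody Capture` = adt)
)

merged <- merge_by_modality(nested, cell_id_delimiter = "-")
dim(merged$`Gene Expression`)  # 3 4
```

Returning a named list keeps each modality separate, since RNA and ADT matrices have different feature sets and cannot be column-bound together.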

Best,
Sam

@pankomah
Author

pankomah commented May 8, 2023

Thanks very much!

@samuel-marsh
Owner

Hi,

The fix is now deployed in the develop branch (v1.1.1.9008). There is a new function called Merge_Sparse_Multimodal_All. This function takes the same inputs as Merge_Sparse_Data_All but works with multimodal data. I tested it using the same dataset you were using, so I believe it should work just fine, but if you run into any issues with it please let me know and I'll reopen the issue.

GSE175453 <- Read10X_GEO(data_dir = "~/Downloads/GSE175453_RAW/")

merged <- Merge_Sparse_Multimodal_All(matrix_list = GSE175453, add_cell_ids = names(GSE175453), cell_id_delimiter = "-")
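Before combining modalities into a single object, it can be worth confirming that the RNA and ADT matrices cover the same cells in the same column order. The sketch below assumes you already hold one merged sparse matrix per modality (inspect the object returned by `Merge_Sparse_Multimodal_All()` with `str()` to see how it is actually organized); `align_modality_cells()` is a hypothetical helper, not an scCustomize function.

```r
library(Matrix)

# Hypothetical helper: reorder the columns of `mat_b` to match `mat_a`,
# stopping if the two modalities do not contain the same cell barcodes.
align_modality_cells <- function(mat_a, mat_b) {
  if (!setequal(colnames(mat_a), colnames(mat_b))) {
    stop("Modalities contain different cell barcodes")
  }
  mat_b[, colnames(mat_a), drop = FALSE]
}

# Toy RNA matrix with two cells
rna <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(5, 3), dims = c(2, 2),
                    dimnames = list(c("gene1", "gene2"), c("s1-AAAC-1", "s2-AAAG-1")))
# Toy ADT matrix with the same cells stored in the opposite order
adt <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = c(7, 9), dims = c(1, 2),
                    dimnames = list("CD3", c("s2-AAAG-1", "s1-AAAC-1")))

adt_aligned <- align_modality_cells(rna, adt)
identical(colnames(adt_aligned), colnames(rna))  # TRUE
```

With matching columns, the ADT counts could then be attached to a Seurat object built from the RNA matrix, for example through SeuratObject's `CreateAssayObject(counts = ...)`.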

Best,
Sam
