Error while running PCA on 1.3 Million Brain Cells from E18 Mice #1649

BSharmi · 2019-06-07T01:16:20Z

Hello,

I was wondering if anyone has encountered this error with Seurat and the 10X 1.3-million-cell data.
I am trying to analyze the 1.3 Million Brain Cells from E18 Mice dataset from 10X using R (https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons). Because of the data size, I used the graph clustering results, which contain 60 clusters, and tested just one cluster. I am getting an error while running PCA on the SingleCellExperiment object.

Please find below my code:

library(SingleCellExperiment)
library(Seurat)
library(BiocParallel)
library(restfulSE)  # provides se1.3M(); attached in the sessionInfo() below
library(scater)     # provides runPCA(); attached in the sessionInfo() below
BiocParallel::register(BiocParallel::MulticoreParam(workers=12))


####################################### processed result using graph clustering ########################
## set path
fpath = '/home/bsharmi6/NA_TF_project/scRNAseq_1million/'
## read clustering csv file
cluster.df = read.csv('/home/bsharmi6/NA_TF_project/scRNAseq_1million/analysis/clustering/graphclust/clusters.csv', header = TRUE)
## read 10x
##https://bioconductor.org/packages/release/bioc/vignettes/zinbwave/inst/doc/intro.html
my10x = se1.3M()
## select a cluster
iclust = 21
## get the rows of the clustering table that belong to the selected cluster
cluster.df_i = cluster.df[cluster.df$Cluster %in% iclust,]
## reduce my10x to the cells (columns) in that cluster
my10x_iclust = my10x[, colData(my10x)$Barcode %in% cluster.df_i$Barcode]
## create sc object
sc <- as(my10x_iclust, "SingleCellExperiment")
## runPCA
sc <- runPCA(sc, exprs_values = "counts")

I get an error at the PCA step:
Error in curl::curl_fetch_memory(url, handle = handle) : Failed to connect to hsdshdflab.hdfgroup.org port 80: Connection refused

I get the following error if I try to create a Seurat object bypassing the PCA step:

seurat <- as.Seurat(sc, data = NULL)

Error in curl::curl_fetch_memory(url, handle = handle) : Failed to connect to hsdshdflab.hdfgroup.org port 80: Connection refused
Calls: as.Seurat ... request_fetch -> request_fetch.write_memory ->
Execution halted

This cluster is not very big (27998 genes and 18919 cells), so I am wondering why it is failing. If I use the randomly sampled 20k cells generated by 10X, I have no problem creating the Seurat object. Can someone please let me know how to solve this problem?

Thank you very much

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /apps/easybuild/software/pegasus-sandybridge/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas_sandybridgep-r0.3.1.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] restfulSEData_1.4.0 ExperimentHub_1.8.0
[3] AnnotationHub_2.14.5 loomR_0.2.1.9000
[5] hdf5r_1.2.0 R6_2.4.0
[7] scater_1.10.1 dplyr_0.8.1
[9] zinbwave_1.4.2 biomaRt_2.38.0
[11] ggplot2_3.1.1 magrittr_1.5
[13] scRNAseq_1.8.0 Seurat_3.0.1
[15] TENxGenomics_0.0.27 Matrix_1.2-14
[17] BiocFileCache_1.6.0 dbplyr_1.2.2
[19] SingleCellExperiment_1.4.1 restfulSE_1.4.1
[21] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[23] BiocParallel_1.16.6 matrixStats_0.54.0
[25] Biobase_2.42.0 GenomicRanges_1.34.0
[27] GenomeInfoDb_1.18.2 IRanges_2.16.0
[29] S4Vectors_0.20.1 BiocGenerics_0.28.0

loaded via a namespace (and not attached):
[1] copula_0.999-19.1 bigrquery_1.1.1
[3] plyr_1.8.4 igraph_1.2.4.1
[5] lazyeval_0.2.2 splines_3.5.1
[7] pspline_1.0-18 listenv_0.7.0
[9] digest_0.6.19 foreach_1.4.4
[11] htmltools_0.3.6 viridis_0.5.1
[13] GO.db_3.7.0 gdata_2.18.0
[15] memoise_1.1.0 cluster_2.0.7-1
[17] ROCR_1.0-7 limma_3.38.3
[19] annotate_1.60.1 globals_0.12.4
[21] stabledist_0.7-1 R.utils_2.8.0
[23] prettyunits_1.0.2 colorspace_1.3-2
[25] blob_1.1.1 rappdirs_0.3.1
[27] ggrepel_0.8.1 crayon_1.3.4
[29] RCurl_1.95-4.11 jsonlite_1.6
[31] genefilter_1.64.0 iterators_1.0.10
[33] survival_2.44-1.1 zoo_1.8-6
[35] ape_5.3 glue_1.3.1
[37] gtable_0.3.0 zlibbioc_1.28.0
[39] XVector_0.22.0 Rhdf5lib_1.4.3
[41] future.apply_1.2.0 HDF5Array_1.10.1
[43] scales_1.0.0 mvtnorm_1.0-10
[45] edgeR_3.24.3 DBI_1.0.0
[47] bibtex_0.4.2 Rcpp_1.0.1
[49] metap_1.1 viridisLite_0.3.0
[51] xtable_1.8-2 progress_1.2.0
[53] reticulate_1.12 bit_1.1-14
[55] rsvd_1.0.1 SDMTools_1.1-221.1
[57] rhdf5client_1.4.1 tsne_0.1-3
[59] glmnet_2.0-16 htmlwidgets_1.3
[61] httr_1.4.0 gplots_3.0.1.1
[63] RColorBrewer_1.1-2 ica_1.0-2
[65] pkgconfig_2.0.2 XML_3.98-1.15
[67] R.methodsS3_1.7.1 locfit_1.5-9.1
[69] softImpute_1.4 tidyselect_0.2.5
[71] rlang_0.3.4 reshape2_1.4.3
[73] later_0.7.4 AnnotationDbi_1.44.0
[75] munsell_0.5.0 tools_3.5.1
[77] RSQLite_2.1.1 ggridges_0.5.1
[79] stringr_1.4.0 yaml_2.2.0
[81] npsurv_0.4-0 bit64_0.9-7
[83] fitdistrplus_1.0-14 caTools_1.17.1.2
[85] purrr_0.3.2 RANN_2.6.1
[87] pbapply_1.4-0 future_1.13.0
[89] nlme_3.1-137 mime_0.6
[91] R.oo_1.22.0 compiler_3.5.1
[93] beeswarm_0.2.3 plotly_4.9.0
[95] curl_3.3 png_0.1-7
[97] interactiveDisplayBase_1.20.0 lsei_1.2-0
[99] tibble_2.1.2 pcaPP_1.9-73
[101] stringi_1.4.3 gsl_2.1-6
[103] lattice_0.20-35 pillar_1.4.1
[105] ADGofTest_0.3 BiocManager_1.30.4
[107] Rdpack_0.11-0 lmtest_0.9-37
[109] data.table_1.12.2 cowplot_0.9.4
[111] bitops_1.0-6 irlba_2.3.3
[113] gbRd_0.4-11 httpuv_1.4.5
[115] promises_1.0.1 KernSmooth_2.23-15
[117] gridExtra_2.3 vipor_0.4.5
[119] codetools_0.2-15 MASS_7.3-50
[121] gtools_3.8.1 assertthat_0.2.1
[123] rhdf5_2.26.2 rjson_0.2.20
[125] withr_2.1.2 sctransform_0.2.0
[127] GenomeInfoDbData_1.2.0 hms_0.4.2
[129] grid_3.5.1 tidyr_0.8.3
[131] DelayedMatrixStats_1.4.0 Rtsne_0.15
[133] numDeriv_2016.8-1 shiny_1.1.0
[135] ggbeeswarm_0.6.0


jspaezp · 2019-06-13T19:31:20Z

Hello @BSharmi

I wonder if the problem is that each worker is requesting too much memory (since the size of the dataset will "increase" for each core used). Have you tried running it single-threaded to see if that is the problem?

Also, since the error is a refused connection on port 80, have you checked that your firewall or equivalent is not blocking the outgoing connection to the workers?
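
The two checks suggested above can be sketched in R as follows. This is a minimal sketch rather than a confirmed fix: `check_host` is a hypothetical helper introduced here for illustration, and the URL is simply built from the host and port named in the error message.

```r
library(BiocParallel)

# Replace the 12-worker MulticoreParam with a serial backend, so that
# per-worker memory use can be ruled out as the cause.
register(SerialParam())

# Hypothetical helper (not part of any package): returns TRUE if an HTTP
# request to `url` completes, FALSE (after printing the curl error) otherwise.
check_host <- function(url, timeout_sec = 10) {
  tryCatch({
    handle <- curl::new_handle(connecttimeout = timeout_sec,
                               timeout = timeout_sec)
    curl::curl_fetch_memory(url, handle = handle)
    TRUE
  }, error = function(e) {
    message("Request failed: ", conditionMessage(e))
    FALSE
  })
}

# Host and port copied from the error message in the original post.
reachable <- check_host("http://hsdshdflab.hdfgroup.org:80")
print(reachable)
```

If the request still fails with the serial backend registered, the refusal is happening between this machine and that host, independently of the number of workers.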

Best,
Sebastian

BSharmi · 2019-06-14T02:20:36Z

I have checked the firewall and it does not seem to be blocking anything. Did you mean to say the error is not reproducible? Thank you. Sharmi

satijalab · 2019-06-14T15:33:13Z

Unfortunately, this does not appear to be related to Seurat, as we can certainly handle datasets of 30k cells. Note that you receive an identical error when calling runPCA on the SCE object, so this appears to be specific to your computational setup rather than to the Seurat converter (apologies).

BSharmi · 2019-06-14T22:29:28Z

Thank you for closing the issue. While I could be wrong, and Seurat is able to handle huge datasets, I have encountered problems while using the ‘Read10X_h5’ function on the 1.3 million dataset (open issue #1644). Since the problem appears to be related to neither Seurat nor 10X Genomics, I guess I have to look into alternative ways of loading large matrices. Best, Sharmi

paupuigdevall · 2020-11-09T14:49:33Z

Hi @BSharmi. Did you find a workaround for this problem? I'm also dealing with a 1.3-million-cell dataset, and I can't convert the anndata object to Seurat because of the problem described in #1644.

sheetalgiri · 2021-08-16T14:49:21Z

@paupuigdevall @BSharmi Were you able to find a workaround for this problem?

satijalab closed this as completed Jun 14, 2019
