Error while running PCA on 1.3 Million Brain Cells from E18 Mice #1649

Closed
BSharmi opened this issue Jun 7, 2019 · 6 comments

Comments

@BSharmi

BSharmi commented Jun 7, 2019

Hello,

I was wondering whether anyone has encountered this error with Seurat and the 10X 1.3-million-cell data.
I am trying to analyze the 1.3 Million Brain Cells from E18 Mice dataset from 10X using R (https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons). Because of the data size, I used the graph clustering results, which contain 60 clusters, and tested just one cluster. I get an error while running PCA on the SingleCellExperiment object.

Please find below my code:

library(SingleCellExperiment)
library(Seurat)
library(BiocParallel)
library(restfulSE)  # provides se1.3M()
library(scater)     # provides runPCA()
BiocParallel::register(BiocParallel::MulticoreParam(workers=12))


####################################### processed result using graph clustering ########################
## set path
fpath = '/home/bsharmi6/NA_TF_project/scRNAseq_1million/'
## read clustering csv file
cluster.df = read.csv('/home/bsharmi6/NA_TF_project/scRNAseq_1million/analysis/clustering/graphclust/clusters.csv', header = TRUE)
## read 10x
##https://bioconductor.org/packages/release/bioc/vignettes/zinbwave/inst/doc/intro.html
my10x = se1.3M()
## select a cluster
iclust = 21
## get cluster indices
cluster.df_i = cluster.df[cluster.df$Cluster %in% iclust,]
## reduce my10x to cluster
my10x_iclust = my10x[, colData(my10x)$Barcode %in% cluster.df_i$Barcode]
## create sc object
sc <- as(my10x_iclust, "SingleCellExperiment")
## runPCA
sc <- runPCA(sc, exprs_values = "counts")

I get an error at the PCA step:
Error in curl::curl_fetch_memory(url, handle = handle) : Failed to connect to hsdshdflab.hdfgroup.org port 80: Connection refused

I get the following error if I try to create a Seurat object bypassing the PCA step:

seurat <- as.Seurat(sc, data = NULL)

Error in curl::curl_fetch_memory(url, handle = handle) : Failed to connect to hsdshdflab.hdfgroup.org port 80: Connection refused
Calls: as.Seurat ... request_fetch -> request_fetch.write_memory ->
Execution halted

The cluster is not very big (27998 genes and 18919 cells), so I am wondering why it is failing. If I use the randomly sampled 20k cells provided by 10X, I have no problem creating the Seurat object. Can someone please let me know how to solve this problem?

Thank you very much

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /apps/easybuild/software/pegasus-sandybridge/OpenBLAS/0.3.1-GCC-7.3.0-2.30/lib/libopenblas_sandybridgep-r0.3.1.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] restfulSEData_1.4.0 ExperimentHub_1.8.0
[3] AnnotationHub_2.14.5 loomR_0.2.1.9000
[5] hdf5r_1.2.0 R6_2.4.0
[7] scater_1.10.1 dplyr_0.8.1
[9] zinbwave_1.4.2 biomaRt_2.38.0
[11] ggplot2_3.1.1 magrittr_1.5
[13] scRNAseq_1.8.0 Seurat_3.0.1
[15] TENxGenomics_0.0.27 Matrix_1.2-14
[17] BiocFileCache_1.6.0 dbplyr_1.2.2
[19] SingleCellExperiment_1.4.1 restfulSE_1.4.1
[21] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[23] BiocParallel_1.16.6 matrixStats_0.54.0
[25] Biobase_2.42.0 GenomicRanges_1.34.0
[27] GenomeInfoDb_1.18.2 IRanges_2.16.0
[29] S4Vectors_0.20.1 BiocGenerics_0.28.0

loaded via a namespace (and not attached):
[1] copula_0.999-19.1 bigrquery_1.1.1
[3] plyr_1.8.4 igraph_1.2.4.1
[5] lazyeval_0.2.2 splines_3.5.1
[7] pspline_1.0-18 listenv_0.7.0
[9] digest_0.6.19 foreach_1.4.4
[11] htmltools_0.3.6 viridis_0.5.1
[13] GO.db_3.7.0 gdata_2.18.0
[15] memoise_1.1.0 cluster_2.0.7-1
[17] ROCR_1.0-7 limma_3.38.3
[19] annotate_1.60.1 globals_0.12.4
[21] stabledist_0.7-1 R.utils_2.8.0
[23] prettyunits_1.0.2 colorspace_1.3-2
[25] blob_1.1.1 rappdirs_0.3.1
[27] ggrepel_0.8.1 crayon_1.3.4
[29] RCurl_1.95-4.11 jsonlite_1.6
[31] genefilter_1.64.0 iterators_1.0.10
[33] survival_2.44-1.1 zoo_1.8-6
[35] ape_5.3 glue_1.3.1
[37] gtable_0.3.0 zlibbioc_1.28.0
[39] XVector_0.22.0 Rhdf5lib_1.4.3
[41] future.apply_1.2.0 HDF5Array_1.10.1
[43] scales_1.0.0 mvtnorm_1.0-10
[45] edgeR_3.24.3 DBI_1.0.0
[47] bibtex_0.4.2 Rcpp_1.0.1
[49] metap_1.1 viridisLite_0.3.0
[51] xtable_1.8-2 progress_1.2.0
[53] reticulate_1.12 bit_1.1-14
[55] rsvd_1.0.1 SDMTools_1.1-221.1
[57] rhdf5client_1.4.1 tsne_0.1-3
[59] glmnet_2.0-16 htmlwidgets_1.3
[61] httr_1.4.0 gplots_3.0.1.1
[63] RColorBrewer_1.1-2 ica_1.0-2
[65] pkgconfig_2.0.2 XML_3.98-1.15
[67] R.methodsS3_1.7.1 locfit_1.5-9.1
[69] softImpute_1.4 tidyselect_0.2.5
[71] rlang_0.3.4 reshape2_1.4.3
[73] later_0.7.4 AnnotationDbi_1.44.0
[75] munsell_0.5.0 tools_3.5.1
[77] RSQLite_2.1.1 ggridges_0.5.1
[79] stringr_1.4.0 yaml_2.2.0
[81] npsurv_0.4-0 bit64_0.9-7
[83] fitdistrplus_1.0-14 caTools_1.17.1.2
[85] purrr_0.3.2 RANN_2.6.1
[87] pbapply_1.4-0 future_1.13.0
[89] nlme_3.1-137 mime_0.6
[91] R.oo_1.22.0 compiler_3.5.1
[93] beeswarm_0.2.3 plotly_4.9.0
[95] curl_3.3 png_0.1-7
[97] interactiveDisplayBase_1.20.0 lsei_1.2-0
[99] tibble_2.1.2 pcaPP_1.9-73
[101] stringi_1.4.3 gsl_2.1-6
[103] lattice_0.20-35 pillar_1.4.1
[105] ADGofTest_0.3 BiocManager_1.30.4
[107] Rdpack_0.11-0 lmtest_0.9-37
[109] data.table_1.12.2 cowplot_0.9.4
[111] bitops_1.0-6 irlba_2.3.3
[113] gbRd_0.4-11 httpuv_1.4.5
[115] promises_1.0.1 KernSmooth_2.23-15
[117] gridExtra_2.3 vipor_0.4.5
[119] codetools_0.2-15 MASS_7.3-50
[121] gtools_3.8.1 assertthat_0.2.1
[123] rhdf5_2.26.2 rjson_0.2.20
[125] withr_2.1.2 sctransform_0.2.0
[127] GenomeInfoDbData_1.2.0 hms_0.4.2
[129] grid_3.5.1 tidyr_0.8.3
[131] DelayedMatrixStats_1.4.0 Rtsne_0.15
[133] numDeriv_2016.8-1 shiny_1.1.0
[135] ggbeeswarm_0.6.0

@jspaezp

jspaezp commented Jun 13, 2019

Hello @BSharmi

I wonder if the problem is that each worker is requesting too much memory (the memory footprint of the dataset effectively "increases" with each core used). Have you tried running it single-threaded to see if that is the problem?

Also, since the problem is a refused connection on port 80, have you checked that your firewall or equivalent is not blocking the outgoing connection to the workers?

Best,
Sebastian
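
The two checks suggested above could be sketched in R roughly as follows. This is only a minimal sketch, not a confirmed fix: `can_reach()` is a hypothetical helper written for illustration (it is not part of any package), the URL is simply the host named in the error message, and BiocParallel is assumed to be installed as in the sessionInfo() output.

```r
# Hypothetical helper (not part of any package): returns TRUE if an HTTP
# connection to `u` can be opened from this machine, FALSE otherwise.
can_reach <- function(u, timeout_sec = 10) {
  old <- options(timeout = timeout_sec)   # limit how long we wait
  on.exit(options(old), add = TRUE)       # restore the previous setting
  tryCatch({
    con <- url(u, open = "rb")
    close(con)
    TRUE
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}

# 1. Rule out multi-worker memory pressure by registering a serial backend.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
  BiocParallel::register(BiocParallel::SerialParam())
}

# 2. Check whether the host from the error message is reachable at all.
host <- "http://hsdshdflab.hdfgroup.org"
message("Reachable: ", can_reach(host))
```

If this reports FALSE on the compute node but TRUE on a machine with unrestricted internet access, a firewall or proxy would be a more likely cause than memory use.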

@BSharmi
Author

BSharmi commented Jun 14, 2019 via email

@satijalab
Collaborator

Unfortunately, this does not appear to be related to Seurat, as we can certainly handle datasets of 30k cells. Note that you receive an identical error when trying to use runPCA on the SCE object, so this appears to be specific to your computational setup rather than to the Seurat converter (apologies).

@BSharmi
Author

BSharmi commented Jun 14, 2019 via email

@paupuigdevall

Hi @BSharmi. Did you find a workaround for this problem? I'm also dealing with a 1.3-million-cell dataset, and I can't convert the AnnData object to Seurat because of the problem described in #1644.

@sheetalgiri

sheetalgiri commented Aug 16, 2021

@paupuigdevall @BSharmi were you able to find any workaround to this problem?
