Importing bigWigs from GEO: Error in .local(con, format, text, ...) : UCSC library operation failed #62

Open
bschilder opened this issue Apr 1, 2022 · 5 comments


bschilder commented Apr 1, 2022

Hello,

rtracklayer has been great for importing various supplementary files from GEO. However, I've run into the following error when trying to import certain bigWig files.

A couple of notes:

Reprex

GEO page.
Comes from dataset GSE188512 in a study led by @dbart1807

URL <- "ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw" 
query_granges <- GenomicRanges::GRanges("chr6:165169213-167169213")

gr <- rtracklayer::import(con = URL, which = query_granges)
gr <- rtracklayer::import.bw(con = URL, which = query_granges)

Error

 Error in .local(con, format, text, ...) : UCSC library operation failed 

Session info

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] echoannot_0.99.4                  BSgenome.Hsapiens.UCSC.hg38_1.4.4
 [3] BSgenome_1.62.0                   rtracklayer_1.54.0               
 [5] Biostrings_2.62.0                 XVector_0.34.0                   
 [7] GenomicRanges_1.46.1              GenomeInfoDb_1.30.1              
 [9] IRanges_2.28.0                    S4Vectors_0.32.4                 
[11] BiocGenerics_0.40.0              

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3                          GGally_2.1.2                           
  [3] R.methodsS3_1.8.1                       tidyr_1.2.0                            
  [5] ggplot2_3.3.5                           bit64_4.0.5                            
  [7] knitr_1.38                              DelayedArray_0.20.0                    
  [9] R.utils_2.11.0                          data.table_1.14.2                      
 [11] rpart_4.1.16                            KEGGREST_1.34.0                        
 [13] RCurl_1.98-1.6                          GEOquery_2.62.2                        
 [15] AnnotationFilter_1.18.0                 generics_0.1.2                         
 [17] GenomicFeatures_1.46.5                  RSQLite_2.2.11                         
 [19] shadowtext_0.1.1                        proxy_0.4-26                           
 [21] bit_4.0.4                               tzdb_0.3.0                             
 [23] enrichplot_1.14.2                       xml2_1.3.3                             
 [25] lubridate_1.8.0                         SummarizedExperiment_1.24.0            
 [27] assertthat_0.2.1                        viridis_0.6.2                          
 [29] gargle_1.2.0                            xfun_0.30                              
 [31] hms_1.1.1                               fansi_1.0.3                            
 [33] restfulr_0.0.13                         progress_1.2.2                         
 [35] caTools_1.18.2                          dbplyr_2.1.1                           
 [37] Rgraphviz_2.38.0                        igraph_1.3.0                           
 [39] DBI_1.1.2                               htmlwidgets_1.5.4                      
 [41] reshape_0.8.8                           purrr_0.3.4                            
 [43] ellipsis_0.3.2                          dplyr_1.0.8                            
 [45] backports_1.4.1                         biomaRt_2.50.3                         
 [47] MatrixGenerics_1.6.0                    MungeSumstats_1.3.16                   
 [49] vctrs_0.4.0                             Biobase_2.54.0                         
 [51] ensembldb_2.18.4                        cachem_1.0.6                           
 [53] withr_2.5.0                             ggforce_0.3.3                          
 [55] checkmate_2.0.0                         treeio_1.18.1                          
 [57] GenomicAlignments_1.30.0                prettyunits_1.1.1                      
 [59] cluster_2.1.3                           DOSE_3.20.1                            
 [61] ape_5.6-2                               lazyeval_0.2.2                         
 [63] crayon_1.5.1                            crul_1.2.0                             
 [65] pkgconfig_2.0.3                         tweenr_1.0.2                           
 [67] nlme_3.1-157                            pkgload_1.2.4                          
 [69] ProtGenerics_1.26.0                     XGR_1.1.8                              
 [71] nnet_7.3-17                             rlang_1.0.2                            
 [73] lifecycle_1.0.1                         filelock_1.0.2                         
 [75] httpcode_0.3.0                          BiocFileCache_2.2.1                    
 [77] echotabix_0.99.5                        dichromat_2.0-0                        
 [79] rprojroot_2.0.2                         polyclip_1.10-0                        
 [81] matrixStats_0.61.0                      graph_1.72.0                           
 [83] Matrix_1.4-1                            aplot_0.1.3                            
 [85] osfr_0.2.8                              boot_1.3-28                            
 [87] base64enc_0.1-3                         png_0.1-7                              
 [89] viridisLite_0.4.0                       rjson_0.2.21                           
 [91] clisymbols_1.2.0                        rootSolve_1.8.2.3                      
 [93] bitops_1.0-7                            R.oo_1.24.0                            
 [95] KernSmooth_2.23-20                      ggnetwork_0.5.10                       
 [97] blob_1.2.2                              stringr_1.4.0                          
 [99] qvalue_2.26.0                           regioneR_1.26.1                        
[101] dnet_1.1.7                              gridGraphics_0.5-1                     
[103] readr_2.1.2                             jpeg_0.1-9                             
[105] echodata_0.99.7                         scales_1.1.1                           
[107] memoise_2.0.1                           magrittr_2.0.3                         
[109] plyr_1.8.7                              hexbin_1.28.2                          
[111] gplots_3.1.1                            zlibbioc_1.40.0                        
[113] scatterpie_0.1.7                        compiler_4.1.0                         
[115] echoconda_0.99.5                        BiocIO_1.4.0                           
[117] RColorBrewer_1.1-2                      plotrix_3.8-2                          
[119] Rsamtools_2.10.0                        cli_3.2.0                              
[121] patchwork_1.1.1                         htmlTable_2.4.0                        
[123] Formula_1.2-4                           MASS_7.3-56                            
[125] tidyselect_1.1.2                        stringi_1.7.6                          
[127] yaml_2.3.5                              GOSemSim_2.20.0                        
[129] supraHex_1.32.0                         latticeExtra_0.6-29                    
[131] ggrepel_0.9.1                           grid_4.1.0                             
[133] VariantAnnotation_1.40.0                fastmatch_1.1-3                        
[135] tools_4.1.0                             lmom_2.8                               
[137] parallel_4.1.0                          rstudioapi_0.13                        
[139] foreign_0.8-82                          TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[141] piggyback_0.1.1                         gridExtra_2.3                          
[143] gld_2.6.4                               farver_2.1.0                           
[145] ggraph_2.0.5                            digest_0.6.29                          
[147] BiocManager_1.30.16                     Rcpp_1.0.8.3                           
[149] OrganismDbi_1.36.0                      httr_1.4.2                             
[151] AnnotationDbi_1.56.2                    RCircos_1.2.2                          
[153] ggbio_1.42.0                            biovizBase_1.42.0                      
[155] colorspace_2.0-3                        brio_1.1.3                             
[157] XML_3.99-0.9                            fs_1.5.2                               
[159] reticulate_1.24-9000                    splines_4.1.0                          
[161] yulab.utils_0.0.4                       RBGL_1.70.0                            
[163] tidytree_0.3.9                          expm_0.999-6                           
[165] gh_1.3.0                                graphlayouts_0.8.0                     
[167] Exact_3.1                               ggplotify_0.1.0                        
[169] ggtree_3.2.1                            jsonlite_1.8.0                         
[171] tidygraph_1.2.0                         ggfun_0.0.6                            
[173] testthat_3.1.3                          R6_2.5.1                               
[175] Hmisc_4.6-0                             pillar_1.7.0                           
[177] htmltools_0.5.2                         glue_1.6.2                             
[179] fastmap_1.1.0                           DT_0.22                                
[181] BiocParallel_1.28.3                     class_7.3-20                           
[183] ChIPseeker_1.30.3                       fgsea_1.20.0                           
[185] mvtnorm_1.1-3                           utf8_1.2.2                             
[187] lattice_0.20-45                         tibble_3.1.6                           
[189] curl_4.3.2                              DescTools_0.99.44                      
[191] gtools_3.9.2                            zip_2.2.0                              
[193] GO.db_3.14.0                            openxlsx_4.2.5                         
[195] survival_3.3-1                          limma_3.50.1                           
[197] googleAuthR_2.0.0                       desc_1.4.1                             
[199] munsell_0.5.0                           e1071_1.7-9                            
[201] DO.db_2.9                               GenomeInfoDbData_1.2.7                 
[203] reshape2_1.4.4                          gtable_0.3.0  

Many thanks in advance,
Brian

Contributor

sanchit-saini commented Apr 5, 2022

Hi @bschilder,

I tried to replicate it on Linux and I think it should behave similarly on macOS.

> gr <- rtracklayer::import.bw(con = URL, which = query_granges)
#R: TCP non-blocking connect() to ftp.ncbi.nlm.nih.gov timed-out in select() after 10000 milliseconds - Cancelling!: Operation now in progress
#Error in .local(con, format, text, ...) : UCSC library operation failed
#In addition: Warning message:
#In .local(con, format, text, ...) :
#  Can't get data socket for ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw

The request to the URL timed out: the UCSC kent library, on which rtracklayer relies, gives up on the FTP connection after 10000 milliseconds. Hence the error states that the UCSC library operation failed.

Solution: It should work if you change the protocol from ftp to http, for example:

suppressPackageStartupMessages(library(rtracklayer))
URL <- "http://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw" 
query_granges <- GenomicRanges::GRanges("chr6:165169213-167169213")

gr <- rtracklayer::import(con = URL, which = query_granges)
gr <- rtracklayer::import.bw(con = URL, which = query_granges)

It's surprising that import is not working locally on macOS. If you could provide logs, it would be helpful.
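
One way to collect such logs is to wrap the failing call so that every warning and the final error message are kept together. The sketch below uses only base R; `capture_conditions` is a hypothetical helper name introduced here for illustration and is not part of rtracklayer.

```r
# Hypothetical helper (not part of rtracklayer): evaluate an expression and
# collect every warning message plus the error message, if one is raised.
capture_conditions <- function(expr) {
  warns <- character(0)
  error <- NULL
  value <- withCallingHandlers(
    # The error handler records the message and returns NULL as the value.
    tryCatch(expr, error = function(e) {
      error <<- conditionMessage(e)
      NULL
    }),
    # The warning handler records the message and lets evaluation continue.
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warns = warns, error = error)
}

# Usage with the objects from the reprex (needs rtracklayer and network access):
# log <- capture_conditions(rtracklayer::import.bw(con = URL, which = query_granges))
# str(log)
```

The returned list can be pasted into a report with `str()` or `dput()`, which avoids losing warnings that are printed only after the error.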

@bschilder
Author

Aha, the "http://" prefix did the trick! Never realized you could do that.

Here are the outputs from my original reprex. Apologies for not thinking to include these earlier.

Error in seqinfo(con) : UCSC library operation failed
In addition: Warning messages:
1: In seqinfo(con) :
  TCP non-blocking connect() to ftp.ncbi.nlm.nih.gov timed-out in select() after 10000 milliseconds - Cancelling!
2: In seqinfo(con) :

Screenshot 2022-04-06 at 14 32 44

I'll go ahead and add a conditional to my functions that makes sure all ftp URLs have the "http://" prefix. Would it make sense to add this feature internally to rtracklayer as well?
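
A minimal sketch of such a conditional, in base R. The function name `ftp_to_http` and its `hosts` argument are hypothetical, and the sketch assumes that only the listed hosts (here just the NCBI host from this thread) serve the same paths over HTTP; other FTP URLs are left untouched.

```r
# Hypothetical helper: swap a leading "ftp://" for "http://", but only for
# hosts assumed to serve the same paths over HTTP.
ftp_to_http <- function(urls, hosts = "ftp.ncbi.nlm.nih.gov") {
  is_ftp <- grepl("^ftp://", urls, ignore.case = TRUE)
  # Extract the host part that follows "scheme://".
  host <- sub("^[A-Za-z][A-Za-z0-9+.-]*://([^/:]+).*$", "\\1", urls)
  swap <- is_ftp & tolower(host) %in% tolower(hosts)
  urls[swap] <- sub("^ftp://", "http://", urls[swap], ignore.case = TRUE)
  urls
}

# Usage (the import call needs rtracklayer and network access):
# gr <- rtracklayer::import(con = ftp_to_http(URL), which = query_granges)
```

Restricting the rewrite to known hosts is a deliberate choice: not every FTP server mirrors its content over HTTP.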

Thank you so much for the quick reply and solution.

All the best,
Brian

Contributor

sanchit-saini commented Apr 9, 2022

> Would it make sense to add this feature internally to rtracklayer as well?

rtracklayer cannot modify or insert the prefix of a URI.

The only way we get information about the protocol is from the prefix of the URI. Hence, the burden of providing the correct prefix is on the user.

Without knowing the correct protocol, we don't know how to communicate with the resource, so we cannot operate on it.

The error in the screenshot occurred because the protocol is not present in the URL.
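
On the user side, a missing prefix can be caught before calling import. The following is a sketch in base R; `uri_scheme` is a hypothetical helper, not an rtracklayer function, and it only checks for a leading `scheme://` pattern rather than validating the whole URI.

```r
# Hypothetical helper: return the URI scheme in lower case (e.g. "ftp", "http"),
# or NA when the string has no "scheme://" prefix.
uri_scheme <- function(uri) {
  pattern <- "^([A-Za-z][A-Za-z0-9+.-]*)://"
  m <- regmatches(uri, regexec(pattern, uri))
  vapply(
    m,
    function(x) if (length(x) == 2L) tolower(x[2]) else NA_character_,
    character(1)
  )
}

# Usage: stop early with a clear message instead of a UCSC library error.
# if (is.na(uri_scheme(URL))) stop("URL has no protocol prefix: ", URL)
```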

Hope this helps.
Thanks

@bschilder
Author

In the original example I gave, the ftp:// prefix was included and gave the same error as without it. So I don't think the error in my most recent example was exclusively due to the omission of the ftp:// prefix (though it may very well have contributed).

However, now (as of April 10th 2022) I'm noticing that including the ftp:// prefix (without replacing it with http://) works when it didn't before. Has something changed with rtracklayer since my original post? Can you think of some reason for the inconsistency?

@sanchit-saini
Contributor

> Has something changed with rtracklayer since my original post?

Nothing's changed. It is at the same commit. https://git.bioconductor.org/packages/rtracklayer
I tried to debug it, and it works as expected against a local FTP server, although I had no success with the provided FTP URL:
ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5684nnn/GSM5684359/suppl/GSM5684359_H3K27me3_CUTnTag_10k_HCT116_S6.hg38.rmdup.win100.bw

> I'm noticing that including the ftp:// prefix (without replacing it with http://) works when it didn't before.

Was it the same FTP URL or some other URL?
Can you provide the URL which worked?

> Can you think of some reason for the inconsistency?

At this moment, I'm not sure.
