Error in readBin while reading .gz files #31
Comments
Your problem is probably related to downloading, since R is reporting that the large accession2taxid.gz ends prematurely. I think the workaround was downloading the files manually, e.g. with a browser, then processing with taxonomizr as normal. Whatever the issue was seemed to also mess up downloads outside R, so I believe she ended up downloading on another computer. Might be worth trying a manual download here (with your own computer at first) to narrow things down. The final error sounds like some sort of permissions issue, with R not able to write to C:/Users/lelio/DOCUME…
Maybe also report …
Hi, …
Hi! Thanks for taking the time to answer me! As you suggested, I reinstalled the package from GitHub. I then changed the temporary files folder back to the initial one. After that, I tried your code and everything worked perfectly, with the message printing and then being successfully removed. Then I deleted all the nodes and compressed files for good, changed the saving folder, and ran `getNamesAndNodes()` again:

```
Downloading names and nodes with getNamesAndNodes()
trying URL 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz.md5'
Error in (function (outDir = ".", url = "ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz", :
```

I am sure I have removed those files and I can't see them anymore… Here is the (truncated) result:

```
R version 4.0.2 (2020-06-22)
Matrix products: default
locale:
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
```

Best regards,
Hmm, taxdump.tar.gz should be about 55 MB (e.g. as shown at https://ftp.ncbi.nih.gov/pub/taxonomy/). So it appears the download is truncated for you (and the function is correctly flagging a problem). I'm not sure if this is NCBI's server intermittently messing up (I've had trouble downloading from them in the past) or some bigger Windows/R issue. Could you try running the raw R command to download and check the md5 a few times, to see if you get a 55 MB file with a consistent md5? If that …
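The command itself was lost from this copy of the thread; a plausible sketch of such a check, using base R's `download.file` and `tools::md5sum` (the URL and expected size are taken from the messages above, the local filename is arbitrary):

```r
# Download taxdump.tar.gz directly and check size and checksum.
# Running this a few times shows whether the md5 is stable across attempts.
url <- "ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"
download.file(url, "taxdump.tar.gz", mode = "wb")
file.size("taxdump.tar.gz")      # should be roughly 55 MB
tools::md5sum("taxdump.tar.gz")  # should be identical on every attempt
```

A truncated file will show up here as a smaller size and a changing md5.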
Thanks for your quick answer! Here is the result … And here is the md5 … I tried again the …
I think I'm finally able to somewhat replicate this (or a similar bug). It appears that on Mac or Windows, the …
But on Windows, the command runs for about a minute and then fails, as long as it's on a slow connection where the download takes longer than 60 seconds. This is the same if I run the …
Mac also does not respect the … So on my side, I guess I just can't trust `download.file` to do the right thing. I'll investigate other packages to handle the simple file download. Packages … And for you, I guess there are three options: …
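For context, the one-minute failure described above matches R's default download timeout of 60 seconds. One mitigation is raising that option before downloading; a sketch only, not a guaranteed fix, since the comments above suggest the setting is not respected on every platform:

```r
# R's default is options(timeout = 60); see ?download.file.
# Raising it gives slow connections more time before download.file aborts.
options(timeout = max(600, getOption("timeout")))
download.file("ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz",
              "taxdump.tar.gz", mode = "wb")
```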
Thanks for helping me (potentially) get to the bottom of a very annoying issue.
The GitHub version of the package is updated to use the `curl` package …
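A minimal sketch of what downloading via the `curl` package looks like (assuming `curl::curl_download` is what the updated version uses; it writes straight to disk and is not subject to the `download.file` timeout behaviour discussed above):

```r
library(curl)  # install.packages("curl") if needed
# curl_download() streams the response directly to destfile.
curl_download("ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz",
              destfile = "taxdump.tar.gz", quiet = FALSE)
```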
Hi! Thank you very much for your help! Everything worked perfectly for me when running the … I can now finally use your package! Best regards,
Great, thanks a lot for the follow-up and for the help tracking it down. I'll go ahead and push that version to CRAN. It's a shame to add a dependency (I'm not sure how much pain the curl libraries add on Windows/Mac), but this will hopefully squash what has been a very difficult bug to nail down.
Hi!
I need to obtain taxids from a huge list of accession numbers, so "taxonomizr" seems to be the perfect option.
However, I got the following error when I ran the `prepareDatabase` or `read.accession2taxid` commands (shown here after multiple tries, so the databases were already downloaded):

```
Downloading names and nodes with getNamesAndNodes()
./names.dmp, ./nodes.dmp already exist. Delete to redownload
Downloading accession2taxid with getAccession2taxid()
This can be a big (several gigabytes) download. Please be patient and use a fast connection.
./nucl_gb.accession2taxid.gz, ./nucl_wgs.accession2taxid.gz already exist. Delete to redownload
Preprocessing names with read.names.sql()
Preprocessing nodes with read.nodes.sql()
Preprocessing accession2taxid with read.accession2taxid()
Reading ./nucl_gb.accession2taxid.gz.
Error in readBin(inn, what = raw(0L), size = 1L, n = BFR.SIZE) :
  error reading from the connection
In addition: Warning message:
In readBin(inn, what = raw(0L), size = 1L, n = BFR.SIZE) :
  invalid or incomplete compressed data
```
I tried many things:
- Deleting all files and redownloading them -> Same error
- Downloading only the nucl_gb file -> Same error
- Downloading manually the nucl_gb file and running read.accession2taxid separately -> Same error
- Rewriting the files with `overwrite = TRUE` in the `read.names.sql` & `read.nodes.sql` functions
- Changing the SQL database name -> Same error, same file (both 381,184 KB) with another name
- Changing the temporary directory (method in the last answer of Issue 3, since I saw that taxizedb was using the same one, following your reply there) -> Successfully changed the temporary folder, but same error
I saw that @MajaCN had exactly the same issue and managed to deal with it but the solution is not provided ("we found a work-around and have the files now!").
Last things that might be useful for resolving this problem:
- My computer has 91.1 GB of disk space left.
- I have Windows 10.
- I get the following error when running `accessionToTaxa("Z17430.1", "accession_2_Taxa.sql")`:

```
Error: no such table: accessionTaxa
Warning message:
In file.remove(tmp) :
  impossible to delete the file 'C:/Users/lelio/DOCUME~1/STAGEM~1/LOCAL_~1\RtmpYdSW74\file3d4c119e4bf5', due to 'Permission denied'
```
Thanks in advance for helping me!
Best regards,
Eliot RUIZ