nucl_est.accession2taxid.gz unavailable #12

Closed
swuyts opened this issue Apr 16, 2019 · 4 comments

3 participants
@swuyts

commented Apr 16, 2019

Hi there,

I am planning to test out this package as it seems super useful. However, running prepareDatabase throws the following error:

trying URL 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid//nucl_est.accession2taxid.gz'
Error in (function (url, destfile, method, quiet = FALSE, mode = "w",  : 
  cannot open URL 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid//nucl_est.accession2taxid.gz'
Calls: prepareDatabase -> do.call -> <Anonymous> -> mapply -> <Anonymous>
In addition: Warning message:
In (function (url, destfile, method, quiet = FALSE, mode = "w",  :
  cannot open URL 'ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid//nucl_est.accession2taxid.gz': FTP status was '550 Requested action not taken; file unavailable'
Execution halted

Any insights on how to solve this?

Many thanks!
Sander

@manuelsmendoza


commented Apr 21, 2019

Hi @swuyts! I think it's solved now. Please try again and let us know.

If it fails again, please try the following (for Linux and macOS):

DBDIR="/path/directory/to_build/taxonomy_database"
mkdir -p "$DBDIR"
cd "$DBDIR"
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_wgs.accession2taxid.gz
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
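Since these files are large, it can be worth confirming the downloads are intact before building the database. A minimal sketch, assuming you also fetched the `.md5` file that NCBI publishes next to each archive (e.g. `nucl_gb.accession2taxid.gz.md5`) into the same directory, and that GNU coreutils `md5sum` is available. `verify_downloads` is a hypothetical helper, not part of taxonomizr:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of taxonomizr).
# Assumption: each <type>.accession2taxid.gz has a matching .md5 file in the
# same directory, in the "<hash>  <filename>" format GNU md5sum -c reads.
verify_downloads() {
  local dir=$1
  shift
  local status=0 type f
  for type in "$@"; do
    f="$type.accession2taxid.gz"
    if [ ! -f "$dir/$f" ] || [ ! -f "$dir/$f.md5" ]; then
      echo "missing: $dir/$f or its .md5 file" >&2
      status=1
      continue
    fi
    # Run md5sum inside $dir so the relative filename in the .md5 file resolves
    if ! (cd "$dir" && md5sum -c --quiet "$f.md5"); then
      echo "checksum mismatch: $f" >&2
      status=1
    fi
  done
  # Non-zero if any file was missing or failed its checksum
  return "$status"
}
```

For example, `verify_downloads "$DBDIR" nucl_gb nucl_wgs prot` returns success only if all three archives are present and match their checksums.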

Now go to R and build the database:

setwd("/path/directory/to_build/taxonomy_database")
library(taxonomizr)
# fetch names.dmp and nodes.dmp from NCBI
getNamesAndNodes()
read.names.sql('names.dmp','accessionTaxa.sql')
read.nodes.sql('nodes.dmp','accessionTaxa.sql')
# load every downloaded *.accession2taxid.gz file into the same database
read.accession2taxid(list.files('.','accession2taxid\\.gz$'),'accessionTaxa.sql')

Does it work for you?

Cheers,
~MM

@sherrillmix

Owner

commented Apr 22, 2019

Sorry for the late reply, swuyts. Somehow I don't seem to get emails when issues are opened.

In any case, the problem comes from NCBI merging nucl_est and nucl_gss into nucl_gb. After the merge those files no longer exist, but with its default values prepareDatabase still tries to download them and errors out.

You could fix this with something like:

prepareDatabase('accessionTaxa.sql',types = c("nucl_gb", "nucl_wgs"))

but the GitHub version of the package has been updated with corrected defaults, so it can be used as normal:

prepareDatabase('accessionTaxa.sql')

I also pushed the changes to CRAN so if everything goes correctly you should shortly be able to update your package with an install.packages('taxonomizr') to fix the issue.
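If NCBI retires more files in the future, one way to see which entries in a `types` vector no longer exist on the server is to compare them against a directory listing. A minimal sketch: `missing_types` is a hypothetical helper (not part of taxonomizr) that reads a listing from stdin, one file name per line, and prints each requested type whose `<type>.accession2taxid.gz` is absent:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of taxonomizr).
# Reads a directory listing on stdin (one file name per line) and prints each
# requested type whose <type>.accession2taxid.gz file is not in that listing.
missing_types() {
  local listing type
  # Strip carriage returns in case the listing uses CRLF line endings (as FTP may)
  listing=$(tr -d '\r')
  for type in "$@"; do
    # -x: match the whole line; -F: treat the name as a fixed string, not a regex
    if ! printf '%s\n' "$listing" | grep -qxF "$type.accession2taxid.gz"; then
      echo "$type"
    fi
  done
}
```

The listing could come from, for example, `curl -s --list-only ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/ | missing_types nucl_est nucl_gss nucl_gb nucl_wgs`, which after the merge would be expected to print `nucl_est` and `nucl_gss`.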

@swuyts

Author

commented Apr 23, 2019

Hi @manuelsmendoza and @sherrillmix

I just updated the package through CRAN and the command runs as expected now! Many thanks!

Sander

@sherrillmix

Owner

commented Apr 23, 2019

Great. Glad it worked out smoothly. Thanks for the update.

Closing.
