Not downloading suppressed or replaced Refseq assembly accessions #138

Open
BobFukkink opened this issue Oct 26, 2020 · 5 comments
Comments

@BobFukkink

Dear,

I am using a list of RefSeq assembly accessions, which I compiled a few months ago, to download the corresponding FASTA files. When I try to download the same list again, some of the FASTA files are not downloaded.

It seems that the missing downloads correspond to RefSeq assembly accessions that have been "replaced" (e.g. https://www.ncbi.nlm.nih.gov/assembly/GCF_000699585.1/) or "suppressed" (e.g. https://www.ncbi.nlm.nih.gov/assembly/GCF_000155855.1/).

For reproducibility, is there any way of downloading these as well?

Best regards,
Bob

@jayrbolton

👍 Also having this issue.

@jayrbolton

Looks like the reason for this is that the FTP download URL is looked up in the "*summary.txt" file (example), which only contains the latest accession versions and doesn't list old ones.
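To make the lookup described above concrete, here is a minimal sketch of how a summary file could be turned into an accession-to-URL map, and why an accession that is no longer listed cannot be resolved. It is not the actual ncbi-genome-download code. The sample text is a made-up two-column excerpt, `GCF_000000001.1` and the `example.org` path are placeholders, and the function names are hypothetical. The column names `assembly_accession` and `ftp_path` are assumed to match the header of the real file.

```python
# Made-up excerpt for illustration; a real summary file has many more columns.
SAMPLE_SUMMARY = (
    "# assembly_accession\tftp_path\n"
    "GCF_000000001.1\thttps://example.org/genomes/GCF_000000001.1_demo\n"
)


def ftp_paths_by_accession(summary_text):
    """Map every accession listed in a summary file to its ftp_path."""
    header = None
    paths = {}
    for line in summary_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            # Comment lines: the one naming the columns is used as the header.
            names = [name.strip() for name in fields]
            names[0] = names[0].lstrip("# ")
            if "assembly_accession" in names:
                header = names
            continue
        if header is None:
            raise ValueError("no header line found before the first data row")
        record = dict(zip(header, fields))
        paths[record["assembly_accession"]] = record["ftp_path"]
    return paths


def find_missing(requested, summary_text):
    """Return the requested accessions that the summary file does not list."""
    known = ftp_paths_by_accession(summary_text)
    return [acc for acc in requested if acc not in known]


# The replaced accession from this issue is absent from the sample,
# so there is no URL to download it from.
missing = find_missing(["GCF_000000001.1", "GCF_000699585.1"], SAMPLE_SUMMARY)
print(missing)
```

Any accession returned by `find_missing` has no row in the summary, which matches the behaviour reported in this thread.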

@nick-youngblut

I'm running into this issue as well. I'm guessing that no one has found a good solution. Would it be possible to automatically switch to the accession that has replaced the old (replaced/suppressed) accession?

@kblin
Owner

kblin commented Aug 5, 2021

Keep in mind that ncbi-genome-download is just a fancy frontend for the NCBI FTP server, using the assembly summary files to get all the info. If the NCBI deletes a line from that file, for all ncbi-genome-download cares, that entry is gone.

@nick-youngblut

nick-youngblut commented Aug 5, 2021

> Keep in mind that ncbi-genome-download is just a fancy frontend for the NCBI FTP server, using the assembly summary files to get all the info. If the NCBI deletes a line from that file, for all ncbi-genome-download cares, that entry is gone.

It appears that assembly_summary_historical.txt could be used. The 18th column lists the latest assembly for those assemblies that have been suppressed or replaced.
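A rough sketch of that idea follows: look up an old accession in the historical summary and read the column reported above. The column position (18th, index 17 when counting from zero) is taken from this comment and has not been checked against the file's header, so it is a parameter rather than a constant. The sample row is made up, `resolve_from_historical` is a hypothetical helper, and treating `na` as "no value" is an assumption.

```python
def resolve_from_historical(accession, historical_text, target_col=17, accession_col=0):
    """Return the value of `target_col` for `accession`, or None if unavailable.

    target_col=17 is the 18th column, as reported in this thread (unverified).
    """
    for line in historical_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if fields[accession_col] != accession:
            continue
        if len(fields) <= target_col:
            return None  # row is shorter than expected
        value = fields[target_col].strip()
        # Assumption: "na" marks an empty field.
        return value if value and value != "na" else None
    return None  # accession not listed at all


# Made-up 18-column row: a placeholder old accession in column 1,
# a placeholder newer accession in column 18, "na" everywhere else.
SAMPLE_HISTORICAL = "\t".join(
    ["GCF_000000002.1"] + ["na"] * 16 + ["GCF_000000002.2"]
)

print(resolve_from_historical("GCF_000000002.1", SAMPLE_HISTORICAL))
print(resolve_from_historical("GCF_999999999.1", SAMPLE_HISTORICAL))
```

Before relying on this, it would be worth confirming from the header line of the real file what the 18th column actually contains.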
