Add genomic range and ORF 'correction' option #22

Merged Oct 22, 2021 (3 commits)

@zdk123
Contributor

zdk123 commented Oct 13, 2021

This addresses #19 by adding an option to restrict an accession download to a genomic range (e.g. the from and to parameters in the request query string).

For large records, this saves a substantial amount of time and bandwidth compared to downloading the whole thing and then subsetting.

Example usage:

ncbi-acc-download NC_007194 --range 1001:9000
ncbi-acc-download NC_007194 -g 1001:9000

Combining multiple accessions with a genomic range triggers an error:

ncbi-acc-download NC_007194 NC_007195 --range 1001:9000
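
As a rough illustration of the behaviour described above (not the code in this PR), here is a minimal Python sketch of how a START:END range could be parsed, rejected for multiple accessions, and turned into from and to query parameters. The function names `parse_range` and `build_queries` and the `id` parameter name are assumptions made for this example.

```python
from urllib.parse import urlencode


def parse_range(text):
    """Parse a 'START:END' string into a (start, end) tuple of 1-based ints."""
    start_text, sep, end_text = text.partition(":")
    if not sep:
        raise ValueError(f"expected START:END, got {text!r}")
    # int() raises ValueError itself if either side is not a number.
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid range {start}:{end}")
    return start, end


def build_queries(accessions, range_text=None):
    """Return one query string per accession (hypothetical helper).

    A range only makes sense for a single record, so combining a range
    with several accessions is treated as an error.
    """
    if range_text is not None and len(accessions) != 1:
        raise ValueError("a genomic range requires exactly one accession")

    queries = []
    for accession in accessions:
        params = {"id": accession}  # 'id' is an assumed parameter name
        if range_text is not None:
            start, end = parse_range(range_text)
            params["from"] = start
            params["to"] = end
        queries.append(urlencode(params))
    return queries
```

Failing before any request is built keeps a mistyped invocation from downloading several full records by accident.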

Of course, if you pick arbitrary coordinates like this, a range boundary will sometimes fall in the middle of an ORF. NCBI won't complain, but certain downstream applications I've run into don't like this. I've therefore also added a correct option to the --extended-validation flag, which filters these ORFs out. There's also a new unit test for the correction validator (note that correct is not run when all is specified).

ncbi-acc-download NC_007194 -g 1001:9000 -e correct
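
To make the idea of filtering cut-off ORFs concrete, here is a small self-contained sketch. It is not the validator added in this PR. It assumes features are represented by a hypothetical `Feature` class carrying a GenBank location string, and relies on the GenBank convention that `<` and `>` mark a location extending beyond the sequence shown.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    kind: str      # GenBank feature key, e.g. "CDS" or "gene"
    location: str  # GenBank location string, e.g. "400..900" or "<1..350"


def is_partial(feature):
    """True if the location is marked as running past either end."""
    return "<" in feature.location or ">" in feature.location


def drop_partial_cds(features):
    """Keep every feature except CDS entries cut off by the range."""
    return [f for f in features if not (f.kind == "CDS" and is_partial(f))]
```

Whether non-CDS features such as the enclosing gene should be dropped as well is a design choice this sketch leaves open.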

zdk123 referenced this pull request in mibig-secmet/mibig-json Oct 20, 2021
Signed-off-by: Kai Blin <kblin@biosustain.dtu.dk>
@kblin
Owner

kblin commented Oct 20, 2021

Ah, crud, I should have pinned biopython. 1.79 deprecating UnknownSeq strikes again. I'll fix this.

@kblin
Owner

kblin commented Oct 20, 2021

Weird, I can't seem to rebase this PR onto the current master branch. Could you update it to be based on current master? That'll fix the biopython-related test failure.

@zdk123
Contributor Author

zdk123 commented Oct 20, 2021

I'll do that. I originally forked from DarianHole/ncbi-acc-download, so that might explain it.

@zdk123
Contributor Author

zdk123 commented Oct 21, 2021

@kblin rebased

@kblin
Owner

kblin commented Oct 22, 2021

Awesome, thanks. Apart from the wrong version number bump, things look good to me. Thanks for the contribution!
I'll fix the version number and cut a new release.

kblin merged commit 2a93e24 into kblin:master on Oct 22, 2021
kblin mentioned this pull request on Oct 22, 2021
@zdk123
Contributor Author

zdk123 commented Oct 22, 2021

thanks - I can contribute the usage code over at secmet/mibig-json as well
