forked from jdoughertyii/PyVCF
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use zero-based, half-open coordinates for Reader.fetch #156
Closed
gotgenes
wants to merge
5
commits into
jamescasbon:master
from
gotgenes:feature-zbho_tabix_coordinates
Closed
Use zero-based, half-open coordinates for Reader.fetch #156
gotgenes
wants to merge
5
commits into
jamescasbon:master
from
gotgenes:feature-zbho_tabix_coordinates
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Decorates tests that are potentially skipped, as well as broken tests that are always skipped, as being skipped, rather than indicating falsely that these tests have passed (the result of premature return statements prior to any assertions in the tests). This introduces another dependency for Python 2.6, the unittest2 module, which back-ported this functionality from Python 2.7 and Python 3.
These changes make the behavior of Reader.fetch consistent with with pysam.Tabixfile, which uses the zero-based, half-open coordinate system for Tabixfile.fetch. See http://www.cgat.org/~andreas/documentation/pysam/api.html#pysam.Tabixfile.fetch Previously, PyVCF's Reader.fetch declared no particular coordinate system. Since the method quietly deducted 1 from the start position, apparently it assumed users were going to input a one-based coordinate there. However, users familiar with pysam's Tabixfile for other formats get an unexpected surprise when variants ahead of the start coordinate start getting returned by Reader.fetch. As _Record.start and _Record.end are in the ZBHO coordinate system, it adds to the consistency that fetch take start and end coordinates in ZBHO, so the same _Record instance could be retrieved using its .CHROM, .start, and .end coordinates. This change also removes the prior behavior of fetch of returning a single _Record instance if given only chrom and start coordinates, by implicitly doing a Tabixfile.fetch(chrom, start-1, start). The new behavior when omitting the end parameter is to return an iterator of _Record instances starting at start and continuing through the end of the chromosome chrom. Again, this is the behavior consistent with pysam.Tabixfile.fetch, and is what users ought to expect. This change also allows the user to omit both the start and end positions. In this case, an iterable of _Record instances for all records for the particular chromosome chrom will be returned, which again, is consistent with Tabixfile.fetch. This behavior also resolves Issue jamescasbon#123 "Cannot fetch() whole chromosome".
martijnvermaat
added a commit
that referenced
this pull request
May 14, 2014
Use zero-based, half-open coordinates for Reader.fetch
Agreed. Many thanks, especially for the detailed commit message and docstring! |
(Merged after rebasing.) |
gotgenes
added a commit
to gotgenes/PyVCF
that referenced
this pull request
May 14, 2014
This corrects several lines that relate to the changes to fetch brought in by Pull Request jamescasbon#156.
Thanks for accepting this PR, @martijnvermaat! I have submitted PR #157 because I missed the documentation tucked away inside |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This pull request currently sits atop PR #155. It includes one commit beyond #155, 221d7a4, that provides the changes for this pull request. Please accept #155 prior to accepting this one.
I have filed this pull request because Reader.fetch currently uses one-based indexing, which is inconsistent with pysam's Tabixfile.fetch (on which Reader.fetch relies). This caused a critical issue in one of my projects where variants were getting included that overlapped what I thought was my zero-based start index. Needless to say, I was less than pleased at the implicit and inconsistent behavior of Reader.fetch; this PR corrects this.
I quote the contents of the commit message containing the changes below