…ls 1.2 when inputs have no ##contig information
* Remember the ploidy of uncalled genotypes so that the sample genotypes written by PyVCF.Writer match the sample genotypes read by PyVCF.Reader.
* For uncalled _Calls, gt_nums and gt_bases are None; gt_alleles is a list of "None" with a length of _Call.ploidity.
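To show why remembering ploidy matters for round-tripping, here is a minimal sketch. The helpers `parse_gt` and `format_gt` are hypothetical and are not PyVCF's API. An uncalled genotype keeps one `None` per allele slot, so a diploid `./.` is written back as `./.` and not collapsed to a haploid `.`.

```python
import re


def parse_gt(gt):
    """Hypothetical sketch: split a GT string into alleles, using None for uncalled ('.')."""
    phased = "|" in gt
    # Keep one entry per allele slot, so len(alleles) is the ploidy even when uncalled.
    alleles = [None if a == "." else a for a in re.split(r"[/|]", gt)]
    return alleles, phased


def format_gt(alleles, phased):
    """Hypothetical sketch: write alleles back as a GT string, '.' for uncalled slots."""
    sep = "|" if phased else "/"
    return sep.join("." if a is None else a for a in alleles)
```

A writer that only knew "this call is uncalled" would have to guess the ploidy. Keeping the list length avoids that guess.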
The VCF 4.0 and newer specifications say the ALT field is a comma-separated list that includes "base Strings made up of the bases A,C,G,T,N". The last of these bases, N, was not handled by `Record.is_snp`, which therefore erroneously reported `False` for records with "N" as the ALT.
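The check can be sketched as follows. This is a simplified stand-in that works on plain strings, not PyVCF's `Record.is_snp` itself. A record counts as a SNP when REF is a single base and every ALT is one of A, C, G, T or N, compared case-insensitively as the spec permits.

```python
# Single-base ALT alleles allowed by the spec, including N.
SNP_BASES = frozenset("ACGTN")


def is_snp(ref, alts):
    """Simplified sketch of an is_snp check on plain-string REF and ALT values."""
    if len(ref) != 1:
        return False
    for alt in alts:
        # Missing ALTs, multi-base alleles, and symbolic alleles such as <DEL> fail.
        if alt is None or alt.upper() not in SNP_BASES:
            return False
    return True
```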
Although it is not valid according to the spec, issue #164 shows a VCF file in which the FORMAT column contains just a dot character ("."). In that case there is no way to interpret the subsequent genotype columns, so this patch ignores them.
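A minimal sketch of this behaviour follows. The function `parse_samples` is hypothetical and does not reflect PyVCF's internal parser. With FORMAT equal to ".", no keys are available to label the sample fields, so the sample columns are dropped. Otherwise each column is split on ":" and mapped onto the FORMAT keys.

```python
def parse_samples(format_field, sample_columns):
    """Hypothetical sketch: map each sample column to a dict keyed by FORMAT fields."""
    if format_field == ".":
        # No FORMAT keys to interpret the sample columns with, so ignore them.
        return []
    keys = format_field.split(":")
    samples = []
    for column in sample_columns:
        values = column.split(":")
        # Trailing fields may be dropped in a sample column; fill them with None.
        samples.append({key: (values[i] if i < len(values) else None)
                        for i, key in enumerate(keys)})
    return samples
```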
_Record.start and _Record.end should represent the zero-based, half-open region of the reference sequence affected by all the events included in ALT. These coordinates let the user identify precisely which reference bases are altered by the events in the record. This commit also documents the coordinate schemes for _Record.POS, .start, and .end more thoroughly.
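The relationship between VCF's one-based POS and zero-based, half-open (ZBHO) coordinates can be sketched as below. Both helper names are hypothetical. `ref_span_zbho` covers the whole REF allele, which is a simplification. The region actually affected by the ALT events can be narrower: for a padded deletion with REF=TCG and ALT=T, only CG is removed. One advantage of ZBHO is that it maps directly onto Python slicing.

```python
def closed_one_based_to_zbho(first, last):
    """Convert an inclusive one-based interval [first, last] to zero-based, half-open [start, end)."""
    return first - 1, last


def ref_span_zbho(pos, ref):
    """Hypothetical helper: ZBHO span of a REF allele starting at one-based POS."""
    start = pos - 1
    return start, start + len(ref)


seq = "ACGTACGT"
start, end = ref_span_zbho(3, "GTA")
# seq[start:end] recovers the REF bases, and end - start is the REF length.
```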
These changes make the behavior of Reader.fetch consistent with pysam.Tabixfile, whose Tabixfile.fetch uses the zero-based, half-open (ZBHO) coordinate system (see http://www.cgat.org/~andreas/documentation/pysam/api.html#pysam.Tabixfile.fetch). Previously, PyVCF's Reader.fetch declared no particular coordinate system. Because the method quietly subtracted 1 from the start position, it apparently assumed users would pass a one-based coordinate. However, users familiar with pysam's Tabixfile for other formats are surprised when Reader.fetch returns variants lying before the start coordinate. Since _Record.start and _Record.end use the ZBHO coordinate system, it is more consistent for fetch to take its start and end coordinates in ZBHO as well, so that the same _Record instance can be retrieved using its .CHROM, .start, and .end coordinates. This change also removes fetch's prior behavior of returning a single _Record instance when given only chrom and start, which it did by implicitly calling Tabixfile.fetch(chrom, start-1, start). When the end parameter is omitted, fetch now returns an iterator of _Record instances beginning at start and continuing through the end of chromosome chrom. Again, this is consistent with pysam.Tabixfile.fetch and is what users should expect. This change also allows the user to omit both the start and end positions, in which case an iterable of _Record instances for all records on chromosome chrom is returned, again consistent with Tabixfile.fetch. This also resolves Issue #123 "Cannot fetch() whole chromosome".
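The query semantics described above can be sketched over an in-memory list. This `fetch` function and the `Record` tuple are hypothetical stand-ins: the real Reader.fetch reads through a tabix index. The sketch assumes tabix-style overlap between ZBHO intervals. Three cases are covered: chrom only returns the whole chromosome, chrom plus start runs to the end of the chromosome, and chrom, start and end returns records overlapping [start, end).

```python
from collections import namedtuple

# Hypothetical minimal record: chromosome plus ZBHO start/end.
Record = namedtuple("Record", ["CHROM", "start", "end"])


def fetch(records, chrom, start=None, end=None):
    """Yield records on chrom that overlap the ZBHO query interval [start, end)."""
    for rec in records:
        if rec.CHROM != chrom:
            continue
        if start is not None and rec.end <= start:
            continue  # record ends at or before the query start
        if end is not None and rec.start >= end:
            continue  # record begins at or after the query end
        yield rec


recs = [Record("20", 14369, 14370), Record("20", 17329, 17330), Record("21", 99, 100)]
```

Querying a record's own `.CHROM`, `.start` and `.end` returns that record. A query starting at a record's `end` excludes it. The old subtract-one behavior would have included it.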
Decorates tests that may be skipped, as well as broken tests that are always skipped, so that they are reported as skipped instead of falsely reported as passing (previously they passed because of premature return statements placed before any assertions). This introduces another dependency for Python 2.6, the unittest2 module, which backports this functionality from Python 2.7 and Python 3.
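The pattern can be sketched with the standard skip decorators. The test class and the use of pysam as the optional dependency are illustrative, not copied from PyVCF's test suite. A version check selects unittest2 on Python 2.6 and the built-in unittest elsewhere. `skipUnless` marks a potentially skipped test, and `skip` marks one that is always skipped.

```python
import sys

# unittest2 backports the skip decorators to Python 2.6; 2.7+ and 3.x ship them.
if sys.version_info < (2, 7):
    import unittest2 as unittest
else:
    import unittest

try:
    import pysam  # optional dependency, used here only as an example
except ImportError:
    pysam = None


class ExampleTests(unittest.TestCase):

    @unittest.skipUnless(pysam is not None, "requires pysam")
    def test_needs_optional_dependency(self):
        # Potentially skipped: runs only when pysam is importable.
        self.assertIsNotNone(pysam)

    @unittest.skip("known broken")
    def test_known_broken(self):
        # Always skipped, so this failure is never reached.
        self.fail("should not run")

    def test_always_runs(self):
        self.assertEqual(1 + 1, 2)


def run_example_tests():
    """Run ExampleTests without printing and return the unittest.TestResult."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Skipped tests appear in `result.skipped`. They are not silently counted as passes.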
…ether arg (eq function), fixed edge case in _AltRecord
- Added 'vcf_record_sort_key' to allow the user to specify an arbitrary chromosome ordering.
- Fixed issue #140 by making sure to emit all records from the current chromosome before moving on to the next one. This takes care of the problem in most typical cases (e.g. when all files have records for all contigs), but not in some edge cases; in those, the 'vcf_record_sort_key' arg can be used to fully solve the problem by explicitly defining the chromosome order.
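A key function suitable for 'vcf_record_sort_key' might look like the sketch below. `make_sort_key` and the two-field `Record` tuple are hypothetical. The sketch assumes the argument accepts any callable mapping a record to a sortable value. Records are ranked by an explicit chromosome list and then by POS. Plain string sorting would put "chr10" before "chr2".

```python
from collections import namedtuple

# Hypothetical minimal record with only the fields the key needs.
Record = namedtuple("Record", ["CHROM", "POS"])


def make_sort_key(chrom_order):
    """Return a key ordering records by an explicit chromosome order, then by POS."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}

    def key(record):
        # Raises KeyError for a chromosome missing from chrom_order.
        return (rank[record.CHROM], record.POS)

    return key


records = [Record("chr10", 5), Record("chr2", 100), Record("chr2", 7), Record("chrX", 1)]
key = make_sort_key(["chr2", "chr10", "chrX"])
```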