Section 1.2: There is no description of the permitted values for the fileDate, source or phasing fields.
Section 1.2.7 (Contig field format) states
As with chromosomal sequences [...] The format is identical to that of a reference sequence, but with an additional URL tag [...]
yet there is no description of handling chromosomes and the format of a reference sequence header record is not described.
Section 1.4.1 (Fixed fields) states
CHROM - chromosome: An identifier from the reference genome or an angle-bracketed ID string pointing to a contig in the assembly file.
The use of _or_ implies exclusivity, i.e. if it is an identifier from the reference genome, it is not an angle-bracketed ID string. When I create a VCF using reference genome identifiers, without inserting the chromosomes as angle-bracketed ID strings in the header, both @cyenyxe's validator and bcftools give warnings about missing header IDs.
> When I create a VCF using reference genome identifiers, without inserting the chromosomes as angle-bracketed ID strings in the header, both @cyenyxe's validator and bcftools give warnings related to missing header IDs.
This is expected behavior. The angle-bracketed ID string refers to the CHROM field itself, not to the syntax for defining reference sequences in the header (which also happens to use angle brackets). VCF 4.2 Section 5.4.2 has an example where angle-bracketed ID strings are used in the CHROM field:
#CHROM POS ID REF ALT QUAL FILTER INFO
13 123456 bnd_U C C[<ctg1>:229[ 6 PASS SVTYPE=BND
13 123457 bnd_V A ]<ctg1>:45]A 6 PASS SVTYPE=BND
<ctg1> 1 bnd_X A ]<ctg1>:329]A 6 PASS SVTYPE=BND
<ctg1> 329 bnd_Y T T[<ctg1>:1[ 6 PASS SVTYPE=BND
In this example, 13 requires a header line since it is a reference contig, but ctg1 does not since it is not in the reference genome.
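To make that rule concrete, here is a rough Python sketch of the kind of check that might produce those warnings. It is not how bcftools or the validator actually implement it. The function name `missing_contig_headers`, the sample records, and the contig length are all made up for illustration. It assumes header lines precede data lines and that CHROM is the first whitespace-delimited column.

```python
import re

def missing_contig_headers(vcf_text):
    """Return CHROM values with no ##contig header line, in first-seen order.

    Hypothetical helper for illustration only; not taken from any validator.
    """
    declared = set()
    missing = []
    for line in vcf_text.splitlines():
        if line.startswith("##contig="):
            # Pull the ID attribute out of e.g. ##contig=<ID=13,length=...>
            match = re.search(r"[<,]ID=([^,>]+)", line)
            if match:
                declared.add(match.group(1))
        elif line and not line.startswith("#"):
            chrom = line.split()[0]
            # Angle-bracketed CHROM values point at the assembly file,
            # so they are not expected to have a ##contig header line.
            if chrom.startswith("<") and chrom.endswith(">"):
                continue
            if chrom not in declared and chrom not in missing:
                missing.append(chrom)
    return missing

# Made-up input: 13 is declared, <ctg1> is exempt, 14 is undeclared.
example = """##fileformat=VCFv4.2
##contig=<ID=13,length=1000000>
#CHROM POS ID REF ALT QUAL FILTER INFO
13 123456 bnd_U C C[<ctg1>:229[ 6 PASS SVTYPE=BND
<ctg1> 1 bnd_X A ]<ctg1>:329]A 6 PASS SVTYPE=BND
14 500 . A T 6 PASS .
"""
print(missing_contig_headers(example))
```

With this input only 14 is reported: 13 has a header line, and `<ctg1>` is skipped because it is angle-bracketed.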
Thanks for the clarification. I'll submit some suggestions as a pull request in due course.
Is there a description of the format of the assembly file? I can infer that it might be FASTA from looking at the file names in the examples. Are there any restrictions on what can go in it? E.g. single versus multiple sequences.