diff --git a/docs/content/gingr/types.rst b/docs/content/gingr/types.rst
index 7005c36..87685d7 100644
--- a/docs/content/gingr/types.rst
+++ b/docs/content/gingr/types.rst
@@ -13,5 +13,5 @@ exported to/from Gingr (or the `harvesttools` command line utility).
 * XMFA files can be accompanied by Fasta reference files to provide sequence between LCBs and to allow Genbank annotations (which must have matching GI numbers) to be loaded later. Genbank files that contain sequence can also be used as references.
 * Multi-fasta alignments will use the first sequence as the reference. Genbank annotations can be loaded later if the GIs match.
 * Variants
- * VCF files must be imported with a Fasta reference. Since VCF does not store complete alignment information, any insertions larger than one base will be replaced by an LCB boundary when importing. If a genotype is diploid or polyploid, only the first haplotype is used in the multi-alignment (the others are ignored). Symbolic alleles and breakends are currently unsupported and will also be ignored. Note that any information ignored when importing (which also includes besides Variants can be output as a simple VCF file that contains haploid genotypes, qualities, and filters, but ignores indels are currently ignored when writing.
+ * VCF files must be imported with a Fasta reference. The only fields imported are sequence identifier (CHROM), position (POS), reference allele (REF), alternate alleles (ALT), quality (QUAL), filters (FILTER, including ##FILTER specifications in the header), and genotype (GT); all other information is ignored. Additionally, since VCF does not store complete alignment information, any insertions larger than one base will be replaced by an LCB boundary when importing. If a genotype is diploid or polyploid, only the first haplotype is used in the multi-alignment (the others are ignored). Symbolic alleles and breakends are currently unsupported and will also be ignored. When writing to VCF, only the imported fields will be populated. Indel output is also currently unimplemented, so indels will be skipped when writing.
 * The multi-fasta SNP output is the same format as multi-fasta alignments, but only contains columns with unfiltered ("PASS") variants (like a Mauve SNP file). This is useful for generating phylogenetic trees, but does not contain positional information or rearrangements.
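
The import rules in the added text can be illustrated with a minimal Python sketch. This is not the harvesttools or Gingr implementation; `parse_vcf_record` is a hypothetical helper that assumes a tab-separated VCF data line and keeps only the documented fields (CHROM, POS, REF, ALT, QUAL, FILTER, GT), reducing each genotype to its first haplotype. It does not model the handling of symbolic alleles, breakends, insertions, or ##FILTER header lines.

```python
import re


def parse_vcf_record(line):
    """Hypothetical helper: keep only the fields the docs list as imported."""
    cols = line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO [FORMAT samples...]
    chrom, pos, _id, ref, alt, qual, filt = cols[:7]

    record = {
        "CHROM": chrom,
        "POS": int(pos),
        "REF": ref,
        "ALT": [] if alt == "." else alt.split(","),
        "QUAL": None if qual == "." else float(qual),
        "FILTER": [] if filt == "." else filt.split(";"),
        "GT": [],
    }

    # ID, INFO and every FORMAT key other than GT are dropped.
    if len(cols) > 9:
        format_keys = cols[8].split(":")
        if "GT" in format_keys:
            gt_index = format_keys.index("GT")
            for sample in cols[9:]:
                gt = sample.split(":")[gt_index]
                # Diploid/polyploid calls look like "0/1" or "2|1";
                # only the first haplotype is kept.
                first = re.split(r"[/|]", gt)[0]
                record["GT"].append(None if first == "." else int(first))

    return record
```

For a line such as `chr1<TAB>100<TAB>rs1<TAB>A<TAB>G,T<TAB>50<TAB>PASS<TAB>DP=10<TAB>GT:DP<TAB>0/1:5<TAB>2|1:7`, the sketch keeps allele indices `0` and `2` for the two samples and discards `rs1`, `DP=10`, and the per-sample `DP` values.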