From 623777c3a5a1c36da415f99a8ed36794ab0d8b80 Mon Sep 17 00:00:00 2001
From: Brian Ondov
Date: Tue, 25 Nov 2014 16:04:24 -0500
Subject: [PATCH] VCF clarification

---
 docs/content/gingr/types.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/content/gingr/types.rst b/docs/content/gingr/types.rst
index 8be91c5..7005c36 100644
--- a/docs/content/gingr/types.rst
+++ b/docs/content/gingr/types.rst
@@ -13,5 +13,5 @@ exported to/from Gingr (or the `harvesttools` command line utility).
  * XMFA files can be accompanied by Fasta reference files to provide sequence between LCBs and to allow Genbank annotations (which must have matching GI numbers) to be loaded later. Genbank files that contain sequence can also be used as references.
  * Multi-fasta alignments will use the first sequence as the reference. Genbank annotations can be loaded later if the GIs match.
  * Variants
- * VCF files do not store complete alignment information, so any insertions larger than one base will be replaced by an LCB boundary. The first haplotype of each genotype is used to construct the multi-alignment, and subsequent haplotypes in diploid or polyploid genotypes will be ignored. While VCF can be output, it will be stripped down to essentially haploid genotypes, qualities, and filters.
- * The multi-fasta SNP output is the same format as multi-fasta alignments, but only contains columns with variants (like a Mauve SNP file). This is useful for generating phylogenetic trees, but does not contain positional information or rearrangements.
+ * VCF files must be imported with a Fasta reference. Since VCF does not store complete alignment information, any insertions larger than one base will be replaced by an LCB boundary when importing. If a genotype is diploid or polyploid, only the first haplotype is used in the multi-alignment (the others are ignored). Symbolic alleles and breakends are currently unsupported and will also be ignored. Variants can be output as a simple VCF file that contains haploid genotypes, qualities, and filters; any information ignored when importing is not written, and indels are currently ignored when writing.
+ * The multi-fasta SNP output is the same format as multi-fasta alignments, but only contains columns with unfiltered ("PASS") variants (like a Mauve SNP file). This is useful for generating phylogenetic trees, but does not contain positional information or rearrangements.