Home
jandot edited this page Sep 14, 2010 · 4 revisions
How does the vcf2tsv function work? In contrast to the Perl implementation, this vcf2tsv preserves all INFO and FORMAT tags.
Basically, it first scans the input file to build a unique list of all the INFO tags and all the FORMAT tags that are present in it (let's call these all-info-tags and all-format-tags). The sorted INFO tags become part of the header. The FORMAT tags are paired with each sample name, and those sample/tag combinations become part of the header as well. Then, to actually process the file, it goes through each line and:
- creates the part of the output line that concerns the INFO field:
  - creates a map of the INFO field (e.g. "DP=17;GN=BRCA2;CN=INTRONIC" becomes {"DP" "17", "GN" "BRCA2", "CN" "INTRONIC"})
  - goes through all-info-tags and gets the value for each tag from this map, or an empty string if that tag is not present in the INFO string
- creates the part of the output line that concerns the FORMAT and sample fields. For each individual:
  - creates a map by interleaving the split FORMAT field with the sample data
  - goes through all-format-tags and gets the value for each tag from this map, or an empty string if that tag is not present in the sample data
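
The two passes described above can be sketched in Python. This is only an illustration of the idea, not the actual vcf2tsv code. The names `parse_info` and `vcf2tsv`, the `sample-tag` naming of the header columns, the sorting of the FORMAT tags, and the handling of value-less INFO flags are all assumptions made for the example. It also keeps all lines in memory instead of reading the file twice.

```python
# Hypothetical sketch of the two-pass approach; not the original implementation.
FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER"]


def parse_info(info):
    """Turn 'DP=17;GN=BRCA2' into {'DP': '17', 'GN': 'BRCA2'}."""
    pairs = {}
    if info in ("", "."):
        return pairs
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Assumption: a flag without '=' is recorded as the string "true".
        pairs[key] = value if sep else "true"
    return pairs


def vcf2tsv(lines):
    """Convert an iterable of VCF lines into a list of TSV lines."""
    samples, data = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#"):
            samples = fields[9:]  # sample names follow the FORMAT column
        else:
            data.append(fields)

    # Pass 1: collect every INFO and FORMAT tag that occurs anywhere.
    all_info_tags = sorted({t for f in data for t in parse_info(f[7])})
    all_format_tags = sorted(
        {t for f in data if len(f) > 8 for t in f[8].split(":")}
    )

    # Header: fixed columns, INFO tags, then one column per sample/tag pair.
    header = FIXED + all_info_tags
    header += [f"{s}-{t}" for s in samples for t in all_format_tags]
    rows = [header]

    # Pass 2: build each output row, using "" for tags missing on that line.
    for f in data:
        info = parse_info(f[7])
        row = f[:7] + [info.get(t, "") for t in all_info_tags]
        fmt = f[8].split(":") if len(f) > 8 else []
        for sample in f[9:]:
            values = dict(zip(fmt, sample.split(":")))
            row += [values.get(t, "") for t in all_format_tags]
        rows.append(row)

    return ["\t".join(r) for r in rows]
```

Called as `vcf2tsv(open("input.vcf"))` (the file name is a placeholder), this returns the header line followed by one tab-separated line per variant, and every line has the same number of columns regardless of which tags it carries.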