vcffilter allows you to filter VCF files to prioritize variants in sequencing studies.
vcffilter requires Python 2.7.
The --info_filter flag allows you to filter based on the contents of the INFO field. To avoid problems with shells, the symbols >, >=, <, <=, ==, and != are replaced with gt, gte, lt, lte, eq, and neq, respectively. INFO flags can be filtered with --info_filter <FLAG> is set and --info_filter <FLAG> not set. You can also filter strings with contains and ncontains.
Examples:
- --info_filter CG gt 2: Only return records with GERP scores greater than 2
- --info_filter DB not set: Return variants not in dbSNP
- --info_filter PH contains damaging: Only return records predicted damaging by PolyPhen
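To make the operator keywords concrete, here is a minimal Python sketch of how such filter expressions could be evaluated against a parsed INFO field. It is not vcffilter's actual code: the names `OPS`, `parse_info`, and `passes_info_filter` are hypothetical, and the handling of missing keys (they fail every comparison) is an assumption.

```python
import operator

# Hypothetical mapping from the shell-safe keywords to Python comparisons.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
    "neq": operator.ne,
}


def parse_info(info_field):
    """Parse an INFO string such as 'CG=3.1;DB;PH=probably_damaging'.

    Keys with a value map to that value as a string; bare flags map to True.
    """
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def passes_info_filter(info, key, op, value):
    """Evaluate one '<key> <op> <value>' expression against a parsed INFO dict."""
    if (op, value) == ("is", "set"):
        return key in info
    if (op, value) == ("not", "set"):
        return key not in info
    if key not in info:
        # Assumption: a record lacking the key fails every other comparison.
        return False
    field = info[key]
    if op == "contains":
        return value in str(field)
    if op == "ncontains":
        return value not in str(field)
    try:
        # Compare numerically when both sides parse as numbers...
        return OPS[op](float(field), float(value))
    except ValueError:
        # ...otherwise fall back to a string comparison.
        return OPS[op](str(field), value)
```

With this sketch, `passes_info_filter(parse_info("CG=3.1;DB"), "CG", "gt", "2")` evaluates the first example above against a record whose CG value is 3.1.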
You can filter for variants matching a Mendelian genetic model with the --model dom and --model rec flags. --model dom only returns genotypes that match a dominant model, namely those where each individual has a minor allele. --model rec only returns genotypes that match a recessive model, that is, those that match the following criteria: 1) everyone has two minor alleles and 2) everyone is homozygous.
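The two model checks can be sketched in Python as follows. This is an illustration only, not vcffilter's implementation: the function names are hypothetical, genotypes are assumed to be VCF GT strings such as `0/1` or `1|1`, any non-reference allele is treated as the minor allele, and missing genotypes are assumed to fail both models.

```python
import re


def alleles(gt):
    """Split a GT string such as '0/1' or '1|1' into its allele codes."""
    return re.split(r"[/|]", gt)


def matches_dominant(genotypes):
    """True if every individual carries at least one non-reference allele.

    Assumption: '0' is the major allele and '.' (missing) does not count.
    """
    return all(
        any(a not in ("0", ".") for a in alleles(gt))
        for gt in genotypes
    )


def matches_recessive(genotypes):
    """True if every individual is homozygous for a non-reference allele."""
    for gt in genotypes:
        a = alleles(gt)
        # Criterion 1: two minor alleles. Criterion 2: homozygous.
        if len(a) != 2 or a[0] != a[1] or a[0] in ("0", "."):
            return False
    return True
```

Note that a genotype such as `1/2` carries two non-reference alleles but is not homozygous, so it fails the recessive check in this sketch.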
- --region 5 10000 20000: keeps only variants on chromosome 5, from 10000 to 20000 bp
- --geno .5: requires that at least half of individuals have a genotype called
- --no-qc: disables the filter (on by default) requiring that the FILTER column equals 'PASS'
- --qual 25: only returns variants with QUAL field > 25
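As a rough illustration of what the region, call-rate, and QC options test, the following Python sketch spells out each check. The function names are hypothetical, and two details are assumptions rather than documented behavior: region bounds are treated as inclusive, and a genotype counts as uncalled if any of its alleles is `.`.

```python
def in_region(chrom, pos, region_chrom, start, end):
    """True if the variant lies on region_chrom between start and end.

    Assumption: both ends of the interval are inclusive.
    """
    return chrom == region_chrom and start <= pos <= end


def call_rate(genotypes):
    """Fraction of individuals with a called genotype.

    Assumption: a GT containing '.' (e.g. './.') counts as uncalled.
    """
    if not genotypes:
        return 0.0
    called = sum(1 for gt in genotypes if "." not in gt)
    return called / float(len(genotypes))


def passes_qc(filter_column, no_qc=False):
    """True if the FILTER column is 'PASS', or if the check is disabled."""
    return no_qc or filter_column == "PASS"
```

For example, four individuals with genotypes `0/1`, `./.`, `1/1`, and `./.` give a call rate of 0.5, which would just satisfy a threshold of .5.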
Filters are applied in the following order:
- Region
- Minimum quality
- FILTER = 'PASS'
- Minimum call rate
- Info filters, in the order they were specified
- Mendelian Model filters
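The ordering above can be pictured as a chain of predicates that stops at the first failure. The sketch below is one way to express that in Python, not a description of vcffilter's internals. `build_filters` and `apply_filters` are hypothetical names, and records are assumed to be dictionaries with `CHROM`, `POS`, `QUAL`, `FILTER`, and a precomputed `CALL_RATE` key.

```python
def build_filters(region=None, min_qual=None, require_pass=True,
                  min_call_rate=None, info_filters=(), model_filter=None):
    """Assemble predicates in the documented order."""
    filters = []
    if region is not None:
        chrom, start, end = region
        filters.append(
            lambda r: r["CHROM"] == chrom and start <= r["POS"] <= end)
    if min_qual is not None:
        filters.append(lambda r: r["QUAL"] > min_qual)
    if require_pass:
        filters.append(lambda r: r["FILTER"] == "PASS")
    if min_call_rate is not None:
        filters.append(lambda r: r["CALL_RATE"] >= min_call_rate)
    # Info filters keep the order in which they were given.
    filters.extend(info_filters)
    if model_filter is not None:
        filters.append(model_filter)
    return filters


def apply_filters(records, filters):
    """Yield records that pass every filter.

    all() short-circuits, so later filters are skipped once one fails.
    """
    for record in records:
        if all(f(record) for f in filters):
            yield record
```

Putting inexpensive checks such as region and quality first means that, in a design like this, the costlier per-genotype model checks only run on records that survive the earlier filters.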