add FILTERS.md

commit 9fcf4f66e2519a0c6e83d8f13767f149f7b0c22d 1 parent 7883ab4
James Casbon authored
Showing with 85 additions and 2 deletions.
  1. +78 −0 FILTERS.md
  2. +3 −2 setup.py
  3. +4 −0 vcf.py
78 FILTERS.md
@@ -0,0 +1,78 @@
+Filtering a VCF file based on some properties of interest is a common enough
+operation that PyVCF offers an extensible script. ``vcf_filter.py`` does
+the work of reading input, updating the metadata and filtering the records.
+You can reuse this work by providing a filter class.
+
+For example, let's say I want to filter each site based on its quality.
+I can create a class like this:
+
+
+    import vcf
+
+    class SiteQuality(vcf.Filter):
+
+        description = 'Filter sites by quality'
+        name = 'site_quality'
+        short_name = 'sq'
+
+        @classmethod
+        def customize_parser(cls, parser):
+            # Add this filter's options to the vcf_filter.py argument parser.
+            parser.add_argument('--site-quality', type=int, default=30,
+                                help='Filter sites below this quality')
+
+        def __init__(self, args):
+            self.threshold = args.site_quality
+
+        def __call__(self, record):
+            # Returning a value fails the record; falling through returns None.
+            if record.QUAL < self.threshold:
+                return record.QUAL
+
+
+This class subclasses ``vcf.Filter``, which provides the interface for VCF filters.
+The ``description``, ``name`` and ``short_name`` attributes are metadata describing the filter.
+The ``customize_parser`` class method lets you add arguments to the script's parser.
+The ``__init__`` method receives the parsed arguments and reads the one it needs.
+Finally, the ``__call__`` method is applied to each record and returns a value if the
+record fails the filter (here, the offending ``QUAL``); returning ``None`` lets the record pass.
+The base class uses the ``short_name`` and ``threshold`` to create the filter ID in the VCF file.
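The contract described above can be exercised without PyVCF installed. This is a minimal sketch under stated assumptions: ``vcf.Filter`` is replaced by a hypothetical stand-in ``Filter`` base class, its ``filter_id`` method and the ID format (short name followed by threshold, e.g. ``sq30``) are assumptions rather than PyVCF's actual implementation, and records are stood in by a ``namedtuple`` carrying only the fields used.

```python
import argparse
from collections import namedtuple


class Filter(object):
    """Hypothetical stand-in for vcf.Filter; the real base class may differ."""
    description = 'Abstract filter'
    name = 'filter'
    short_name = 'f'

    @classmethod
    def customize_parser(cls, parser):
        pass  # subclasses add their own options

    def filter_id(self):
        # Assumption: ID is the short name followed by the threshold.
        return '%s%s' % (self.short_name, self.threshold)


class SiteQuality(Filter):
    description = 'Filter sites by quality'
    name = 'site_quality'
    short_name = 'sq'

    @classmethod
    def customize_parser(cls, parser):
        parser.add_argument('--site-quality', type=int, default=30,
                            help='Filter sites below this quality')

    def __init__(self, args):
        self.threshold = args.site_quality

    def __call__(self, record):
        # A non-None return means the record failed this filter.
        if record.QUAL < self.threshold:
            return record.QUAL


# Minimal stand-in for a parsed VCF record; only QUAL is used here.
Record = namedtuple('Record', ['CHROM', 'POS', 'QUAL'])

parser = argparse.ArgumentParser()
SiteQuality.customize_parser(parser)
sq = SiteQuality(parser.parse_args([]))  # default threshold of 30

low = Record('20', 14370, 12)
high = Record('20', 17330, 47)
print(sq.filter_id(), sq(low), sq(high))  # sq30 12 None
```

Building the filter from a parser rather than a hand-made namespace mirrors the order in which the script is described as using these hooks: ``customize_parser`` first, then ``__init__``, then ``__call__`` per record.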
+
+To make ``vcf_filter.py`` aware of the filter, declare a ``vcf.filters`` entry
+point in the ``setup()`` call of your package's ``setup.py``:
+
+    setup(
+        ...
+        entry_points={
+            'vcf.filters': [
+                'site_quality = vcf_filter:SiteQuality',
+            ]
+        }
+    )
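On the consuming side, a script can look up everything registered in the ``vcf.filters`` group. The sketch below uses the standard library's ``importlib.metadata``; the helpers ``discover_filters`` and ``load_filter`` are hypothetical and are not ``vcf_filter.py``'s own discovery code, which may use a different mechanism (for example ``pkg_resources``).

```python
from importlib.metadata import entry_points


def discover_filters(group='vcf.filters'):
    """Map entry-point names to (not yet imported) EntryPoint objects.

    Hypothetical helper; vcf_filter.py's real discovery code may differ.
    """
    try:
        eps = entry_points(group=group)        # Python 3.10+
    except TypeError:
        eps = entry_points().get(group, [])    # Python 3.8 / 3.9
    return {ep.name: ep for ep in eps}


def load_filter(name, group='vcf.filters'):
    """Import and return the class registered under `name`."""
    return discover_filters(group)[name].load()


if __name__ == '__main__':
    # One line per registered filter: entry-point name and 'module:Class'.
    for name, ep in sorted(discover_filters().items()):
        print(name, '->', ep.value)
```

Entry points are kept unloaded until a filter is actually requested, so a broken third-party filter only causes an error when someone asks for it by name.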
+
+Now, when you run ``vcf_filter.py --help``, your filter should appear in the list of available filters and its options among the optional arguments:
+
+    $ vcf_filter.py --help
+    usage: vcf_filter.py [-h] [--no-short-circuit] [--output OUTPUT]
+                         [--site-quality SITE_QUALITY]
+                         [--genotype-quality GENOTYPE_QUALITY]
+                         input filter [filter ...]
+
+    Filter a VCF file
+
+    available filters:
+      site_quality: Filter sites by quality
+      min_genotype_quality: Demand a minimum quality associated with a non reference call
+
+    positional arguments:
+      input                 File to process (use - for STDIN)
+      filter                Filters to use
+
+    optional arguments:
+      -h, --help            show this help message and exit
+      --no-short-circuit    Do not stop filter processing on a site if a single
+                            filter fails.
+      --output OUTPUT       Filename to output (default stdout)
+      --site-quality SITE_QUALITY
+                            Filter sites below this quality
+      --genotype-quality GENOTYPE_QUALITY
+                            Filter sites with no genotypes above this quality
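The help text above suggests how such a parser can be assembled: each filter's ``name`` and ``description`` feed the "available filters" list, and each filter's ``customize_parser`` contributes its options. The following is a hedged sketch of that idea using ``argparse``; ``build_parser`` is a hypothetical helper, the filter class is a minimal stand-in, and the script's actual construction may differ.

```python
import argparse


class SiteQuality(object):
    """Minimal stand-in carrying only what the parser needs."""
    name = 'site_quality'
    description = 'Filter sites by quality'

    @classmethod
    def customize_parser(cls, parser):
        parser.add_argument('--site-quality', type=int, default=30,
                            help='Filter sites below this quality')


def build_parser(filter_classes):
    """Assemble a vcf_filter.py-style parser (hypothetical helper)."""
    listing = '\n'.join('  %s: %s' % (f.name, f.description)
                        for f in filter_classes)
    parser = argparse.ArgumentParser(
        prog='vcf_filter.py',
        description='Filter a VCF file\n\navailable filters:\n' + listing,
        # Keep the hand-formatted filter list exactly as written.
        formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('--no-short-circuit', action='store_true',
                        help='Do not stop filter processing on a site if a '
                             'single filter fails.')
    parser.add_argument('--output',
                        help='Filename to output (default stdout)')
    parser.add_argument('input', help='File to process (use - for STDIN)')
    parser.add_argument('filters', metavar='filter', nargs='+',
                        help='Filters to use')
    # Each filter contributes its own command-line options.
    for f in filter_classes:
        f.customize_parser(parser)
    return parser


parser = build_parser([SiteQuality])
args = parser.parse_args(['in.vcf', 'site_quality', '--site-quality', '20'])
print(args.filters, args.site_quality)  # ['site_quality'] 20
```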
+
+
+
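The ``--no-short-circuit`` option in the help output implies that, by default, processing of a site stops at the first failing filter. A minimal sketch of that per-record loop follows, under assumptions: ``MinQual`` is a hypothetical filter, the ``hq`` short name is illustrative only, the ``short name + threshold`` ID format is assumed, and ``apply_filters`` is not the script's real code.

```python
from collections import namedtuple

# Minimal stand-in record; FILTER collects IDs of failed filters.
Record = namedtuple('Record', ['POS', 'QUAL', 'FILTER'])


class MinQual(object):
    """Hypothetical filter: fails when QUAL is below a threshold."""

    def __init__(self, short_name, threshold):
        self.short_name = short_name
        self.threshold = threshold

    def filter_id(self):
        # Assumed ID format: short name followed by threshold.
        return '%s%s' % (self.short_name, self.threshold)

    def __call__(self, record):
        if record.QUAL < self.threshold:
            return record.QUAL


def apply_filters(record, filters, short_circuit=True):
    """Return a copy of `record` with failed filter IDs added to FILTER."""
    failed = []
    for f in filters:
        # Test against None so a falsy failure value such as 0 still counts.
        if f(record) is not None:
            failed.append(f.filter_id())
            if short_circuit:
                break  # default behaviour: stop at the first failure
    return record._replace(FILTER=record.FILTER + failed)


filters = [MinQual('sq', 30), MinQual('hq', 50)]
rec = Record(POS=14370, QUAL=12, FILTER=[])
print(apply_filters(rec, filters).FILTER)                       # ['sq30']
print(apply_filters(rec, filters, short_circuit=False).FILTER)  # ['sq30', 'hq50']
```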
5 setup.py
@@ -12,7 +12,7 @@
name='PyVCF',
py_modules=['vcf', 'vcf_filter'],
scripts=['vcf_melt', 'vcf_filter.py'],
- author='James Casbon',
+ author='James Casbon and @jdoughertyii',
author_email='casbon@gmail.com',
description='Variant Call Format (VCF) parser for python',
test_suite='test.test_vcf.suite',
@@ -22,5 +22,6 @@
'site_quality = vcf_filter:SiteQuality',
'vgq = vcf_filter:VariantGenotypeQuality',
]
- }
+ },
+ url='https://github.com/jamescasbon/PyVCF'
)
4 vcf.py
@@ -103,6 +103,10 @@
Record(CHROM=20, POS=1230237, REF=T, ALT=['.'])
+An extensible script, vcf_filter.py, is available for filtering VCF files. VCF filters
+declared by other packages will be available for use in this script. Please
+see FILTERS.md for a full description.
+
'''
import collections
import re