Filter by sample #66

jamescasbon merged 25 commits into jamescasbon:master from lennax:lenna on Feb 23, 2014

lennax commented Jul 9, 2012

  • New class vcf.SampleFilter modifies Reader to filter each row's samples as they are being read
  • Class destructor removes the monkey patch; while the patch is in place, the modified Reader otherwise works normally
  • Class can be used as a module or via the command line interface
  • Samples to filter can be specified by name or index
  • Specified samples are filtered by default but can be kept by specifying "invert"
  • Filter can write to any writable object (stdout, a specified outfile, etc.)
  • Errors and status messages are reported through warnings and logging so that callers can customize them
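The name-or-index and "invert" behavior listed above can be illustrated with a small, self-contained sketch. This is not the `vcf.SampleFilter` code: `kept_sample_indices` is a hypothetical helper, and the zero-kept warning only mirrors the behavior described in commit 746ece9.

```python
import warnings
from typing import List, Sequence, Union

# A sample may be identified by its name or by its 0-based column position.
SampleSpec = Union[str, int]


def kept_sample_indices(
    sample_names: Sequence[str],
    filters: Sequence[SampleSpec],
    invert: bool = False,
) -> List[int]:
    """Return the 0-based indices of the samples that survive the filter.

    By default the samples listed in `filters` are removed; with
    invert=True only the listed samples are kept.
    """
    names = list(sample_names)
    targets = set()
    for spec in filters:
        if isinstance(spec, int):
            if not 0 <= spec < len(names):
                raise IndexError(f"sample index out of range: {spec}")
            targets.add(spec)
        else:
            try:
                targets.add(names.index(spec))
            except ValueError:
                raise KeyError(f"unknown sample name: {spec!r}") from None
    # Keep a sample when its membership in `targets` matches `invert`.
    kept = [i for i in range(len(names)) if (i in targets) == invert]
    if not kept:
        warnings.warn("sample filter keeps 0 samples")
    return kept
```

With `names = ["NA00001", "NA00002", "NA00003"]`, `kept_sample_indices(names, ["NA00001"])` returns `[1, 2]`, while `kept_sample_indices(names, [0, "NA00003"], invert=True)` returns `[0, 2]`.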

Lenna Peterson added some commits Jul 2, 2012

Lenna Peterson Initial per-sample line filtering. 75cf8f8
Lenna Peterson Improved samp filter performance, allow invert. 18deb2a
Lenna Peterson Args can be provided all at once or in sequence.
The latter style (filt3) allows semi-interactive use at the Python prompt.
Lenna Peterson Reduced amount of sample filter code in parser. 73376c8
Lenna Peterson Actually write out sample-filtered file. 362bbab
Lenna Peterson Switched Writer \r\n to os.linesep. d71b2cd
Lenna Peterson Fixed sample name list update/printing. bce2c47
Lenna Peterson Moved all sample filtering to filter script. 67744c0
Lenna Peterson Implemented argparse. a048ec0
Lenna Peterson Tweak args, pep8, move empty outfile warning. 19ce645
Lenna Peterson Fixed argparse arg names. 95fc70b
Lenna Peterson Changed default out to sys.stdout 67afb27
Lenna Peterson Added unit test for sample filtering script. 33d2b5c
Lenna Peterson Added authorship statement. 792d685
Lenna Peterson Added sample filter to list of scripts in setup. d78a945
Lenna Peterson Moved sample filter object to src dir. 75c4775
Lenna Peterson Using logging for easy quiet mode. 0047032
Lenna Peterson Unit test for sample filter module. 6b1fa89
Lenna Peterson Docs/test for undo_monkey_patch 817f5e9
Lenna Peterson Changed tests to use subprocess returncode. 0b0d809
Lenna Peterson Destructor undoes patch; warn if 0 samples kept 746ece9
Lenna Peterson Recommend explicit use of del. 30321c5
Lenna Peterson Added empty filter list; del is now less critical. 49f8897

I only came across this pull request after I needed a sample filter and wrote a quick one myself (albeit without CLI integration). Am I right in thinking that this code still parses all the samples, even the ones that are filtered out? I have a VCF with 1,600 samples, so parsing them all is very costly, and avoiding that was the main point of my filter. I only pass the wanted samples to the `_parse_samples` call. You can see my diff at
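A minimal sketch of the approach described in this comment, assuming a plain tab-separated VCF data line: select the wanted sample columns first and run per-sample parsing only on those. `parse_sample` and `parse_wanted_samples` are hypothetical helpers, not PyVCF's `_parse_samples`, and the toy parser does no type conversion. The whole line is still split, so the savings come only from skipping per-sample parsing of unwanted columns.

```python
from typing import Dict, List, Sequence


def parse_sample(fmt_keys: List[str], raw: str) -> Dict[str, str]:
    """Toy per-sample parser: pair FORMAT keys with colon-separated values."""
    return dict(zip(fmt_keys, raw.split(":")))


def parse_wanted_samples(line: str, wanted: Sequence[int]) -> List[Dict[str, str]]:
    """Parse only the requested sample columns of one VCF data line.

    Columns 0-7 are the fixed fields, column 8 is FORMAT, and sample
    columns start at index 9. Unwanted samples are never passed to
    parse_sample.
    """
    cols = line.rstrip("\n").split("\t")
    fmt_keys = cols[8].split(":")
    samples = cols[9:]
    return [parse_sample(fmt_keys, samples[i]) for i in wanted]
```

For a line ending in `GT:DP\t0|0:1\t1|0:8\t1/1:5`, asking for `wanted=[1]` parses only the second sample and yields `[{"GT": "1|0", "DP": "8"}]`.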

lennax commented Sep 5, 2012

My goal was not to modify the source code of the Reader class at all. My monkey patch/decorator intercepts the samples argument to `_parse_samples` and removes the undesired samples before they reach it, so the filtered-out samples are never fully parsed. It might be worth doing some profiling to see whether there is a significant difference in performance between the two approaches.
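The intercept-and-restore pattern described here can be sketched without PyVCF. `ToyReader` and `ToySampleFilter` are hypothetical stand-ins (the real `vcf.SampleFilter` and `Reader` may differ): the filter replaces the class-level `_parse_samples` with a wrapper that drops unwanted samples before delegating to the original method, and `undo_monkey_patch`, which the destructor also calls, puts the original back.

```python
import functools
from typing import List, Sequence


class ToyReader:
    """Hypothetical stand-in for a VCF reader; not PyVCF's Reader."""

    def _parse_samples(self, samples: List[str], samp_fmt: str, site: str) -> List[str]:
        # Pretend this is the expensive per-sample parsing step.
        return [f"{site}:{s}" for s in samples]


class ToySampleFilter:
    """Patch ToyReader._parse_samples so it only ever sees kept samples."""

    def __init__(self, keep: Sequence[int]) -> None:
        self._original = ToyReader._parse_samples
        original = self._original
        kept = list(keep)

        @functools.wraps(original)
        def filtered(reader, samples, samp_fmt, site):
            # Drop unwanted samples before the original method parses them.
            wanted = [samples[i] for i in kept]
            return original(reader, wanted, samp_fmt, site)

        ToyReader._parse_samples = filtered

    def undo_monkey_patch(self) -> None:
        # Restore the unpatched method; safe to call more than once.
        ToyReader._parse_samples = self._original

    def __del__(self) -> None:
        self.undo_monkey_patch()
```

The wrapper closes over the `kept` list rather than `self`. If it captured `self`, the patched class would keep the filter object alive, so in CPython `del` would not trigger the destructor while the patch was installed. Because the patch is applied to the class, it affects every reader instance until it is undone.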


I'm a beginner to Python and I'm trying to understand your code and how to use it to remove certain samples. If I wanted to remove a sample such as 'NA00001' from the example VCF file, what code would I use?
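The thread does not show the `SampleFilter` call signature, so rather than guess at that API, here is a stdlib-only sketch that removes named sample columns (such as `NA00001`) from VCF text directly. `drop_samples` is a hypothetical helper; it only rewrites columns and does not recompute INFO fields such as AC or AN that depend on the samples.

```python
from typing import Iterable, Iterator, List, Optional, Set


def drop_samples(lines: Iterable[str], unwanted: Set[str]) -> Iterator[str]:
    """Yield VCF lines with the named sample columns removed.

    Meta lines (##) pass through unchanged, the #CHROM header decides
    which columns to drop, and every data line loses the same columns.
    """
    keep: Optional[List[int]] = None
    for line in lines:
        if line.startswith("##"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at column 9, after FORMAT; fixed columns are always kept.
            keep = [i for i, name in enumerate(cols) if i < 9 or name not in unwanted]
        if keep is None:
            raise ValueError("data line found before #CHROM header")
        yield "\t".join(cols[i] for i in keep) + "\n"
```

To process a file, `dst.writelines(drop_samples(src, {"NA00001"}))` should work, where `src` and `dst` are files opened for reading and writing in text mode.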


jamescasbon commented Feb 6, 2014

People want this. We should definitely rebase it!

lennax commented Feb 10, 2014

I'm a little overwhelmed at the moment but I will set aside time to rebase this within the next two weeks.


jamescasbon commented Feb 10, 2014

Thanks, Lenna!

Lenna Peterson added some commits Feb 22, 2014

Lenna Peterson Merge branch 'master' of into lenna
Lenna Peterson Restore subprocess import to test cbe8d90

lennax commented Feb 22, 2014

I've merged master back into this and it passes the tests.


jamescasbon commented Feb 23, 2014

Wow, thank you so much Lenna - lots of people have requested this. Sorry for letting it sit here so long.

jamescasbon merged commit 45513dd into jamescasbon:master Feb 23, 2014

lennax deleted the lennax:lenna branch Feb 23, 2014