remove unwanted samples for a given record #94

Open
janedanes opened this Issue Feb 21, 2013 · 2 comments

Here is what I am doing:

I'm working with a very large VCF file (816 samples). Our experiment includes duplicate samples to help pick out false positives. The duplicate samples are identified as:

Check_1:245098 (unique number is library identifier)
Check_1:245012
etc.

I've written a Python program that filters out SNP sites where 6/8 duplicates have the same SNP/genotype. I then wrote a set of functions that selects the best duplicate (Check_1) for that SNP site. So for one SNP site Check_1:245098 has a higher depth of coverage, but for another SNP site Check_1:245012 might have the best depth of coverage. I want to create a consensus Check_1.

I can successfully remove the other 7 duplicates and create a new set of record.samples (809 instead of 816). In this new set of record.samples, I've removed the library identifier and chopped the name to just Check_1.

But I can't figure out how to write this to a file. writer.write_record doesn't seem to work: it just writes out all the old duplicates again. I want to write the same record but with a different set of record.samples.

I realize a sample filter has already been written, but as I am a relative newbie to Python I am having trouble understanding how to access the methods (functions) and attributes (variables) in a class.

I would really appreciate any help.
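
One way to sidestep the writer problem, sketched here without PyVCF: the body of a VCF is tab-delimited text, so the duplicate columns can be collapsed directly and the #CHROM header rewritten to match. This is only a minimal sketch under stated assumptions, not the library's method. The function names (duplicate_columns, depth, collapse, write_consensus) are hypothetical, duplicates are assumed to be named Check_1:<library id>, and the FORMAT column is assumed to carry a DP key. The 6/8 agreement check described above is not included.

```python
import io

FIXED = 9  # CHROM..FORMAT columns precede the sample columns


def duplicate_columns(header_cols, prefix):
    """Indexes of sample columns named '<prefix>:<library id>'."""
    return [i for i, name in enumerate(header_cols)
            if i >= FIXED and name.split(":")[0] == prefix]


def depth(fmt_keys, sample_field, depth_key="DP"):
    """Depth of one sample field; a missing or '.' value counts as -1."""
    values = dict(zip(fmt_keys, sample_field.split(":")))
    raw = values.get(depth_key, ".")
    return int(raw) if raw.isdigit() else -1


def collapse(cols, dup_idx, replacement):
    """Put `replacement` at the first duplicate position, drop the others."""
    drop = set(dup_idx[1:])
    out = []
    for i, value in enumerate(cols):
        if i == dup_idx[0]:
            out.append(replacement)
        elif i not in drop:
            out.append(value)
    return out


def write_consensus(src, dst, prefix="Check_1"):
    """Copy a VCF from src to dst with one consensus column per record."""
    dup_idx = []
    for line in src:
        if line.startswith("##"):
            dst.write(line)  # meta lines pass through unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            dup_idx = duplicate_columns(cols, prefix)
            new = collapse(cols, dup_idx, prefix) if dup_idx else cols
        elif dup_idx:
            fmt_keys = cols[8].split(":")
            # Keep the duplicate with the highest depth at this site.
            best = max(dup_idx, key=lambda i: depth(fmt_keys, cols[i]))
            new = collapse(cols, dup_idx, cols[best])
        else:
            new = cols
        dst.write("\t".join(new) + "\n")


# Tiny made-up example: one ordinary sample plus two Check_1 duplicates.
demo = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tS1\tCheck_1:245098\tCheck_1:245012\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/1:30\t0/1:8\n"
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/0:10\t1/1:5\t1/1:22\n"
)
out = io.StringIO()
write_consensus(io.StringIO(demo), out)
result = out.getvalue()
```

In the demo, the first record keeps the column from Check_1:245098 and the second keeps the one from Check_1:245012, and both are written under a single Check_1 header column. For a real file, src and dst would be open file handles instead of StringIO objects.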

Hello:

I am having the same issue, and would appreciate some help. I am editing sample names, i.e., resetting sample.sample after the record is read from the file. When I try to use rec.genotype(sample), I get a KeyError, even though I can manually verify that the sample is present in the rec.samples list.

Any advice is much appreciated.

Thanks,
Matt
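
The KeyError described above is consistent with a lookup that goes through a name-to-index mapping built when the record is created, rather than through the sample list itself. The sketch below assumes that is how genotype() resolves names; that is an assumption about PyVCF internals, not something confirmed in this thread. Call, MiniRecord and reindex are hypothetical stand-ins, not PyVCF classes or methods.

```python
class Call:
    """Stand-in for a genotype call; only the sample name matters here."""

    def __init__(self, sample, gt):
        self.sample = sample
        self.gt = gt


class MiniRecord:
    """Toy record whose genotype() goes through a name -> index dict."""

    def __init__(self, calls):
        self.samples = calls
        self._index = {c.sample: i for i, c in enumerate(calls)}

    def genotype(self, name):
        return self.samples[self._index[name]]

    def reindex(self):
        """Rebuild the lookup after sample names were edited."""
        self._index = {c.sample: i for i, c in enumerate(self.samples)}


rec = MiniRecord([Call("Check_1:245098", "0/1"), Call("S2", "0/0")])
rec.samples[0].sample = "Check_1"  # rename after the record was built

try:
    rec.genotype("Check_1")
    stale_lookup_failed = False
except KeyError:
    # The new name is visible in rec.samples but absent from the stale index.
    stale_lookup_failed = True

rec.reindex()
renamed_call = rec.genotype("Check_1")
```

If the real library behaves this way, a workaround that avoids touching internals is to scan the list directly, for example next(c for c in rec.samples if c.sample == name).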

duxan commented Feb 26, 2017

Hi!

This one is quite old, but in case someone needs this from PyVCF, there is actually a SampleFilter class (sample_filter.py), and it can be used this way:

import vcf
vcf.SampleFilter(infile=<input_vcf>, filters=<comma-separated-string>, outfile=<output_vcf>)

It reads the input VCF with a slightly modified Reader and uses Writer to write out the filtered VCF. The filters parameter lists the unwanted samples.

@jamescasbon Will this class remain in the package, since it is not documented? Are there any downsides to using it like this? Thanks.

Cheers
