Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about adding genotypes when writing a new VCF #236

Open
hdashnow opened this issue Jun 23, 2016 · 8 comments
Open

Question about adding genotypes when writing a new VCF #236

hdashnow opened this issue Jun 23, 2016 · 8 comments

Comments

@hdashnow
Copy link

Hi, apologies for submitting a question as an issue. I wasn't quite sure where to ask it (I couldn't find a mailing list, so please point me to it if I've just missed it!).

I'm writing VCF files from scratch using PyVCF. I've worked out how to do this (mostly). The thing I can't work out how to do from the docs, is add the genotype field.

Here's the code I'm using to add a position to the vcf.

record = vcf.model._Record(CHROM=chrom, POS=start, ID='.', REF=ref_sequence,
            ALT=[vcf.model._Substitution(allele1), vcf.model._Substitution(allele2)],
            QUAL='.', FILTER='PASS', INFO={'RU':repeatunit},
            FORMAT='.', sample_indexes=[], samples=None)
vcf_truth.write_record(record)

And here's an example line of vcf output (note SAMPLE field is blank, as expected since I said samples=None)

##fileformat=VCFv4.1
##source=generate_stutter_vcfs.py
##reference=ucsc.hg19.fasta
##INFO=<ID=RU,Number=1,Type=String,Description="Repeat Unit">
##INFO=<ID=RL,Number=1,Type=Integer,Description="Reference Length of Repeat">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE
chr3    63898360    .   GGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG    GGCAGCAGCAGCAGCAGCAGCAG,GGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG   .   PASS    RU=GCA  .

Where I'd like the genotype field to contain 1/2, so it should look like this:

##fileformat=VCFv4.1
##source=generate_stutter_vcfs.py
##reference=ucsc.hg19.fasta
##INFO=<ID=RU,Number=1,Type=String,Description="Repeat Unit">
##INFO=<ID=RL,Number=1,Type=Integer,Description="Reference Length of Repeat">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE
chr3    63898360    .   GGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG    GGCAGCAGCAGCAGCAGCAGCAG,GGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAGCAG   .   PASS    RU=GCA  GT  1/2

In case it's relevant, note that I'm writing indels, specifically microsatellites (so there could be more than two alt alleles, but I'm just trying to solve the simple case of two alt alleles at this stage).

Any advice would be appreciated!

Thanks,
Harriet

@martijnvermaat
Copy link
Collaborator

Support for writing is definitely immature, but #82 may be of interest to you.

@hdashnow
Copy link
Author

hdashnow commented Jul 5, 2016

I've read through #82 and notice there is an old PR #136, the combination of which is leading me to believe that this functionality currently does not exist (at least not in master). Is that correct?

Are you suggesting I use this branch: https://github.com/jamescasbon/PyVCF/tree/wip/82-add-call-data?

At this stage, it's probably going to be easier for me to do this myself, since I really don't want to be depending on a functionality that is in such flux. Let me know if you get something merged into master.

@Redmar-van-den-Berg
Copy link

I ran into the same issue when I wrote a script to filter individual calls(samples) within a variant, using the FT annotation. I could not figure out how to add to the FORMAT for a variant. In the end, I just used a separate script to read the vcf as a text file, and add FT to the format field that way.

I think it would be a really nice addition to pyvcf to be able to add to the FORMAT annotations, instead of everyone having their own little hacks to work around this.

@everestial
Copy link

@hdashnow : I am having a similar problem. Can you please share, 'how you worked out your problem of updating/writing the values in the Sample field?"

@hdashnow
Copy link
Author

Hi @everestial. Gosh, that was a while ago. I think I just went back to creating VCFs manually.

@everestial
Copy link

Would it be possible for you to share your method?
I am able to the INFO field and description, but cannot figure out how to add new fields in FORMAT and it's corresponding value in SAMPLE field. The discussed add_field for _Call isn't available.

@hdashnow
Copy link
Author

I mean I literally printed lines of VCF to output. I didn't use PyVCF for the output at all. Sorry, I can't help.

@everestial
Copy link

No worries !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants