writeVCF not implemented. I wrote some implementation #32

Closed
beskns opened this issue Sep 25, 2019 · 6 comments

@beskns
beskns commented Sep 25, 2019

columns = [
    'gene_name',
    'transcript_id',
    'exons',
    'ref_exon',
    'alt_exon',
    'ref_donor',
    'alt_donor',
    'ref_acceptor',
    'alt_acceptor',
    'ref_acceptorIntron',
    'alt_acceptorIntron',
    'ref_donorIntron',
    'alt_donorIntron',
    'delta_logit_psi',
    'pathogenicity',
    'efficiency'
]

def writeVCF(vcf_in, vcf_out, predictions):
    from cyvcf2 import VCF, Writer
    vcf = VCF(vcf_in)
    # Register the INFO field before creating the Writer so it lands in the output header.
    vcf.add_info_to_header({
        'ID': 'mmsplice',
        'Description': 'MMSplice splice variant effect. Format:' + '|'.join(columns),
        'Type': 'String',  # the value is a multi-character string, not a single Character
        'Number': '.'
    })
    w = Writer(vcf_out, vcf)

    for var in vcf:
        # var.ALT is a list in cyvcf2, so it renders as e.g. ['G'];
        # predictions.ID must use the same format.
        ID = f"{var.CHROM}:{var.POS}:{var.REF}:{var.ALT}"
        pred = predictions[predictions.ID == ID]
        # Boolean indexing returns a DataFrame (possibly empty), never None.
        if not pred.empty:
            pred_4_var = [
                '|'.join([str(row[k]) for k in columns[:3]]) + '|' +
                '|'.join([format(row[k], ".3f") for k in columns[3:]])
                for ind, row in pred.iterrows()
            ]
            var.INFO['mmsplice'] = '&'.join(pred_4_var)
        w.write_record(var)

    w.close()
    vcf.close()
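
The function above packs each prediction row into a `|`-separated record and joins several records for one variant with `&`. As a rough illustration of that layout, here is a minimal, standard-library-only sketch that splits such a value back into dictionaries. The name `parse_mmsplice_info` is hypothetical and not part of MMSplice, and the sketch assumes that none of the text fields contain `|` or `&`.

```python
# Column order taken from the `columns` list in the comment above.
COLUMNS = [
    "gene_name", "transcript_id", "exons",
    "ref_exon", "alt_exon", "ref_donor", "alt_donor",
    "ref_acceptor", "alt_acceptor",
    "ref_acceptorIntron", "alt_acceptorIntron",
    "ref_donorIntron", "alt_donorIntron",
    "delta_logit_psi", "pathogenicity", "efficiency",
]
N_TEXT = 3  # the first three columns are written as plain text


def parse_mmsplice_info(value):
    """Hypothetical helper: split one `mmsplice` INFO value into a list of dicts."""
    records = []
    for chunk in value.split("&"):          # one chunk per prediction row
        fields = chunk.split("|")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        record = dict(zip(COLUMNS[:N_TEXT], fields[:N_TEXT]))
        for name, raw in zip(COLUMNS[N_TEXT:], fields[N_TEXT:]):
            record[name] = float(raw)       # scores were written with 3 decimals
        records.append(record)
    return records


# Made-up example value with two records for a single variant.
one = "|".join(["GENE1", "T1", "1:90-120:+"] + ["0.500"] * 13)
two = "|".join(["GENE1", "T2", "1:90-120:+"] + ["-1.250"] * 13)
parsed = parse_mmsplice_info(one + "&" + two)
print(len(parsed), parsed[1]["transcript_id"], parsed[1]["efficiency"])
```

Because the scores are formatted with three decimals on the way out, values read back this way are rounded relative to the original predictions.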
@MuhammedHasan
Collaborator

MuhammedHasan commented Sep 25, 2019

Thanks for your contribution, but writeVCF is already implemented.

https://github.com/gagneurlab/MMSplice/blob/7f4aeb8bfa6cd460bccc5db593c066d1691bf1f6/mmsplice/mmsplice.py#L193

If you think this implementation lacks some features, please report them so that we can improve it.

MuhammedHasan added the question (Further information is requested) label on Sep 25, 2019
MuhammedHasan self-assigned this on Sep 25, 2019
@beskns
Author

beskns commented Sep 25, 2019 via email

@s6juncheng
Collaborator

Thanks @beskns. @MuhammedHasan, maybe we can put it in utils.py and import it at the top level in the __init__.py file for the next release.

@tstohn
tstohn commented Aug 19, 2020

Is there any news regarding the writeVCF functionality?
We would also be thrilled to use it at MedGen in Tübingen.
Thanks,
Tim

@s6juncheng
Collaborator

s6juncheng commented Sep 7, 2020

This function is now implemented in mmsplice.utils.writeVCF

def writeVCF(vcf_in, vcf_out, predictions):

The new version is on PyPI and can be installed with pip.

Thanks @beskns for sharing your implementation. I made some modifications based on it.

After writing the predictions to a VCF file, you can read the output file into a pandas DataFrame with mmsplice.utils.read_vep

def read_vep(vep_result_path,

Hopefully, this works for your use case @tstohn.

I'll close the issue for now; please feel free to reopen it if there are further questions or requests.

@tstohn
tstohn commented Sep 15, 2020

Hey Jun,
Thanks a lot. That works for me.
The only thing I was wondering about was that I was getting a segfault when writing a variant list in which some variants have no mmsplice prediction.
I noticed it is due to 'pred' in line 384 of utils.py being an empty DataFrame, which holds no values, which are then accessed in line 387.
In case you cannot reproduce this, let me know and I'll have a deeper look into it on my machine.
Thanks again & Cheers,
Tim
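
The situation described above, a selection that exists but contains no rows, can be illustrated without cyvcf2 or pandas. The following is only a sketch and not the code in utils.py: `build_info_value`, the shortened column lists, and the example data are all made up for illustration. The point is that an empty selection is still "not None", so the guard has to test for emptiness before any values are accessed.

```python
from typing import Dict, List, Optional

# Hypothetical, shortened column lists for illustration only.
STR_COLS = ["gene_name", "transcript_id", "exons"]
NUM_COLS = ["delta_logit_psi", "pathogenicity", "efficiency"]


def build_info_value(rows: List[Dict]) -> Optional[str]:
    """Return the INFO string for one variant, or None if there is nothing to write."""
    # An empty selection passes an `is not None` check, so test its length explicitly.
    if len(rows) == 0:
        return None
    records = []
    for row in rows:
        text = [str(row[k]) for k in STR_COLS]
        nums = [format(row[k], ".3f") for k in NUM_COLS]
        records.append("|".join(text + nums))
    return "&".join(records)


# Made-up predictions keyed by variant ID; the second variant below has none.
predictions = {
    "1:100:A:['G']": [
        {"gene_name": "GENE1", "transcript_id": "T1", "exons": "1:90-120:+",
         "delta_logit_psi": -1.25, "pathogenicity": 0.5, "efficiency": -0.125},
    ],
}

for variant_id in ["1:100:A:['G']", "1:200:C:['T']"]:
    value = build_info_value(predictions.get(variant_id, []))
    if value is None:
        print(variant_id, "-> no prediction, INFO left untouched")
    else:
        print(variant_id, "->", value)
```

With a pandas DataFrame, the equivalent guard would be a check on `pred.empty` (or `len(pred) == 0`) before building the string and assigning it to the record's INFO field.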
