vcf2maf can not report VEP custom database annotation? #274

Closed
tangwei1129 opened this issue Nov 24, 2020 · 11 comments

@tangwei1129

tangwei1129 commented Nov 24, 2020

I have a question about vcf2maf. I used VEP to annotate a VCF file with custom annotations from the gnomAD genomes data:

$ vep -i Sample_15120+Sample_15121.FINALmutect2.vcf --vcf -o example.vcf --offline --cache --force_overwrite --fork 16 --dir_cache /path/VEP/customDB --species human --assembly GRCh38 --fasta customDB/GRCh38.d1.vd1.fa --af_gnomad --custom /path/VEP/customDB/gnomad.genomes.r2.1.1.sites.liftover_grch38.vcf.gz,gnomADg,vcf,exact,0,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH,variant_type,segdup

The output VCF contains the gnomAD genomes annotations.

Then I used vcf2maf to convert it to MAF, but the gnomAD genomes annotations did not show up:
$ vcf2maf.pl --input-vcf example.vcf --output-maf example.maf --inhibit-vep --ref-fasta customDB/GRCh38.d1.vd1.fa --tumor-id Sample_15120 --normal-id Sample_15121 --vep-forks 16

Even when I used --retain-info to specify the INFO fields I want, they were only added to the header; the last three columns still contain no values:
$ vcf2maf.pl --input-vcf example.vcf --output-maf example1.maf --inhibit-vep --ref-fasta customDB/GRCh38.d1.vd1.fa --tumor-id Sample_15120 --normal-id Sample_15121 --vep-forks 16 --retain-info gnomADg,gnomADg_variant_type,gnomADg_segdup

But if I use --retain-info CSQ, it produces all the information in one column, without separation.

Do you know how to get the custom database annotations to show up in the MAF file?

Thank you very much,

Wei

@djb17

djb17 commented Dec 12, 2020

I'm running into a similar issue when trying to add columns from VCF files generated by VEP with the dbNSFP plugin. It seems that --retain-info is not working properly: it adds the columns, but no corresponding values for them.

@alanhoyle
Contributor

alanhoyle commented Apr 14, 2021

I've created pull request #282 which addresses this issue because it's one we've also run into.

The issue is that VEP saves your custom annotations into the INFO column, but only inside the CSQ= subsection, not as stand-alone values. VCF2MAF doesn't handle those in its current incarnation as it hardcodes all accepted CSQ fields into an @ann_cols variable and the --retain-info only works with stand-alone ones. The pull request addresses this by doing the following:

  1. adding --vep-custom STRING and --vep-config FILE options that get passed through into subprocess like: vep --custom STRING --config FILE
  2. adding --retain-ann COMMA,SEP,VALUES which allows you to specify which CSQ values to pass into the MAF, by appending them to the hardcoded list
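
To illustrate why --retain-info cannot see these values, here is a minimal Python sketch of how per-transcript CSQ subfields can be pulled out of an INFO string. It is not vcf2maf's code (which is Perl); the function names, the shortened header, and the sample INFO value are made up for illustration, and it assumes the usual VEP header layout where the subfield order is declared after "Format:".

```python
import re

def csq_fields(header_line):
    """Return the ordered CSQ subfield names declared in the ##INFO=<ID=CSQ header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no CSQ Format declaration found")
    return match.group(1).strip().split("|")

def csq_records(info, fields):
    """Split the CSQ= entry of an INFO string into one dict per annotated transcript."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            blocks = entry[len("CSQ="):].split(",")
            return [dict(zip(fields, block.split("|"))) for block in blocks]
    return []

# Shortened, hypothetical header and INFO value; real CSQ entries carry many more subfields.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
          'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|gnomADg|gnomADg_AF_AFR">')
info = "DP=50;CSQ=T|missense_variant|TP53|rs1|0.01,T|intron_variant|TP53|rs1|0.01"

fields = csq_fields(header)
records = csq_records(info, fields)
print(records[0]["gnomADg_AF_AFR"])
```

The custom values only exist as positions inside the pipe-delimited CSQ string, so a tool that looks for a stand-alone INFO key named gnomADg_AF_AFR finds nothing.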

In @tangwei1129 's example, you'd specify --vep-custom /path/VEP/customDB/gnomad.genomes.r2.1.1.sites.liftover_grch38.vcf.gz,gnomADg,vcf,exact,0,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH,variant_type,segdup --retain-ann gnomADg_AF_AFR,gnomADg_AF_AMR,gnomADg_AF_ASJ,gnomADg_AF_EAS,gnomADg_AF_FIN,gnomADg_AF_NFE,gnomADg_AF_OTH,gnomADg_variant_type,gnomADg_segdup
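
The --retain-ann names in that example follow a simple pattern: the short name from the --custom string, an underscore, then each requested field. A small Python sketch of that derivation follows; the helper name is hypothetical, and it assumes the positional --custom layout used in this thread (file, short name, format, type, coords, then fields).

```python
def retain_ann_from_custom(custom):
    """Derive CSQ column names from a positional VEP --custom string.

    Assumes the layout: file,short_name,format,type,coords,field1,field2,...
    """
    parts = custom.split(",")
    if len(parts) < 6:
        raise ValueError("expected at least one field after the five positional values")
    short_name, fields = parts[1], parts[5:]
    return ",".join(f"{short_name}_{field}" for field in fields)

custom = ("/path/VEP/customDB/gnomad.genomes.r2.1.1.sites.liftover_grch38.vcf.gz,"
          "gnomADg,vcf,exact,0,AF_AFR,AF_AMR,AF_ASJ,AF_EAS,AF_FIN,AF_NFE,AF_OTH,"
          "variant_type,segdup")
print(retain_ann_from_custom(custom))
```

The printed string can be passed as the value of --retain-ann.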

I'm not familiar with the output from the dbNSFP plug-in. @djb17, can you post a simple example?

@tangwei1129
Author

tangwei1129 commented Apr 14, 2021 via email

@alanhoyle
Contributor

I'm not 100% certain, but I think the easiest way would be to download my forked version to test it out. I'm not an expert in github, so there's probably some way that you could download and test from the pull request. Note that I'm not one of the main developers/maintainers, but we use it at our center.

Link to my fork:

https://github.com/alanhoyle/vcf2maf

@tangwei1129
Author

tangwei1129 commented Apr 14, 2021 via email

@alanhoyle
Contributor

@tangwei1129 , any feedback on my fork/pull request? #282 Did it work for you?

@tangwei1129
Author

tangwei1129 commented Apr 20, 2021 via email

@ckandoth ckandoth self-assigned this Apr 23, 2021
@ckandoth
Collaborator

ckandoth commented Apr 23, 2021

I tested #282 successfully as follows and merged it into the main branch. Thanks @alanhoyle. I will release shortly as v1.6.21. Closing this issue.

perl vcf2maf.pl --ncbi-build GRCh38 --input-vcf tests/test_b38.vcf --output-maf test_b38.vep.maf --vep-custom tests/test_b38.gnomad.exomes.r2.1.1.sites.vcf.gz,gnomAD,vcf,exact,,AC --retain-ann gnomAD_AC --ref-fasta ~/.vep/homo_sapiens/102_GRCh38/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

@tangwei1129
Author

I have tested it and it worked.
But what if there are multiple --custom data sources to query?
Should I use --vep-custom more than once, just like in VEP?

@alanhoyle
Contributor

I don't think my implementation will handle multiple --vep-custom options. Merging them into a single config file and using --vep-config would work, I think.
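
A rough Python sketch of building such a config file is below. It assumes VEP's config format of one "flag value" pair per line and that a repeated custom key is accepted there; both should be checked against the VEP documentation before relying on this. The helper name and the second annotation source are hypothetical.

```python
def vep_config_text(custom_sources):
    """Render one 'custom <value>' line per annotation source."""
    return "".join(f"custom {source}\n" for source in custom_sources)

sources = [
    "/path/VEP/customDB/gnomad.genomes.r2.1.1.sites.liftover_grch38.vcf.gz,"
    "gnomADg,vcf,exact,0,AF_AFR,AF_AMR",
    # Hypothetical second source, for illustration only.
    "/path/to/second_source.vcf.gz,secondDB,vcf,exact,0,AF",
]
print(vep_config_text(sources), end="")
```

The resulting text would be saved to a file and passed via --vep-config, with the wanted CSQ columns from every source listed together in --retain-ann.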

@Setfelix

Great additions. --retain-ann also works for keeping columns produced by running VEP with a plugin. I just added dbNSFP columns to my MAF; --retain-info did not work for me. Thanks.
