CADD-SV plugin not working #667

Closed
kittysher opened this issue Nov 28, 2023 · 7 comments

@kittysher

Hello,

I am trying to run the CADD-SV plugin on my consensus SV VCFs, which have already been generated. They have already been annotated with VEP, but I am trying to add the CADD information too. I am using VEP v110 and the input VCFs are v4.2. Here is an example of the variants I am trying to annotate:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Germline   Tumour
chr1    4082558 manta_MantaDEL:278:0:0:0:0:0    TGTCCATCAGGCTCCTGCCATTGCCCTGGGACTGCCGCCGGAGTGAGAGGAATCTCGGCTTCTCCGGGGAAGCCCCGGTGGAGCGTGAGGAGCCTGTGATCACTGATCAATTATTCTTTATTAAGTGTATTTCCTGCATTAGTCATTTCTTTGCTGCTTCATTAGCTTCTTCTCAGCTTTCAGTGGATTGTGGGCACCACAGAGTACAAGATAATTTATTTTATTAGGCAATTTATTTTCCGGGTTCATTTTTAAAGGGCAATTAATCAATTTAAATGGAGAGGTGAAGAATGGGGACCCAAATACTCCTGGGACAAAAGTGACATTATCAACACCTTTGGGGGGAGAAAGCTTTGCTTTTTATTCTGACTTGGGAACAAATCAGGCGGGAGCAGGTGGAGCACCTTAGCATTGAGGATGGGGAGGGCTCACTGTCCCTGGACTCCAAGCACTGCTTCCTTGCTGCCCTGGAAGGTGGCAGCTGCAGCAGGAAGCAGAAGAAAGGGGCCTCTTGGGGA
  T       .       PASS SVTYPE=DEL;SVLEN=-517;END=4083075;CIPOS=0,0;CIGAR=1M517D;SOMATIC;SOMATICSCORE=76;ANN=T|intergenic_region|MODIFIER|LINC01346-EEF1DP6|ENSG00000233304-ENSG00000229280|intergenic_region|ENSG00000233304-ENSG00000229280|||n.4082559_4083075del||||||;MERGEDID=0;ORIGINALID=MantaDEL:278:0:0:0:0:0;CALLER=manta;SUPPORTINGID=manta_MantaDEL:278:0:0:0:0:0,gridss_gridss0fb_2347o,gridss_gridss0fb_2347h;SUPPORTINGCALLER=gridss,manta;SUPPORTINGIDCOUNT=3;SUPPORTINGCALLERCOUNT=2  PR:SR   56,0:49,0       35,8:18,7
chr1    19031895        manta_MantaBND:1693:0:1:0:0:0:0 C       [chr3:122974046[C       .       PASS    SVTYPE=BND;CIPOS=0,1;CIEND=0,1;MATEID=MantaBND:1693:0:1:0:0:0:1;HOMLEN=1;HOMSEQ=G;BND_DEPTH=48;MATE_BND_DEPTH=36;SOMATIC;SOMATICSCORE=68;ANN=[chr3:122974046[C|gene_fusion|HIGH|SEMA5B|ENSG00000082684|transcript|ENST00000451055.6|protein_coding||t(1%3B3)(p36.13%3Bq21.1)(c.*103878248)|t(1%3B3)(%3BENST00000451055:Met1_Val42)|||||;MERGEDID=4;ORIGINALID=MantaBND:1693:0:1:0:0:0:0;CALLER=manta;SUPPORTINGID=manta_MantaBND:1693:0:1:0:0:0:0,manta_MantaBND:1693:0:1:0:0:0:1,gridss_gridss1bb_4249o,gridss_gridss1bb_4249h;SUPPORTINGCALLER=gridss,manta;SUPPORTINGIDCOUNT=4;SUPPORTINGCALLERCOUNT=2       PR:SR   53,0:50,0       73,14:64,9
chr1    43221167        manta_MantaDUP:TANDEM:3705:0:1:0:0:0    G       <DUP:TANDEM>    .       PASS    SVTYPE=DUP;SVLEN=5287;END=43226454;CIPOS=0,1;CIEND=0,1;HOMLEN=1;HOMSEQ=A;SOMATIC;SOMATICSCORE=73;ANN=<DUP:TANDEM>|TF_binding_site_variant|MODIFIER|||PU1|MA0080.3|||n.43221168_43226454dup||||||,<DUP:TANDEM>|TF_binding_site_variant|MODIFIER|||Nrsf|MA0138.2|||n.43226454_43221168dup||||||,<DUP:TANDEM>|TF_binding_site_variant|MODIFIER|||CTCF|MA0139.1|||n.43221168_43226454dup||||||,<DUP:TANDEM>|TF_binding_site_variant|MODIFIER|||USF1|MA0093.2|||n.43221168_43226454dup||||||,<DUP:TANDEM>|intragenic_variant|MODIFIER|EBNA1BP2|ENSG00000117395|gene_variant|ENSG00000117395|||n.43226454_43221168dup||||||,<DUP:TANDEM>|intragenic_variant|MODIFIER|CFAP57|ENSG00000243710|gene_variant|ENSG00000243710|||n.||||||;MERGEDID=5;ORIGINALID=MantaDUP:TANDEM:3705:0:1:0:0:0;CALLER=manta;SUPPORTINGID=manta_MantaDUP:TANDEM:3705:0:1:0:0:0,gridss_gridss4bf_2007o,gridss_gridss4bf_2007h;SUPPORTINGCALLER=gridss,manta;SUPPORTINGIDCOUNT=3;SUPPORTINGCALLERCOUNT=2       PR:SR   61,0:63,0       75,17:64,8

My VEP command is:

vep --input_file sample.vcf \
--output_file test_CADD.vcf.gz \
--force_overwrite \
--compress_output bgzip \
--vcf \
--fork 4 \
--offline \
--no_stats \
--cache \
--dir_cache annotation/vep/ \
--synonyms annotation/vep/homo_sapiens_merged/110_GRCh38/chr_synonyms.txt \
--dir_plugins annotation/vep/Plugins/110/ \
--plugin CADD,sv=annotation/CADD-SV/v1.1/1000G_phase3_SVs.tsv.gz \
--max_sv_size 1000 \
--per_gene \
--merged \
--assembly GRCh38 \
--fasta Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa \
--use_given_ref \
--fields "Allele,Consequence,IMPACT,SYMBOL,Feature_type,Existing_variation,CADD_PHRED,CADD_RAW"

The VEP command runs but I get an error about the CADD plugin, and the CADD fields in the output are all missing:

WARNING: Plugin 'CADD' went wrong:
-------------------- EXCEPTION --------------------
MSG: Missing reference or alternate sequence

I am very stuck as to what is going wrong - is this due to incompatibility between the VCF versions? Please could you help me to get this plugin up and running?

Thank you very much,
Kitty

nakib103 self-assigned this Nov 29, 2023
@nakib103
Contributor

Hello @kittysher ,

Thanks for your query!

I can confirm that I can reproduce the issue. We are looking into it further and will let you know once we have more information.

Best regards,
Nakib

@nakib103
Contributor

Hi @kittysher,

Thanks for your patience. Here is why your data is not getting annotated.
The CADD plugin matches SVs using position and variant type, as these are the data available in the CADD annotation file. Currently, the supported types are INS, DUP, and DEL. See the CADD-SV website for details (the Type column):
https://kircherlab.bihealth.org/download/CADD-SV/v1.1/column-annotations.txt
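Since matching depends on the variant type, a quick way to see how much of an input VCF could possibly match is to tally its INFO/SVTYPE values. This is a minimal sketch: `count_svtypes` is a hypothetical helper, not part of VEP or the plugin, and it assumes a tab-delimited VCF whose records carry an `SVTYPE=` key in INFO, as the example records above do.

```shell
# Hypothetical helper: tally INFO/SVTYPE values of a VCF read from stdin and
# flag whether the CADD-SV v1.1 file could match them (its Type column only
# lists DEL, DUP and INS).
count_svtypes() {
  awk -F'\t' '
    /^#/ { next }                              # skip header lines
    {
      type = "NA"                              # records without SVTYPE
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^SVTYPE=/) type = substr(info[i], 8)
      count[type]++
    }
    END {
      for (t in count) {
        status = (t == "DEL" || t == "DUP" || t == "INS") ? "supported" : "unsupported"
        print t "\t" count[t] "\t" status
      }
    }
  ' | sort
}
```

For example, `count_svtypes < sample.vcf` prints one line per type with its count and whether CADD-SV lists that type.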

Looking more closely at your input file:

  • the first line has a typo: there is a newline after the REF allele. Even after fixing it, the variant won't be annotated, as it is not an SV and won't match anything in the annotation file.
  • the second line's variant type is BND, so it would not be annotated.
  • for the third line, I don't see any matching variant in the CADD annotation file:
$ tabix -D https://kircherlab.bihealth.org/download/CADD-SV/v1.1/1000G_phase3_SVs.tsv.gz 1:43221167-43221167 | wc -l
0
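The manual check in the last bullet can be repeated for every record with a small sketch. It assumes the CADD-SV file uses chromosome names without the `chr` prefix (inferred from the `1:43221167-43221167` query above, while the input VCF uses `chr1`), queries only the POS of each DEL/DUP/INS record as that query does, and needs `tabix` plus network access for the lookup step. The function names and `CADD_SV_URL` are illustrative, not part of VEP.

```shell
# Hypothetical helper: turn each DEL/DUP/INS record of a VCF (stdin) into a
# "chrom:pos-pos<TAB>type" line. Assumes the CADD-SV file uses bare
# chromosome names, so a leading "chr" is stripped.
vcf_to_regions() {
  awk -F'\t' '
    /^#/ { next }
    {
      type = ""
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^SVTYPE=/) type = substr(info[i], 8)
      if (type != "DEL" && type != "DUP" && type != "INS") next
      chrom = $1
      sub(/^chr/, "", chrom)
      print chrom ":" $2 "-" $2 "\t" type
    }
  '
}

# Count CADD-SV rows overlapping each region (requires tabix and network access).
CADD_SV_URL=https://kircherlab.bihealth.org/download/CADD-SV/v1.1/1000G_phase3_SVs.tsv.gz
query_cadd_sv() {
  while read -r region type; do
    hits=$(tabix -D "$CADD_SV_URL" "$region" | wc -l | tr -d ' ')
    printf '%s\t%s\t%s\n' "$region" "$type" "$hits"
  done
}
```

For example, `vcf_to_regions < sample.vcf | query_cadd_sv`. A count of 0 means the same as the `wc -l` result above: no CADD-SV row overlaps that position.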

I hope this answers your question. Let me know if you have any further queries.

Best regards
Nakib

@kittysher
Author

Hi Nakib,

Thank you for getting back to me about this issue.
Sorry about the typo in the first line of the input file I gave; that was due to me typing it incorrectly here, and the VCF itself doesn't have it. This page says that breakend annotation is available in VEP v110:
https://www.ensembl.info/2023/07/21/cool-stuff-ensembl-vep-can-do-enhanced-structural-variant-annotation/
Please could you let me know if this is the case?

Many thanks
Kitty

@nakib103
Contributor

Hello @kittysher,

Yes, that is the case. That is why you will see the breakend variant being annotated in the output with information such as the affected transcript and the consequence on that transcript:

manta_MantaBND:1693:0:1:0:0:0:0	chr1:19031896	[chr3:122974046[C	ENSG00000082684	ENST00000650207	Transcript	feature_truncation ...

But annotation from the CADD plugin is a different matter. Plugins are extensions to Ensembl VEP and often depend on files from external resources. As I pointed out, the CADD annotation file (an external resource) currently supports only INS, DUP, and DEL. From the column annotations file (https://kircherlab.bihealth.org/download/CADD-SV/v1.1/column-annotations.txt):

Chrom	Chromosome
Start	Start Coordinates
End	End Coordinates
Type	DELetion, INSertion, DUPlication. <- see here
....

You can learn more about Ensembl VEP plugins here: https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html

Best regards,
Nakib

@kittysher
Author

Hi Nakib,

OK, thank you for clarifying the difference between the VEP and CADD annotations.
I am still not getting the CADD VEP plugin to work, though: the output is empty for every variant in my VCF, which I don't think is correct. Please could you explain what this error message means? Is there some incompatibility between my VCF and the reference CADD-SV file?

WARNING: Plugin 'CADD' went wrong:
-------------------- EXCEPTION --------------------
MSG: Missing reference or alternate sequence

Many thanks
Kitty

@nakib103
Contributor

nakib103 commented Dec 7, 2023

Hi @kittysher,

This warning is raised when the alternate allele type is not supported. In the example you provided, it is the BND variant that is not supported and is generating the warning.

It could be that none of the supported-type variants in your input files match the CADD-SV file. Can you provide any other variants that are not annotated but should be?
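If the goal is only to stop unsupported types from reaching the plugin, one option is to give the CADD plugin a copy of the VCF that keeps only DEL, DUP and INS records. This is a hypothetical pre-filter, not a plugin requirement: it does not make the remaining records match the CADD-SV file, and sequence-resolved records such as the first example line may still not be treated as SVs. `keep_cadd_sv_types` is an illustrative name.

```shell
# Hypothetical pre-filter: keep VCF header lines plus records whose
# INFO/SVTYPE is DEL, DUP or INS (the types listed in the CADD-SV Type column).
keep_cadd_sv_types() {
  awk -F'\t' '
    /^#/ { print; next }                       # keep all header lines
    {
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        if (info[i] == "SVTYPE=DEL" || info[i] == "SVTYPE=DUP" || info[i] == "SVTYPE=INS") {
          print
          next
        }
      }
    }
  '
}
```

For example, `keep_cadd_sv_types < sample.vcf > sample.cadd_types.vcf`, then run the VEP command from the original post on `sample.cadd_types.vcf`. The unfiltered VCF can still be run through VEP without the plugin to get breakend consequences.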

Best regards,
Nakib

@nakib103
Contributor

Hello @kittysher,
I am closing this ticket. If you have further problems, feel free to open another ticket.

Best regards,
Nakib
