ERROR: Make sure that ref-fasta is the same genome build as your VCF #97

mxxdxxx · 2016-11-15T20:21:31Z

Hello everyone,

A few days ago I ran into the error in the title. It occurred while running the vcf2maf example/test:

perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf
ERROR: Make sure that ref-fasta is the same genome build as your VCF: /home/mariodeng/.vep/homo_sapiens/86_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

While tracing it down, I found that the problem was in my VEP installation, which also failed when running its last test:

perl variant_effect_predictor.pl --species homo_sapiens --assembly GRCh37 --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --canonical --protein --biotype --uniprot --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length --dir $VEP_DATA --fasta $VEP_DATA/homo_sapiens/86_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz --input_file example_GRCh37.vcf --output_file example_GRCh37.vep.vcf --polyphen b --gmaf --maf_1kg --maf_esp --regulatory --custom $VEP_DATA/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz,ExAC,vcf,exact,1,AC,AN

2016-11-14 18:59:11 - Read existing cache info

-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta

STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::setup_fasta /home/mariodeng/vep/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:210
STACK main::configure variant_effect_predictor.pl:835
STACK toplevel variant_effect_predictor.pl:146
Date (localtime)    = Mon Nov 14 18:59:11 2016
Ensembl API version = 86
---------------------------------------------------

It turned out that my Perl is well behind the suggested version: I am on 5.10.1, while 5.22 is suggested.
There are several ways to resolve these issues:

Update your system's Perl to the suggested version (which was not an option in my case)
Or install a local copy of the suggested Perl version (see here)
Or, as the error is caused by reading a gzipped file, simply unzip the reference:

cd $HOME/.vep/homo_sapiens/86_GRCh37/
gunzip Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

After unzipping the reference I was able to run the examples.
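
A slightly more cautious variant of the unzip step is to decompress into a new file and keep the original .gz, so anything else that still points at the compressed path keeps working. This is only a sketch: the helper name decompress_keep is made up for illustration and is not part of vcf2maf or VEP.

```shell
# Decompress a gzipped FASTA next to the original, keeping the .gz copy.
# decompress_keep is a hypothetical helper name used only in this sketch.
decompress_keep() {
    src="$1"
    case "$src" in
        *.gz) ;;
        *) echo "expected a .gz file: $src" >&2; return 1 ;;
    esac
    dest="${src%.gz}"            # same path without the trailing .gz
    if [ ! -f "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "already exists, leaving it alone: $dest" >&2
        return 0
    fi
    # gunzip -c writes to stdout and leaves the source file untouched.
    # Write to a temporary name first so a failed run leaves no partial FASTA.
    if gunzip -c "$src" > "$dest.tmp"; then
        mv "$dest.tmp" "$dest"
    else
        rm -f "$dest.tmp"
        return 1
    fi
}
```

With the path from this thread, that would be called as `decompress_keep "$HOME/.vep/homo_sapiens/86_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz"`, and the uncompressed .fa path can then be passed to vcf2maf with --ref-fasta.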

Best,
Mario

ckandoth · 2016-11-15T21:26:38Z

Thanks so much for documenting this Mario. Much appreciated. Note that the first error (from vcf2maf) is related to an older samtools being unable to parse a gzipped FASTA file. The second error (from VEP) is related to an older Perl being unable to run a module that parses a gzipped FASTA file.

For the first error, I changed the message to be more informative. So it will now say:

ERROR: You're either using an outdated samtools, or --ref-fasta is not the same genome build as your --input-vcf.

For the second error, I'm hoping people will find their way to this page where you've documented the problem nicely with possible solutions.
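
Both messages involve a compressed reference, and a plain gzip file and a bgzip (BGZF) file share the same .gz extension. As far as I know, samtools faidx can index the latter but not the former, so it can help to check which one is actually on disk. The sketch below looks only at the header bytes; it assumes the BGZF layout described in the SAM/BAM specification (FEXTRA flag set, "BC" subfield id at byte offset 12), and fasta_compression is a made-up helper name.

```shell
# Report how a reference FASTA is stored: "plain", "gzip" or "bgzf".
# fasta_compression is a hypothetical helper, not part of vcf2maf, VEP or samtools.
fasta_compression() {
    f="$1"
    # Bytes 0-2 of a gzip member are 1f 8b 08 (magic number + deflate method).
    magic=$(od -An -tx1 -N3 "$f" | tr -d ' \n')
    if [ "$magic" != "1f8b08" ]; then
        echo "plain"
        return 0
    fi
    # Byte 3 is the flag byte; bit 0x04 (FEXTRA) means an extra field follows.
    flg=$(od -An -tu1 -j3 -N1 "$f" | tr -d ' \n')
    flg="${flg:-0}"
    # Assumption: in a BGZF block the extra field begins at offset 12
    # with the subfield id "BC" (hex 42 43).
    sub=$(od -An -tx1 -j12 -N2 "$f" | tr -d ' \n')
    if [ $(( flg & 4 )) -ne 0 ] && [ "$sub" = "4243" ]; then
        echo "bgzf"
    else
        echo "gzip"
    fi
}
```

For example, `fasta_compression ~/.vep/homo_sapiens/86_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz` prints one of the three words, which narrows down whether the file format or the tool version is the more likely culprit.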

Cheers!

~Cyriac

blueskypie · 2021-02-25T03:47:27Z

I'm getting the 2nd error while trying to run the following:
perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.2.maf --ref-fasta ~/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz

Error msg:

-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta

STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::setup_fasta /mnt/data01/n/home/jike.cui/miniconda3/share/ensembl-vep-102.0-0/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:210
STACK Bio::EnsEMBL::VEP::BaseVEP::fasta_db /mnt/data01/n/home/jike.cui/miniconda3/share/ensembl-vep-102.0-0/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:477
STACK Bio::EnsEMBL::VEP::Runner::init /mnt/data01/n/home/jike.cui/miniconda3/share/ensembl-vep-102.0-0/modules/Bio/EnsEMBL/VEP/Runner.pm:138
STACK Bio::EnsEMBL::VEP::Runner::run /mnt/data01/n/home/jike.cui/miniconda3/share/ensembl-vep-102.0-0/modules/Bio/EnsEMBL/VEP/Runner.pm:203

If I unzip Homo_sapiens.GRCh37.dna.toplevel.fa.gz, vcf2maf runs very slowly.

perl: v5.30.1
vep: v102
Thanks for any help!

ckandoth closed this as completed Nov 15, 2016

ckandoth self-assigned this Feb 2, 2017

geocarvalho mentioned this issue Mar 1, 2017

Problems with annotation J35P312/FindSV#2

Closed

cegg mentioned this issue Nov 9, 2018

SIFT not available Ensembl/ensembl-vep#33

Closed
