ERROR: Make sure that ref-fasta is the same genome build as your VCF #97

Closed
mxxdxxx opened this issue Nov 15, 2016 · 2 comments

mxxdxxx commented Nov 15, 2016

Hello everyone,

A few days ago I ran into the error in the title. It occurred while running the vcf2maf example test:

perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.maf
ERROR: Make sure that ref-fasta is the same genome build as your VCF: /home/mariodeng/.vep/homo_sapiens/86_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

While tracing it down, I found that my VEP installation was also broken: it failed on the last of its tests:

perl variant_effect_predictor.pl --species homo_sapiens --assembly GRCh37 --offline --no_progress --no_stats --sift b --ccds --uniprot --hgvs --symbol --numbers --domains --gene_phenotype --canonical --protein --biotype --uniprot --tsl --pubmed --variant_class --shift_hgvs 1 --check_existing --total_length --allele_number --no_escape --xref_refseq --failed 1 --vcf --minimal --flag_pick_allele --pick_order canonical,tsl,biotype,rank,ccds,length --dir $VEP_DATA --fasta $VEP_DATA/homo_sapiens/86_GRCh37/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz --input_file example_GRCh37.vcf --output_file example_GRCh37.vep.vcf --polyphen b --gmaf --maf_1kg --maf_esp --regulatory --custom $VEP_DATA/ExAC_nonTCGA.r0.3.1.sites.vep.vcf.gz,ExAC,vcf,exact,1,AC,AN

2016-11-14 18:59:11 - Read existing cache info

-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta

STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::setup_fasta /home/mariodeng/vep/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:210
STACK main::configure variant_effect_predictor.pl:835
STACK toplevel variant_effect_predictor.pl:146
Date (localtime)    = Mon Nov 14 18:59:11 2016
Ensembl API version = 86
---------------------------------------------------

It turned out that my Perl is far older than the suggested version: I am on 5.10.1, while 5.22 is suggested.
There are several ways to resolve these issues:

  1. Update your system's Perl to the suggested version (which was not an option in my case).
  2. Install a local copy of the suggested Perl version (see here).
  3. Since the error is triggered by reading a gzipped file, simply unzip the reference:
cd $HOME/.vep/homo_sapiens/86_GRCh37/
gunzip Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz

After unzipping the reference I was able to run the examples.
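
The third option can be sketched a little more defensively. The snippet below is only an illustration and is not part of vcf2maf or VEP: the helper names `fasta_compression` and `decompress_copy` are hypothetical. The first inspects the file header to tell plain text, ordinary gzip, and bgzip (BGZF) apart; the second writes a decompressed copy with `gzip -dc`, so the original archive is kept rather than replaced as `gunzip` does by default.

```shell
# Hypothetical helper: print "plain", "gzip", or "bgzip" for the given file,
# based on the first 14 bytes of its header.
fasta_compression() {
    # Hex dump of the first 14 bytes with all whitespace removed.
    header=$(od -An -tx1 -N14 "$1" | tr -d ' \n')
    case "$header" in
        # gzip magic (1f8b), deflate (08), FEXTRA flag (04),
        # 8 more header bytes, then the BGZF "BC" subfield id (4243).
        1f8b0804????????????????4243*) echo bgzip ;;
        # Any other gzip stream.
        1f8b*) echo gzip ;;
        *) echo plain ;;
    esac
}

# Hypothetical helper: write an uncompressed copy next to a .gz file,
# leaving the compressed original in place.
decompress_copy() {
    case "$1" in
        *.gz) gzip -dc "$1" > "${1%.gz}" ;;
        *) echo "expected a .gz file: $1" >&2; return 1 ;;
    esac
}
```

For example, `fasta_compression Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz` would report how the reference is compressed, and `decompress_copy` on the same file would leave both the `.fa.gz` and a new `.fa` in the directory, at the cost of extra disk space.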

Best,
Mario

@ckandoth
Collaborator

Thanks so much for documenting this Mario. Much appreciated. Note that the first error (from vcf2maf) is related to an older samtools being unable to parse a gzipped FASTA file. The second error (from VEP) is related to an older Perl being unable to run a module that parses a gzipped FASTA file.

For the first error, I changed the message to be more informative. So it will now say:

ERROR: You're either using an outdated samtools, or --ref-fasta is not the same genome build as your --input-vcf.

For the second error, I'm hoping people will find their way to this page where you've documented the problem nicely with possible solutions.
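
Since both errors come down to tool versions, a quick version check may save some tracing. The following is a minimal sketch, not something shipped with vcf2maf: `version_ge` and `check_perl` are hypothetical helper names, the comparison relies on `sort -V` being available, and the 5.22 threshold is simply the suggested Perl version mentioned earlier in this thread.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
version_ge() {
    # sort -V orders version strings; if $2 sorts first (or ties), $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: compare the installed Perl against a minimum version.
check_perl() {
    # "%vd" prints $^V as dotted integers, e.g. 5.30.1
    perl_version=$(perl -e 'printf "%vd", $^V') || return 2
    if version_ge "$perl_version" "$1"; then
        echo "perl $perl_version is at least $1"
    else
        echo "perl $perl_version is older than $1"
        return 1
    fi
}
```

Running `check_perl 5.22` reports whether the Perl on the `PATH` meets the suggested version. For samtools, `samtools --version` prints the version on 1.x releases; this thread does not state a minimum samtools version, so none is assumed here.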

Cheers!

~Cyriac

blueskypie commented Feb 25, 2021

I'm getting the 2nd error while trying to run the following:
perl vcf2maf.pl --input-vcf tests/test.vcf --output-maf tests/test.vep.2.maf --ref-fasta ~/.vep/homo_sapiens/102_GRCh37/Homo_sapiens.GRCh37.dna.toplevel.fa.gz

Error msg:

-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot index bgzipped FASTA file with Bio::DB::Fasta

STACK Bio::EnsEMBL::Variation::Utils::FastaSequence::setup_fasta /mnt/data01/n/home/jike.cui/miniconda3/share/ensembl-vep-102.0-0/Bio/EnsEMBL/Variation/Utils/FastaSequence.pm:210
STACK Bio::EnsEMBL::VEP::BaseVEP::fasta_db /mnt/data01/n/home/jike.cui/miniconda3/share/ensembl-vep-102.0-0/modules/Bio/EnsEMBL/VEP/BaseVEP.pm:477
STACK Bio::EnsEMBL::VEP::Runner::init /mnt/data01/n/home/jike.cui/miniconda3/share/ensembl-vep-102.0-0/modules/Bio/EnsEMBL/VEP/Runner.pm:138
STACK Bio::EnsEMBL::VEP::Runner::run /mnt/data01/n/home/jike.cui/miniconda3/share/ensembl-vep-102.0-0/modules/Bio/EnsEMBL/VEP/Runner.pm:203

If I unzip Homo_sapiens.GRCh37.dna.toplevel.fa.gz, vcf2maf runs very slowly.

perl: v5.30.1
vep: v102
Thanks for any help!
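
The cause of this report is not established in the thread. One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: VEP reads bgzipped FASTA files through the `Bio::DB::HTS` Perl module, and this error message names `Bio::DB::Fasta` instead, which suggests `Bio::DB::HTS` might not be loadable in this environment. A minimal sketch of such a check, using a hypothetical helper name `perl_module_available`:

```shell
# Hypothetical helper: succeed if the named Perl module can be loaded
# by the perl found on the PATH.
perl_module_available() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report on a module without aborting the shell session.
report_module() {
    if perl_module_available "$1"; then
        echo "$1 is available"
    else
        echo "$1 could not be loaded"
    fi
}
```

Calling `report_module Bio::DB::HTS` with the same Perl that runs VEP (here, the one inside the miniconda environment) would show whether the module is visible to it.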
