BamUtil: recab error: failed to open reference genome #62
Comments
Which version of bamUtil are you using? Can you try the NonPrimaryDedup branch?
Hi Jonathon, after cloning bamUtil and building the NonPrimaryDedup branch, I tried the dedup --recab command as above, but it returned exactly the same errors (with both reference genomes) as before.
Ok. After cloning, you can switch to that branch by running
I edited my previous comment, but I'm not sure it was seen. I used the command:
I cannot reproduce (see below). Can you try using
Hi, thank you and sorry I took so long to respond.
Hello,
I am trying to use the TOPMed Variant Calling Pipeline (latest: Freeze 8) to analyze human genetic data. I am following the instructions in the README (https://github.com/statgen/topmed_variant_calling) and have aligned my sequenced reads with bwa to the GRCh38 (full analysis set plus decoy HLA) reference genome FASTA, then indexed the resulting BAM files. (The reference is available here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/).
The TOPMed pipeline README does not mention deduplicating or recalibrating the BAM files, but based on the GotCloud wiki (https://genome.sph.umich.edu/wiki/GotCloud) and my own understanding of variant calling best practices, I believe these steps are highly recommended.
So I would like to use BamUtil to deduplicate and recalibrate my BAM files. Following the instructions here (https://genome.sph.umich.edu/wiki/BamUtil:_recab), my command is:
bamUtil/bin/bam dedup --recab --in input.bam --out input.bam.recab --force --refFile /ref/hg38/GRCh38_full_analysis_set_plus_decoy_hla.fa --dbsnp /ref/dbsnp_142.b38.vcf.gz --oneChrom --storeQualTag OQ --maxBaseQual 40
But it fails with the following error message:
Failed to open reference genome /ref/hg38/GRCh38_full_analysis_set_plus_decoy_hla.fa /ref/hg38/GRCh38_full_analysis_set_plus_decoy_hla-bs.umfa: wrong type of file (expected type 460927905 but got 1096302936)
I also tried the same command as above but using a similar(?) reference genome (hs38DH.fa), which I obtained (following the instructions here: https://github.com/statgen/topmed_variant_calling) using the command:
wget ftp://share.sph.umich.edu/1000genomes/fullProject/hg38_resources/topmed_variant_calling_example_resources.tar.gz
This returned a similar error:
GenomeSequence::open: failed to open file /ref/hs38DH-bs.umfa Failed to open reference genome /ref/hs38DH.fa hs38DH-bs.umfa: wrong type of file (expected type 460927905 but got 1415074904)
I am not sure what it means by "expected type 460927905". I would greatly appreciate any help in solving this issue. Thank you!
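One way to read the "type" values in both messages is as 32-bit magic numbers: the tool appears to compare the first four bytes of the cached -bs.umfa file against a fixed expected value. The sketch below only decodes the three integers quoted in the error messages. The unsigned 32-bit little-endian layout and the helper name decode_magic are assumptions made for illustration; neither is confirmed anywhere in this thread.

```python
import struct
from typing import Tuple


def decode_magic(value: int) -> Tuple[str, str]:
    """Return (hex string, printable view of the 4 bytes in little-endian order)."""
    # Assumption: the value is an unsigned 32-bit integer stored little-endian.
    raw = struct.pack("<I", value)
    # Show printable ASCII as-is and everything else as '.'.
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return f"{value:#010x}", text


# The three integers are copied from the error messages above.
for label, value in [
    ("expected", 460927905),
    ("got (GRCh38 full analysis)", 1096302936),
    ("got (hs38DH)", 1415074904),
]:
    hex_value, text = decode_magic(value)
    print(f"{label:<28} {value:>10}  {hex_value}  {text!r}")
```

Under these assumptions, the expected value is 0x1b7933a1, while the two "got" values decode to the ASCII patterns XAXA and XTXT. That would mean the existing -bs.umfa files begin with plain text rather than the header the tool is looking for. Whether regenerating those files resolves the error is not established here.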