Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed, when phasing rare alleles #20

Open
TanguyLallemand opened this issue Apr 4, 2023 · 10 comments

Comments

@TanguyLallemand
Copy link

Hello, @odelaneau ,

Thanks for the great work with shapeit5!!

I'm looking into implementing it in a nextflow pipeline (from modules made by @LouisLeNezet, thanks also to him!).
I have a problem when phasing rare variants with shapeit5 either 5.0 or 5.1.
Indeed the HMM computations are not done. Do you have any idea why this might be?

``
[SHAPEIT5] Phase_rare (phase rare variants onto a haplotype scaffold)

Files:

  • Unphased data : [split.chr18.vcf.gz]
  • Scaffold data : [chr18.ref.common.phased.vcf.gz]
  • Output data : [chr18_11000001_12000000.vcf.gz]

Parameters:

  • Seed : 15052011
  • Threads : 8 threads
  • PBWT : [depth = 2,2 / modulo = 0.1 / mac = 2 / mdr = 0.3]
  • HMM : [Ne = 15000 / Constant recombination rate of 1cM per Mb]

Parsing specified genomic regions

  • Scaffold region [chr18:10500001-12500000]
  • Input region [chr18:11000001-12000000]

Reading genotype data

  • BCF scanning ...
    [W::hts_idx_load3] The index file is older than the data file: split.chr18.vcf.gz.tbi
    [W::hts_idx_load3] The index file is older than the data file: chr18.ref.common.phased.vcf.gz.tbi
  • BCF scanning (35.15s)
    • Variant sites: [#scaffold=37997 | #rare=13362 | #all=51359]
    • 21696 rare variants in buffer regions excluded
    • 13362 variants found in both files [priority to scaffold]
    • 5265 multi-allelic variants excluded
  • HAP allocation [#scaffold=13362 / #samples=1842] (0.00s)
  • Genotype set allocation [#rare=37997 / #samples=1842] (0.00s)
  • BCF parsing ...
  • BCF parsing [100%]
  • Plain VCF/BCF parsing (36.04s)
    • Scaffold : [0/0=23191904 0/1=821979 1/1=598921]
    • Rare : [0/0=65436271 0/1=1925356 1/1=1326764 ./.=1302083]

Initializing sparse genotypes

  • 0 missing genotypes imputed at monomorphic sites
  • Genotype set transpose V2I (0.05s)

Setting up genetic map

  • Region length [1499962 bp / 1.5 cM (assuming 1cM per Mb)]
  • HMM parameters [Ne=15000 / Error=0.0001]
  • V2H transpose (0.03s)

PBWT pass

  • PBWT initialization [#eval=10283 / #select=6] (0.00s)

  • IBD2 constraints [100%]

  • IBD2 constraints [#inds=0 / #pairs=0] (0.42s)

  • PBWT forward selection [100%]

  • PBWT forward selection (2.65s)

  • PBWT backward selection [100%]

  • PBWT backward selection (2.87s)

    • #states=229.97+/-88.68
    • #collisions = 17789 / #pushes = 70642 / rate = 79.88%

HMM computations

  • Processing [0%]
    phase_rare_static: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32&, aligned_vector32&, std::vector&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed.
    ``
@odelaneau
Copy link
Owner

odelaneau commented Apr 6, 2023 via email

@TanguyLallemand
Copy link
Author

Hi,

Yes, sure here they are.

phase_common_static
--input split.chr18.full_pop.vcf.gz
--region chr18:1-4000000
--thread 8
--filter-maf 0.001
--output-format bcf
--output chr18_1_1000000.bcf

Between this two line i have ligated all file produced with phase common, producing chr18.ref.common.phased.vcf.gz.

phase_rare_static
--input split.chr18.full_pop.vcf.gz
--scaffold chr18.ref.common.phased.vcf.gz
--input-region chr18:1-1000000
--scaffold-region chr18:1-1500000
--thread 8
--output chr18_1_1000000.vcf.gz

I am running those files in a docker container based on your released one.

Regards,

T

@odelaneau
Copy link
Owner

Sorry, I come back to this quite late. Is this problem still happening even with last version.

Two comments anyway:

  1. Try increasing your chunk sizes for phase_Rare.
  2. Check if there is nothing wrong with your indexes as you have this warning showing up:

[W::hts_idx_load3] The index file is older than the data file: split.chr18.vcf.gz.tbi
[W::hts_idx_load3] The index file is older than the data file: chr18.ref.common.phased.vcf.gz.tbi
_

@freeseek
Copy link

I am having the same problem with phase_rare version 5.1.1 / commit = 990ed0d / release = 2023-05-08. These are the commands I have used:

phase_common \
  --thread 4 \
  --input "143_HESC_Genome.as.12.qc.bcf" \
  --reference "ALL.chr4.phase3_integrated.20130502.genotypes.bcf" \
  --map genetic_map.txt \
  --region 4:135951881-191154276 \
  --filter-maf 0.001 \
  --output "143_HESC_Genome.as.12.qc.scaffold.bcf"
phase_rare \
  --thread 4 \
  --input "143_HESC_Genome.as.12.qc.bcf" \
  --scaffold "143_HESC_Genome.as.12.qc.scaffold.bcf" \
  --map genetic_map.txt \
  --input-region 4:138482819-191154276 \
  --scaffold-region 4:135951881-191154276 \
  --output "143_HESC_Genome.as.12.qc.pgt.bcf"

After phase_rare outputs HMM computations I get the error:

phase_rare: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32<float>&, aligned_vector32<float>&, std::vector<unsigned int>&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed.

I can share via email this test case if you want to reproduce the issue.

@rebeccaito
Copy link

Thanks for SHAPEIT. It is a great tool. I'm having the same issue described here. Same version as above: v5.1.1 - 990ed0d. I have gone back and increased the chunk size so there are >340k biallelic SNPs, removed variants with excess (>10%) missingness and excess heterozygosity, removed samples with excess missingness and outliers for heterozygosity, and changed the random seed. phase_common_static runs, but I get the same error each time for phase_rare_static at the HMM computations step:

phase_rare_static: src/containers/genotype_set/genotype_set_phasing.cpp:63: void genotype_set::phaseLiAndStephens(unsigned int, unsigned int, aligned_vector32<float>&, aligned_vector32<float>&, std::vector<unsigned int>&, float): Assertion `GRvar_genotypes[vr][tidx].prob >= 0.0f' failed.

Commands used:

phase_common_static \
  --input chrom7_array_exome_combined_qcFiltered.vcf.gz \
  --filter-maf 0.001 \
  --region 7 \
  --map chr7.b38.gmap.gz \
  --output chrom7_array_exome_combined_qcFiltered_phased.bcf \
  --thread 64 \
  --log chrom7_array_exome_combined_qcFiltered_phased.log  

phase_rare_static \
  --input chrom7_array_exome_combined_qcFiltered.vcf.gz \
  --scaffold chrom7_array_exome_combined_qcFiltered_phased.bcf \
  --map chr7.b38.gmap.gz \
  --output chrom7_array_exome_combined_qcFiltered_phased_rare.bcf \
  --log chrom7_array_exome_combined_qcFiltered_phased_rare.log \
  --scaffold-region "7:100096734-131496019" \
  --input-region "7:101096734-130496019" \
  --thread 64 \
  --seed 47 

@MoiColl
Copy link

MoiColl commented Sep 14, 2023

Any news on this error? I'm also getting this error, but for some chunks only. What do you think might be the source of the error? Could it be some weird variants/format in the chunk that is being processed?

@vasilipankratov
Copy link

vasilipankratov commented Dec 7, 2023

Hi all!

I was getting the same error when trying to phase rare variants. Actually, I was getting two flavors of the same error:

Assertion GRvar_genotypes[vr][tidx].prob >= 0.0f failed.
or
Assertion !isnan(GRvar_genotypes[vr][tidx].prob) failed.

I managed to fix it by polishing the vcf. In particular, I updated the INFO tags (with bcftools +fill-tags).
Perhaps, shapeit requires proper values in the AF field or something else.

It would be nice if the upcoming versions of shapeit5 were a bit more explicit about this kind of fails or at least the requirements for the vcf were outlined in the documentation :).

@jalghor
Copy link

jalghor commented Jan 13, 2024

I was having the same issue for some of the chunks when using phase rare. I fixed it by increasing the chunk size.

Best

@justinjj24
Copy link

justinjj24 commented Feb 19, 2024

I was having the same issue for some of the chunks when using phase rare. I fixed it by increasing the chunk size.

Best

@jalghor mean you generated the chunks for rare variants with more than --window-cm arg (=4) by GLIMPSE2_chunk or referring the seed size --seed arg (=15052011) Seed of the random number generator in phase_rare. Can you provide the value you used to overcome the failure?
In my phase_common output AF is missing in the INFO field, same as @vasilipankratov mentioned. Wonder which one would fix the error or which one to consider if the results vary! Thanks

@JuliaMarkowski
Copy link

JuliaMarkowski commented Sep 16, 2024

Hi all! Are there any news on this error? I have tried increasing the chunk size, but still got the error. However, independent of chunk size, I am getting the error only for some chunks, so I think in my case the chunk size is not the issue (generated chunks with glimpse2 as suggested). I also tried bcftools +fill-tags and VCF format is fine. I would appreciate any update. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants