You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When phasing non-PAR X, if present in the input VCF/BCF, the heterozygous haploids are not correctly set to missing.
Here is the problematic code:
for (uint32_t v = 0 ; v < n_variants ; v++) {
if (VAR_GET_AMB(MOD2(v), Variants[DIV2(v)])) {
VAR_SET_MIS(MOD2(v), Variants[DIV2(v)]);
nreset ++;
}
}
The first 2 bits are 10 for het, and it is picked up by VAR_GET_AMB. Next, VAR_SET_MIS sets the last two bits to 11. However, code 11 is not recognized as the code for missing genotype by the VAR_GET_MIS (in fact, it is recognized by VAR_GET_SCA only). The problem is that only VAR_GET_MIS is used later, for example, to check if the missing values should be imputed. As a result, the heterozygous haploids are being propagated to the final phased output.
One option to resolve this is to clear the first 2 bits before setting them to code 01 (missing) e.g.:
#define VAR_CLEAR_FLAG(e, v) ((v)&=(~(3<<((e)<<2))))
for (uint32_t v = 0 ; v < n_variants ; v++) {
if (VAR_GET_AMB(MOD2(v), Variants[DIV2(v)])) {
VAR_CLEAR_FLAG(MOD2(v), Variants[DIV2(v)]);
VAR_SET_MIS(MOD2(v), Variants[DIV2(v)]);
nreset ++;
}
}
Note for the users who read this issue: If you are sure that you don't have haploids (i.e. males) on non-PAR chrX with het values, then everything should be fine. Use bcftools +fixploidy and bcftools +check-ploidy plugins to check your data.
If your haploids in your input VCF/BCF are coded according to the spec with single GT value (i.e. 0 instead of 0/0 and 1 instead of 1/1), the see issue #99 .
The text was updated successfully, but these errors were encountered:
Following up on this issue. Looking at your scripts/documentation, I can't find any information if a check for the presence of het genotypes in non-PAR regions in males was done before phasing UK Biobank data. I.e. do you think this issue may have had an effect in the UK Biobank phasing of chrX nonPAR? Thanks!
To add, I noticed the same behaviour when the input has missing genotypes. There were males in the non-PAR who had missing genotypes but were heterozygous in the phased output.
See attached image: males in the phased output are heterozygous, and have missing genotypes in the input I passed to SHAPEIT5.
Version: 5.1.1 / commit = 990ed0d / release = 2023-05-08 was used
Hi,
When phasing non-PAR X, if present in the input VCF/BCF, the heterozygous haploids are not correctly set to missing.
Here is the problematic code:
The first 2 bits are 10 for het, and it is picked up by
VAR_GET_AMB
. Next,VAR_SET_MIS
sets the last two bits to 11. However, code 11 is not recognized as the code for missing genotype by theVAR_GET_MIS
(in fact, it is recognized byVAR_GET_SCA
only). The problem is that onlyVAR_GET_MIS
is used later, for example, to check if the missing values should be imputed. As a result, the heterozygous haploids are being propagated to the final phased output.One option to resolve this is to clear the first 2 bits before setting them to code 01 (missing) e.g.:
Note for the users who read this issue: If you are sure that you don't have haploids (i.e. males) on non-PAR chrX with het values, then everything should be fine. Use
bcftools +fixploidy
andbcftools +check-ploidy
plugins to check your data.If your haploids in your input VCF/BCF are coded according to the spec with single GT value (i.e. 0 instead of 0/0 and 1 instead of 1/1), the see issue #99 .
The text was updated successfully, but these errors were encountered: