Unable to infer the A and B alleles while parsing the site... #41
The BCFtools/mochatools plugin will infer which allele is the A allele and which is the B allele as long as at least one homozygous AA or homozygous BB genotype is observed. Sites at which all samples are heterozygous cannot be inferred; it is simply not possible to do so. If you have enough samples in the VCF, this should not be a problem. Are you running the tool on a single-sample VCF? My advice is to go back to the org that provides your dataset and tell them to do the right thing and give you the IDAT files (or CEL files if it is Affymetrix data).
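A minimal sketch of why a homozygous call is needed, assuming (as the name suggests) that BAF is the B allele frequency, so it is near 0 for AA, near 0.5 for AB, and near 1 for BB. This is a hypothetical Python illustration, not the mochatools implementation; the function name `infer_allele_a` and the return convention (0 = REF, 1 = ALT) are assumptions. A homozygous call reveals whether the called allele is A or B, while a heterozygous call (BAF near 0.5) looks the same either way:

```python
from typing import Optional, Sequence

# Hypothetical illustration only, NOT the mochatools code. Assumptions:
#   - genotypes are biallelic GT strings such as "0/0", "0/1", "1|1", "./."
#   - bafs hold each sample's B allele frequency (~0 for AA, ~0.5 for AB, ~1 for BB)
#   - the return value is the index of the A allele: 0 = REF, 1 = ALT


def infer_allele_a(genotypes: Sequence[str], bafs: Sequence[float]) -> Optional[int]:
    """Return 0 if REF is the A allele, 1 if ALT is, or None if not inferable."""
    for gt, baf in zip(genotypes, bafs):
        alleles = gt.replace("|", "/").split("/")
        # Skip missing, heterozygous, and non-biallelic calls:
        # a BAF around 0.5 says nothing about which allele is A.
        if len(alleles) != 2 or alleles[0] != alleles[1] or alleles[0] not in ("0", "1"):
            continue
        called = int(alleles[0])  # 0 = homozygous REF, 1 = homozygous ALT
        if baf < 0.5:
            return called  # low BAF: the called allele is the A allele
        return 1 - called  # high BAF: the called allele is the B allele
    return None  # every called sample is heterozygous (or missing)
```

For example, `infer_allele_a(["0/1", "1/1"], [0.51, 0.02])` returns 1: the second sample is homozygous ALT with a BAF near 0, so ALT is the A allele. The sketch stops at the first usable homozygous sample; a more robust version could combine the evidence from all of them.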
I have tried running it both on a single-sample VCF (I now understand why that would give an error) and on a test VCF with 10 samples. The error persists. Is this just a result of still not having a sufficient number of samples? I just want to resolve the issue before we run the algorithm to add LRR and BAF to many different VCFs. I will try to follow up with the org, but I have had little success with them on this issue in the past.
With 10 samples in a VCF, for a very common variant with a minor allele frequency close to 0.5, there is still a ~1/1,000 chance that all samples will be heterozygous (each sample is heterozygous with probability ~0.5, and 0.5^10 = 1/1,024). So for a few markers you may still be unable to infer which allele is ALLELE_A and which is ALLELE_B. To be safe, I think you need a VCF with at least ~30 samples from independent participants; otherwise it is just not possible to retrieve this information. Remember that the root of the issue is that the org that provides your dataset threw that information away. This is not a limitation of MoChA.
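The ~1/1,000 figure can be checked with a short calculation, assuming Hardy-Weinberg equilibrium and unrelated samples: each sample is heterozygous with probability 2p(1-p), so all n samples are heterozygous with probability (2p(1-p))^n. The helper names below are hypothetical:

```python
import math

# Assumes Hardy-Weinberg equilibrium and independent (unrelated) samples;
# p is the frequency of one allele at a biallelic marker, with 0 < p < 1.


def prob_all_het(p: float, n_samples: int) -> float:
    """Probability that all n_samples are heterozygous at one marker."""
    het = 2.0 * p * (1.0 - p)  # expected heterozygote frequency under HWE
    return het ** n_samples


def min_samples(p: float, max_prob: float) -> int:
    """Smallest sample size n for which prob_all_het(p, n) <= max_prob."""
    het = 2.0 * p * (1.0 - p)
    return math.ceil(math.log(max_prob) / math.log(het))


print(prob_all_het(0.5, 10))  # 0.0009765625, i.e. about 1/1,000
print(prob_all_het(0.5, 30))  # about 9.3e-10
print(min_samples(0.5, 1e-6))  # 20
```

The worst case is p = 0.5, where heterozygotes are most frequent; rarer variants become inferable with fewer samples. Since the check is repeated at every marker, the per-marker probability matters most for small sample sizes, where a handful of failures across a whole array remains plausible.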
BCFtools/gtc2vcf can automatically add ALLELE_A, ALLELE_B, GC, LRR, and BAF when you convert a .gtc file. I have no idea what you are referring to when you say "basic VCF format". One thing is for sure: if a VCF does not have LRR/BAF information, there is no way to "add" this information.
Unlike SHAPEIT4, SHAPEIT5 requires the AC and AN fields to be filled in. You can quickly fill them with either of the following BCFtools commands:
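For reference, AN is the total number of called alleles at a site and AC is the number of times each ALT allele is called. The hypothetical Python helper below (not part of BCFtools or SHAPEIT5) only illustrates how the two values follow from the GT field; it is not a substitute for the BCFtools commands:

```python
from typing import List, Sequence, Tuple

# Hypothetical helper for illustration only; not part of BCFtools or SHAPEIT5.
#   AN = number of called (non-missing) alleles across all samples
#   AC = number of times each ALT allele is called, listed in ALT order


def compute_ac_an(genotypes: Sequence[str], n_alt: int = 1) -> Tuple[List[int], int]:
    ac = [0] * n_alt
    an = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing alleles count toward neither AN nor AC
            an += 1
            idx = int(allele)
            if idx > 0:
                ac[idx - 1] += 1  # ALT alleles are numbered from 1
    return ac, an
```

For example, `compute_ac_an(["0/1", "1|1", "./.", "0/0"])` returns `([3], 6)`: six alleles are called, three of which are ALT.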
Thank you. It sounds like SHAPEIT5 is a bit more complicated than SHAPEIT4. I've tried many methods found online to compile SHAPEIT4, but none of them succeeded. Could you provide an already compiled SHAPEIT4 binary?
SHAPEIT4 and phase_common from SHAPEIT5 are identical, other than the latter requiring the AC and AN fields, and SHAPEIT5 has the advantage that it can handle trios. You can find binaries for SHAPEIT5 here. In the past, to generate binaries for SHAPEIT4, I used the following Dockerfile:
Your VCF does not include intensity data, so it would be pointless to identify which allele is the A allele and which is the B allele. I would advise you to go back to the table data generated by the Affymetrix Power Tools when you genotyped your samples and then use BCFtools/affy2vcf to generate a VCF with BAF, LRR, ALLELE_A, and ALLELE_B. Then you don't have to worry about file formatting issues.
I have a batch of VCF files from an array, and I am trying to add ALLELE_A and ALLELE_B to them so that I can run them through MoChA. I used the mochatools command shown below to do so:
bcftools +mochatools $input -- -t ALLELE_A,ALLELE_B,GC -f $reference > $output
I am getting the error:
Unable to infer the A and B alleles while parsing the site:
for all non-0/0 sites. Can you please offer some advice on why this might be the case and how to fix it?
P.S. Not sure whether this has anything to do with it, but the VCFs were pre-generated by the org that provides our dataset, and we had to add the LRR and BAF fields manually afterwards.