Incongruencies in GT assignment #793

Open
Virginia-b opened this issue Apr 4, 2024 · 1 comment

Virginia-b commented Apr 4, 2024

I am using Freebayes (v1.3.6) for SNP calling on eggplant population data. While inspecting the VCF file, I noticed that heterozygous genotypes appear to be assigned inconsistently in some cases. This was the command used to perform the SNP calling:

/usr/bin/freebayes -b all_parents_mapped_BWA_v3_20X_dedup.bam -f /data/users/vbarfon/Genomes/Solgenomics_2019_08_30/Eggplant_V3_Chromosomes.fa -v multithreadfreebayes/subfiles/all_parents_mapped_BWA_v3_20X_dedup_SMEL3Ch01_0_136534347.vcf -r SMEL3Ch01:0-136534347 --limit-coverage 800 --min-coverage 10 -m 20 -q 20

In the record below, sample5 was assigned a homozygous reference genotype (0/0) with 11 reference reads and 1 alternate read, an alternate allele frequency of 1/12=0.083. On the other hand, sample8 was assigned a heterozygous genotype (0/1) with 9 reference reads and 1 alternate read, an alternate allele frequency of 1/10=0.1. Despite the similar alternate allele frequencies, Freebayes assigns the two samples different genotypes at the same locus.

/SMEL3Ch01 96806 . TTTGCC CTGGCG 47.5168 . AB=0.185185;ABP=49.4959;AC=3;AF=0.1875;AN=16;AO=12;CIGAR=1X1M1X2M1X;DP=131;DPB=131;DPRA=1.18636;EPP=3.73412;EPPR=3.90444;GTI=1;LEN=6;MEANALT=1;MQM=35.5833;MQMR=60;NS=8;NUMALT=1;ODDS=4.88556;PAIRED=1;PAIREDR=0.915966;PAO=0;PQA=0;PQR=0;PRO=0;QA=465;QR=4692;RO=119;RPL=10;RPP=14.5915;RPPR=4.48836;RPR=2;RUN=1;SAF=9;SAP=9.52472;SAR=3;SRF=58;SRP=3.17453;SRR=61;TYPE=complex GT:GQ:DP:AD:RO:QR:AO:QA:GL 0/1:31.4607:17:13,4:13:520:4:159:-6.03904,0,-42.0287 0/0:101.192:22:22,0:22:872:0:0:0,-6.62266,-78.7923 0/0:71.0896:12:12,0:12:460:0:0:0,-3.61236,-41.7443 0/0:63.2764:21:20,1:20:792:1:38:0,-2.73405,-68.0036 0/0:51.1134:12:11,1:11:437:1:40:0,-1.51775,-37.5857 0/1:41.2006:27:22,5:22:851:5:187:-7.81267,0,-68.7726 0/0:65.069:10:10,0:10:401:0:0:0,-3.0103,-36.4467 0/1:0.0326615:10:9,1:9:359:1:41:-0.735817,0,-29.6608
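The per-sample frequencies quoted above can be recomputed from the AD field of the record. The following is a minimal Python sketch, not part of the Freebayes workflow: the helper names `parse_sample` and `alt_fraction` are made up for illustration, the sample columns are copied from the record above, and the mapping of columns to "sample5" and "sample8" assumes the samples appear in numeric order.

```python
# FORMAT string and two sample columns copied from the record at SMEL3Ch01:96806.
# Assumption: the fifth and eighth columns are sample5 and sample8.
FORMAT = "GT:GQ:DP:AD:RO:QR:AO:QA:GL"
SAMPLES = {
    "sample5": "0/0:51.1134:12:11,1:11:437:1:40:0,-1.51775,-37.5857",
    "sample8": "0/1:0.0326615:10:9,1:9:359:1:41:-0.735817,0,-29.6608",
}


def parse_sample(format_str, sample_str):
    """Map FORMAT keys to the raw string values of one sample column."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def alt_fraction(fields):
    """Alternate read fraction from AD (ref,alt); assumes a biallelic site."""
    ref, alt = (int(x) for x in fields["AD"].split(","))
    return alt / (ref + alt)


for name, column in SAMPLES.items():
    fields = parse_sample(FORMAT, column)
    # Print genotype, read counts, alternate fraction and genotype quality.
    print(name, fields["GT"], fields["AD"],
          round(alt_fraction(fields), 3), "GQ=" + fields["GQ"])
```

Printing GQ next to the fraction shows how much confidence is attached to each call in addition to the read counts.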

Moreover, at another locus, sample6 is called heterozygous with 11 reference reads and 1 alternate read, the same read counts for which sample5 was called homozygous in the previous case.

/SMEL3Ch01 82135 . A G 885.116 . AB=0.0833333;ABP=21.1059;AC=5;AF=0.3125;AN=16;AO=30;CIGAR=1X;DP=107;DPB=107;DPRA=1.03535;EPP=10.2485;EPPR=11.1604;GTI=1;LEN=1;MEANALT=1;MQM=60;MQMR=59.6104;NS=8;NUMALT=1;ODDS=0.82119;PAIRED=1;PAIREDR=0.922078;PAO=0;PQA=0;PQR=0;PRO=0;QA=1174;QR=2974;RO=77;RPL=19;RPP=7.64277;RPPR=9.35551;RPR=11;RUN=1;SAF=11;SAP=7.64277;SAR=19;SRF=35;SRP=4.39215;SRR=42;TYPE=snp GT:GQ:DP:AD:RO:QR:AO:QA:GL 1/1:30.4148:14:0,14:0:0:14:551:-49.9224,-4.21442,0 0/0:52.3018:15:15,0:15:570:0:0:0,-4.51545,-51.6258 1/1:33.4251:15:0,15:0:0:15:582:-52.7037,-4.51545,0 0/0:40.2606:11:11,0:11:418:0:0:0,-3.31133,-37.9553 0/0:40.2606:11:11,0:11:434:0:0:0,-3.31133,-39.4105 0/1:1.58028:12:11,1:11:398:1:41:-0.482207,0,-32.3809 0/0:58.3224:17:17,0:17:679:0:0:0,-5.11751,-61.4334 0/0:43.2709:12:12,0:12:475:0:0:0,-3.61236,-43.0969
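One way to compare the two 11,1 cases is to look at the GL field rather than AD alone. The sketch below is an illustration under assumptions, not a description of Freebayes internals: it assumes the standard VCF ordering of GL for a diploid biallelic site (0/0, 0/1, 1/1) and log10 scaling, and the function names are hypothetical. The GL strings are copied from the two records above. The final call may also reflect priors, which this sketch ignores.

```python
# Genotype order of GL values for a diploid, biallelic site (VCF convention).
GENOTYPES = ("0/0", "0/1", "1/1")


def parse_gl(gl_string):
    """Convert a comma-separated GL string into a list of floats."""
    return [float(x) for x in gl_string.split(",")]


def best_genotype(gl_string):
    """Return the genotype with the highest (least negative) log10 likelihood."""
    gls = parse_gl(gl_string)
    best = max(range(len(gls)), key=gls.__getitem__)
    return GENOTYPES[best]


def relative_likelihoods(gl_string):
    """Rescale log10 likelihoods so that they sum to 1 (priors are ignored)."""
    weights = [10 ** g for g in parse_gl(gl_string)]
    total = sum(weights)
    return [w / total for w in weights]


# sample5 at position 96806 and sample6 at position 82135, both with AD=11,1.
sample5_gl = "0,-1.51775,-37.5857"
sample6_gl = "-0.482207,0,-32.3809"

for label, gl in (("sample5@96806", sample5_gl), ("sample6@82135", sample6_gl)):
    shares = [round(p, 3) for p in relative_likelihoods(gl)]
    print(label, best_genotype(gl), shares)
```

For these two samples the genotype with the highest GL matches the reported GT, even though the AD counts are identical. The GL values themselves differ, presumably because the likelihoods also depend on per-read qualities (compare QR/QA and MQM in the two records) and not only on read counts.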

This behavior was also observed at higher coverage. For example, in the record below, sample1, sample2, sample4, and sample7 are called heterozygous although their alternate allele frequencies are only 4/45=0.089, 4/27=0.148, 6/45=0.133, and 3/24=0.125, respectively.

/SMEL3Ch01 96344 . T C 30.2059 . AB=0.120567;ABP=179.331;AC=4;AF=0.25;AN=16;AO=24;CIGAR=1X;DP=242;DPB=242;DPRA=0;EPP=4.45795;EPPR=3.82085;GTI=2;LEN=1;MEANALT=1.125;MQM=45.4583;MQMR=54.682;NS=8;NUMALT=1;ODDS=5.42383;PAIRED=1;PAIREDR=0.949309;PAO=0;PQA=0;PQR=0;PRO=0;QA=966;QR=8706;RO=217;RPL=21;RPP=32.3252;RPPR=3.10036;RPR=3;RUN=1;SAF=15;SAP=6.26751;SAR=9;SRF=114;SRP=4.22112;SRR=103;TYPE=snp GT:GQ:DP:AD:RO:QR:AO:QA:GL 0/1:0.000458462:45:41,4:41:1610:4:164:-1.33939,0,-127.693 0/1:22.7339:27:23,4:23:927:4:164:-6.37746,0,-71.7496 0/0:82.5237:17:16,1:16:643:1:32:0,-1.94407,-52.6745 0/1:29.6502:45:38,6:38:1525:6:246:-8.28092,0,-119.131 0/0:64.3246:27:25,2:25:1013:2:82:0,-0.614318,-82.214 0/0:66.6283:32:29,3:29:1168:3:114:0,-0.956644,-93.4502 0/1:0.0107044:24:21,3:21:853:3:123:-2.78843,0,-68.7308 0/0:96.5417:25:24,1:24:967:1:41:0,-3.63818,-80.9723
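Several of the heterozygous calls in this record carry a very low GQ (for example 0.000458462 and 0.0107044). A possible way to handle such calls downstream is to set them to missing below a GQ cutoff. The following Python sketch shows the idea only; `MIN_GQ` is a hypothetical threshold chosen for illustration (not a Freebayes default), `mask_low_gq` is a made-up helper, and it assumes the FORMAT field starts with GT:GQ as in the records above.

```python
MIN_GQ = 20.0  # hypothetical cutoff for illustration, not a Freebayes default

# Heterozygous sample columns copied from the record at SMEL3Ch01:96344.
CALLS = {
    "sample1": "0/1:0.000458462:45:41,4:41:1610:4:164:-1.33939,0,-127.693",
    "sample2": "0/1:22.7339:27:23,4:23:927:4:164:-6.37746,0,-71.7496",
    "sample4": "0/1:29.6502:45:38,6:38:1525:6:246:-8.28092,0,-119.131",
    "sample7": "0/1:0.0107044:24:21,3:21:853:3:123:-2.78843,0,-68.7308",
}


def mask_low_gq(sample_column, min_gq=MIN_GQ):
    """Return the column with GT set to './.' when GQ is below min_gq.

    Assumes GT is the first FORMAT key and GQ the second.
    """
    fields = sample_column.split(":")
    gt, gq = fields[0], float(fields[1])
    if gt not in (".", "./.") and gq < min_gq:
        fields[0] = "./."
    return ":".join(fields)


for name, column in CALLS.items():
    print(name, mask_low_gq(column))
```

With this cutoff, sample1 and sample7 would be set to missing while sample2 and sample4 keep their heterozygous call. Whether a GQ filter is appropriate, and at which value, depends on the study design.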

In summary, these results suggest that Freebayes assigns genotypes seemingly at random (two different sites supported by the same number of reads receive different genotypes) and that it assigns heterozygous genotypes even when the alternate allele frequency is very low, where I would expect homozygous genotypes. How can I deal with this issue? The head of the VCF file is attached.

data_20X_head.txt

Virginia-b added the bug label Apr 4, 2024

Virginia-b (Author) commented

I have also asked for help on Biostars: https://www.biostars.org/p/9591810/
