# Variant Caller

The following pipeline is derived from [Learn Gencore - Variant Calling](https://learn.gencore.bio.nyu.edu/variant-calling/).

We continue from last week's alignment. We will first sort the alignment file and then add a "read group" to it so that GATK has more information about the sample.



In [2]:
cd alignment

In [3]:
ls

GCF_000001405.33_GRCh38.p7_chr20_genomic.fna  read_2.fastq
aligned_reads.sam			      slurm-14772653.out
aligned_reads2.sam			      slurm-14772803.out
bwa.sh					      slurm-14772830.out
human20.0123				      slurm-14772859.out
human20.amb				      slurm-14772860.out
human20.ann				      slurm-14772861.out
human20.bwt.2bit.64			      slurm-14772862.out
human20.pac				      slurm-14772863.out
read_1.fastq				      slurm-14839008.out
read_1.fastq.sam


## Sorting and converting to BAM in one step

The algorithms used in downstream steps require the data to be sorted by coordinate and stored in BAM format. We use Picard Tools to do both in a single command: it sorts the SAM file produced by the alignment step and writes the sorted data in BAM format.

For a full list of the commands available in Picard Tools, see the [Picard tools](https://broadinstitute.github.io/picard/) site.

As with Trimmomatic, we can use an environment variable (here `$PICARD_JAR`) to tell the Java application where the Picard jar file is.


In [4]:
module avail picard


--------------------------- /share/apps/modulefiles ----------------------------
   picard/2.17.11    picard/2.23.8


In [5]:
module load picard/2.23.8

In [6]:
java -jar $PICARD_JAR SortSam \
INPUT=aligned_reads.sam \
OUTPUT=sorted_reads.bam \
SORT_ORDER=coordinate

INFO	2022-02-15 21:20:24	SortSam	

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    SortSam -INPUT aligned_reads.sam -OUTPUT sorted_reads.bam -SORT_ORDER coordinate
**********


21:20:24.713 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/picard/2.23.8/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Feb 15 21:20:24 EST 2022] SortSam INPUT=aligned_reads.sam OUTPUT=sorted_reads.bam SORT_ORDER=coordinate    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Tue Feb 15 21:20:24 EST 2022] Exe

In [10]:
ls -hl *am

-rw-rw-r--. 1 msk8 msk8 63M Feb  9 11:25 aligned_reads.sam
-rw-rw-r--. 1 msk8 msk8 63M Feb  9 11:52 aligned_reads2.sam
-rw-rw-r--. 1 msk8 msk8 63M Feb 10 23:17 read_1.fastq.sam
-rw-rw-r--. 1 msk8 msk8 16M Feb 15 21:20 sorted_reads.bam


Notice how much smaller the BAM file is (16M versus 63M). It contains the same information as the SAM file, but in a compressed binary format and sorted by coordinate.

## Add read group

In this pipeline it is important to tag every read with the sample ID and other metadata, which lets us group reads by sample in the analysis. This can be done during the alignment step, as discussed in the reference provided, or afterwards using Picard Tools.
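For the alignment-time route, a minimal sketch is shown here. It assumes `bwa mem` and its `-R` option, which takes the read-group header line with literal `\t` separators. The helper name `make_rg_line` is hypothetical, and the `bwa mem` call is left commented out because the alignment was already produced last week.

```shell
# Build an @RG header line. The "\t" sequences are kept literal on purpose:
# bwa converts them to real tabs when it writes the SAM header.
make_rg_line() {
    # arguments: ID, library, platform, platform model, platform unit, sample
    printf '@RG\\tID:%s\\tLB:%s\\tPL:%s\\tPM:%s\\tPU:%s\\tSM:%s\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

rg=$(make_rg_line sample_1 sample_1 ILLUMINA HISEQ sample_1 sample_1)
echo "$rg"

# Hypothetical alignment call (not run here); index prefix and FASTQ names
# are taken from the directory listing above:
# bwa mem -R "$rg" human20 read_1.fastq read_2.fastq > aligned_reads.sam
```

Tagging at alignment time saves one pass over the data, while the Picard route below works on files that have already been aligned.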



In [11]:
java -jar $PICARD_JAR AddOrReplaceReadGroups \
I=sorted_reads.bam \
O=sorted_reads_rg.bam \
RGID=sample_1 \
RGLB=sample_1 \
RGPL=ILLUMINA \
RGPM=HISEQ \
RGPU=sample_1 \
RGSM=sample_1


INFO	2022-02-15 21:22:18	AddOrReplaceReadGroups	

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    AddOrReplaceReadGroups -I sorted_reads.bam -O sorted_reads_rg.bam -RGID sample_1 -RGLB sample_1 -RGPL ILLUMINA -RGPM HISEQ -RGPU sample_1 -RGSM sample_1
**********


21:22:19.240 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/picard/2.23.8/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Feb 15 21:22:19 EST 2022] AddOrReplaceReadGroups INPUT=sorted_reads.bam OUTPUT=sorted_reads_rg.bam RGID=sample_1 RGLB=sample_1 RGPL=ILLUMINA RGPU=sample_1 RGSM=sample_1 RGPM=HISEQ    VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000

Let's inspect the original SAM file and the new BAM file with read groups to see the difference. We will use samtools to peek into the BAM file.

In [12]:
module avail samtools


--------------------------- /share/apps/modulefiles ----------------------------
   samtools/intel/1.11    samtools/intel/1.12    samtools/intel/1.14


In [13]:
module load samtools/intel/1.14

In [14]:
samtools view aligned_reads.sam | head -5

HS2000-940_146:5:1101:1161:63226	73	NC_000020.11	23775298	60	78M22S	=	23775298	0	CTGNTAGCCCTGCTGAATCTCCCTCCTGACCCAACTCCCTCNTNNNNNNNGCTGGGTGACTGCTGNCNNCACNGGCTGTGNNNNNNNNNNNNNCAGCTGG	?@@#4ADDDFDFFHIGGFCFHCHFGIHGCGHEHHEHD3?BH#0#######--5CEECG=?AEEHE###################################	NM:i:13	MD:Z:3G37C1C0T0A0C0T0C0T15T1C0T3T5	AS:i:52	XS:i:0
HS2000-940_146:5:1101:1161:63226	133	NC_000020.11	23775298	0	*	=	23775298	0	NNCTCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGNNNNCNAAAGGAGCCTGGGT	####################################################################################################	MC:Z:78M22S	AS:i:0	XS:i:0
HS2000-940_146:5:1101:1262:12434	99	NC_000020.11	23843774	60	100M	=	23843977	258	ATCAATGGTGTTTCTTTGCCAAGCTTCCTTAGTCGCCTTTAATCGGGAAAAGGTCTTCATTCTTTCTTGTCTTTGTTACCCTGTCATTTTTGAAGATAAC	?@@BDDFFFFHHHGHIIIHJEGHIIIGIJICHIGIIGIJIDHHGJJJ:;8EFH=CFHGGHIIIJJHHGBEHFFFFEEDCCCCEDCCADDEDD(5>5>@5@	NM:i:0	MD:Z:100	MC:Z:55M45S	AS:i:100	XS:i:60
HS2000-940_146:5:1101:1262:

In [15]:
samtools view sorted_reads_rg.bam | head -5

HS2000-940_146:5:2109:14063:29918	161	NC_000020.11	64145	2	54S46M	=	23724989	23660944	TTCCAATCCATTCCATTCCATCACACTGCATTCCATTCCATTCCAATCCCCTCAACTCCACTCCACTCCACTCCATTCCACTCCAATCAATTCCATTGCA	@CCFFFFFHGDHHJIJJJJJIJIFHHGCGIHHIJJJGHIHIIIJJIHIGGGHIJJE:FFHIGIDHJGIGGIJJ@;CDHGGEIHHHEHF;CCB>;;3;>;>	XA:Z:NC_000020.11,+60520,54S31M15S,0;NC_000020.11,+60582,56S29M15S,0;NC_000020.11,+64492,56S29M15S,0;NC_000020.11,+62105,85M15S,12;	MC:Z:100M	MD:Z:6G27C8T2	RG:Z:sample_1	NM:i:3	AS:i:33	XS:i:31
HS2000-940_146:5:2110:1521:37886	163	NC_000020.11	1217420	0	55S21M24S	=	1217591	271	GACAGTTCTGAAGAGAGCAGGGGTTCTTCCAGCATTGCATTTGAGCTCCGAAAATGGACAGACTGCCTCCTCAAGTCGGTCCTTGACCTCCGTGCACCCT	?7:DDD:B,+ADD43C?BF++<2<):**11*1:;C*0?0B?F>GCDBF30'-'-8@8..@1@E@;37@)?76@###########################	XA:Z:NC_000020.11,+22851991,88M12S,11;NC_000020.11,-23620838,66S34M,1;NC_000020.11,-35076707,12S32M56S,2;NC_000020.11,-16793280,12S36M52S,3;	MC:Z:100M	MD:Z:21	RG:Z:sample_1	NM:i:0	AS:i:21	XS:i:34
HS2000-940_146:5:2110:1521:37886	83	N

Notice that in the original file the two reads of a pair are adjacent, whereas in the sorted file the reads are ordered by where they align on the chromosome.

Also notice that in the second file each record has an **RG:Z:sample_1** entry among the optional tag fields at the end of the line.
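Because the optional tags can appear in any order, the read group is easier to find programmatically than by eye. The following is a small sketch, assuming tab-separated SAM records on standard input; the function name `rg_of` is hypothetical and the demo record is an abbreviated stand-in, not real data.

```shell
# Print the value of the RG:Z: tag for each SAM record read on stdin.
# Optional tags start at column 12 and may appear in any order.
rg_of() {
    awk -F'\t' '{
        for (i = 12; i <= NF; i++)
            if ($i ~ /^RG:Z:/) { print substr($i, 6); next }
        print "(none)"
    }'
}

# Abbreviated stand-in record (sequence and qualities shortened).
printf 'r1\t161\tNC_000020.11\t64145\t2\t4M\t=\t23724989\t23660944\tTTCC\t@CCF\tMD:Z:4\tRG:Z:sample_1\tNM:i:0\n' | rg_of
```

On the real files this could be used as `samtools view sorted_reads_rg.bam | head -5 | rg_of`; records from `aligned_reads.sam` would print `(none)`.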

Samtools also has a command, `flagstat`, that summarizes the alignment:

In [16]:
samtools flagstat sorted_reads_rg.bam

194483 + 0 in total (QC-passed reads + QC-failed reads)
194412 + 0 primary
0 + 0 secondary
71 + 0 supplementary
0 + 0 duplicates
0 + 0 primary duplicates
193795 + 0 mapped (99.65% : N/A)
193724 + 0 primary mapped (99.65% : N/A)
194412 + 0 paired in sequencing
97206 + 0 read1
97206 + 0 read2
190810 + 0 properly paired (98.15% : N/A)
193108 + 0 with itself and mate mapped
616 + 0 singletons (0.32% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
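The percentages in this summary use different denominators: the mapped rate is relative to the total number of records, while the properly paired and singleton rates are relative to the reads paired in sequencing. As a quick check, the sketch below recomputes them from the counts above; the helper name `pct` is hypothetical.

```shell
# Percentage of "part" relative to "total", rounded to two decimals.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

pct 193795 194483   # mapped / total records
pct 190810 194412   # properly paired / paired in sequencing
pct 616 194412      # singletons / paired in sequencing
```

The three results (99.65, 98.15 and 0.32) match the values reported by `samtools flagstat`.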


## Mark duplicates

During the sequencing process, the same DNA fragment may be sequenced several times. The resulting duplicate reads are not informative and should not be counted as evidence for or against a putative variant. Duplicates can arise during sample preparation, e.g. library construction using PCR. Without this step, regions that were preferentially amplified during PCR risk being over-represented in your data. Duplicate reads can also result from a single amplification cluster being incorrectly detected as multiple clusters by the optical sensor of the sequencing instrument. These duplication artifacts are referred to as optical duplicates.

We use Picard Tools to locate and tag duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.

Note that this step does not remove the duplicate reads, but rather flags them as such in the read’s SAM record. We’ll take a look at how this is done shortly. Downstream GATK tools will ignore reads flagged as duplicates by default.

Note: Duplicate marking should not be applied to amplicon sequencing or other data types where reads start and stop at the same positions by design.
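The flag in question is one bit of the FLAG field (the second column of each SAM record): in the SAM format, bit 0x400 (decimal 1024) means "PCR or optical duplicate". The sketch below shows how that bit can be tested with shell arithmetic; the function name `is_duplicate` is hypothetical, and 1123 is an illustrative value (99 + 1024), not one taken from our data.

```shell
# The SAM FLAG field is a bit mask; bit 0x400 (decimal 1024) marks a duplicate.
is_duplicate() {
    if [ $(( $1 & 1024 )) -ne 0 ]; then echo yes; else echo no; fi
}

is_duplicate 99     # a flag seen in the alignment output earlier
is_duplicate 1123   # 99 + 1024: the same flag with the duplicate bit set
```

Once `dedup_reads.bam` has been produced, `samtools view -c -f 1024 dedup_reads.bam` counts the reads that carry this bit.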

In [17]:
java -jar $PICARD_JAR MarkDuplicates \
INPUT=sorted_reads_rg.bam \
OUTPUT=dedup_reads.bam \
METRICS_FILE=metrics.txt

INFO	2022-02-15 21:30:10	MarkDuplicates	

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    MarkDuplicates -INPUT sorted_reads_rg.bam -OUTPUT dedup_reads.bam -METRICS_FILE metrics.txt
**********


21:30:10.724 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/picard/2.23.8/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Feb 15 21:30:10 EST 2022] MarkDuplicates INPUT=[sorted_reads_rg.bam] OUTPUT=dedup_reads.bam METRICS_FILE=metrics.txt    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=

In [18]:
# take a look at the metrics file
cat metrics.txt

## htsjdk.samtools.metrics.StringHeader
# MarkDuplicates INPUT=[sorted_reads_rg.bam] OUTPUT=dedup_reads.bam METRICS_FILE=metrics.txt    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
## htsjdk.samtools.metrics.StringHeader
# Star

## Additional file preparation steps

Before we can use GATK, we need to create a few more supporting files: a sequence dictionary and an index for the reference, and an index for the BAM file.

In [19]:
java -jar $PICARD_JAR CreateSequenceDictionary \
R=GCF_000001405.33_GRCh38.p7_chr20_genomic.fna \
O=GCF_000001405.33_GRCh38.p7_chr20_genomic.dict

INFO	2022-02-15 21:49:35	CreateSequenceDictionary	

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    CreateSequenceDictionary -R GCF_000001405.33_GRCh38.p7_chr20_genomic.fna -O GCF_000001405.33_GRCh38.p7_chr20_genomic.dict
**********


21:49:35.622 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/picard/2.23.8/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Tue Feb 15 21:49:35 EST 2022] CreateSequenceDictionary OUTPUT=GCF_000001405.33_GRCh38.p7_chr20_genomic.dict REFERENCE=GCF_000001405.33_GRCh38.p7_chr20_genomic.fna    TRUNCATE_NAMES_AT_WHITESPACE=true NUM_SEQUENCES=2147483647 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_R

In [20]:
samtools faidx GCF_000001405.33_GRCh38.p7_chr20_genomic.fna


In [21]:
samtools index dedup_reads.bam
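These three commands write their output next to the inputs: the `.dict` file named in the Picard command, a `.fai` index beside the reference, and a `.bai` index beside the BAM file. A small sketch for confirming that they all exist before moving on is shown below; the helper name `check_inputs` and the reminder message are hypothetical.

```shell
# Report whether each expected companion file exists in the given directory.
# Returns non-zero if at least one file is missing.
check_inputs() {
    dir="$1"; shift
    missing=0
    for f in "$@"; do
        if [ -e "$dir/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=1
        fi
    done
    return "$missing"
}

check_inputs . \
    GCF_000001405.33_GRCh38.p7_chr20_genomic.dict \
    GCF_000001405.33_GRCh38.p7_chr20_genomic.fna.fai \
    dedup_reads.bam.bai || echo "run the missing step before calling variants"
```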


## Recalibrating quality scores

Normally we would now recalibrate the base quality scores. This requires a VCF file of known SNPs for our reference, which we do not have, so we will perform the variant calling without recalibration.

## Ready to call variants

We will be using [GATK](https://gatk.broadinstitute.org/hc/en-us) to call variants.

In [23]:
module avail gatk


--------------------------- /share/apps/modulefiles ----------------------------
   gatk/3.8-0    gatk/4.1.7.0    gatk/4.1.9.0    gatk/4.2.0.0    gatk/4.2.4.1


In [41]:
module load gatk/4.1.9.0

In [42]:
env | grep GATK

GATK_HOME=/share/apps/gatk/4.1.9.0
GATK_LOCAL_JAR=/share/apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar
LMOD_FAMILY_GATK_VERSION=4.1.9.0
LMOD_FAMILY_GATK=gatk
GATK_SPARK_JAR=/share/apps/gatk/4.1.9.0/gatk-package-4.1.9.0-spark.jar
GATK_JAR=/share/apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar
GATK_ROOT=/share/apps/gatk/4.1.9.0


In [47]:
java -jar $GATK_JAR HaplotypeCaller \
-R GCF_000001405.33_GRCh38.p7_chr20_genomic.fna \
-I dedup_reads.bam \
-O raw_variants.vcf


22:23:00.932 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 15, 2022 10:23:01 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
22:23:01.137 INFO  HaplotypeCaller - ------------------------------------------------------------
22:23:01.137 INFO  HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.1.9.0
22:23:01.137 INFO  HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
22:23:01.137 INFO  HaplotypeCaller - Executing as msk8@cm002.hpc.nyu.edu on Linux v4.18.0-305.28.1.el8_4.x86_64 amd64
22:23:01.137 INFO  HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_271-b09
22:23:01.137 INFO  HaplotypeCaller - Start Date/Time: February 15, 2022 10:23:00 PM EST
22:23:01.137 INFO  HaplotypeCa

In [48]:
head -30 raw_variants.vcf


##fileformat=VCFv4.2
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --output raw_variants.vcf --input dedup_reads.bam --reference GCF_000001405.33_GRCh38.p7_chr20_genomic.fna --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-pl

In [49]:
# split the variants into SNPs and indels
gatk SelectVariants -R GCF_000001405.33_GRCh38.p7_chr20_genomic.fna -V raw_variants.vcf -select-type SNP -O raw_snps.vcf
gatk SelectVariants -R GCF_000001405.33_GRCh38.p7_chr20_genomic.fna -V raw_variants.vcf -select-type INDEL -O raw_indels.vcf


Using GATK jar /share/apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar defined in environment variable GATK_LOCAL_JAR
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /share/apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar SelectVariants -R GCF_000001405.33_GRCh38.p7_chr20_genomic.fna -V raw_variants.vcf -select-type SNP -O raw_snps.vcf
22:24:24.045 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 15, 2022 10:24:24 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
22:24:24.301 INFO  SelectVariants - ------------------------------------------------------------
22:24:24.302 INFO  SelectVariants - The Genome Analysis Toolkit (

: 127

In [50]:
java -jar $GATK_JAR VariantFiltration \
-R GCF_000001405.33_GRCh38.p7_chr20_genomic.fna \
-V raw_snps.vcf \
--filter-name "QD_filter" \
-filter "QD<2.0" \
--filter-name "FS_filter" \
-filter "FS>60.0" \
--filter-name "MQ_filter" \
-filter "MQ<40.0" \
--filter-name "SOR_filter" \
-filter "SOR>10.0" \
-O filtered_snps.vcf

22:24:48.583 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/share/apps/gatk/4.1.9.0/gatk-package-4.1.9.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Feb 15, 2022 10:24:48 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
22:24:48.809 INFO  VariantFiltration - ------------------------------------------------------------
22:24:48.809 INFO  VariantFiltration - The Genome Analysis Toolkit (GATK) v4.1.9.0
22:24:48.809 INFO  VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
22:24:48.809 INFO  VariantFiltration - Executing as msk8@cm002.hpc.nyu.edu on Linux v4.18.0-305.28.1.el8_4.x86_64 amd64
22:24:48.809 INFO  VariantFiltration - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_271-b09
22:24:48.809 INFO  VariantFiltration - Start Date/Time: February 15, 2022 10:24:48 PM EST
22:24:48.809 INFO 

In [51]:
wc -l *vcf


   1445 filtered_snps.vcf
    204 raw_indels.vcf
   1439 raw_snps.vcf
   1614 raw_variants.vcf
   4702 total
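Note that `wc -l` counts header lines as well as variant records, so these numbers are not variant counts. This is also how `filtered_snps.vcf` can have more lines than `raw_snps.vcf`: a plausible explanation is that VariantFiltration writes additional header lines, since it annotates records rather than removing them. A minimal sketch for counting records only is shown below; the function name `count_records` is hypothetical and the demo file is made-up data so that the snippet runs on its own.

```shell
# Count variant records in a VCF, ignoring every header line (those starting with "#").
count_records() {
    grep -vc '^#' "$1"
}

# Tiny stand-in VCF (hypothetical data): two header lines, two records.
demo=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nNC_000020.11\t100\t.\tT\tA\t50\tPASS\t.\nNC_000020.11\t200\t.\tC\tG\t10\tQD_filter\t.\n' > "$demo"
count_records "$demo"
rm -f "$demo"
```

Running `count_records raw_snps.vcf` and `count_records filtered_snps.vcf` should give the same number if filtering only annotated the records.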


In [52]:
grep -v "^##" filtered_snps.vcf | head
grep -v "^##" filtered_snps.vcf | tail

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample_1
NC_000020.11	23531045	.	T	A	1209.06	PASS	AC=2;AF=1.00;AN=2;DP=35;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=25.36;SOR=0.693	GT:AD:DP:GQ:PL	1/1:0,32:32:95:1223,95,0
NC_000020.11	23531143	.	C	A	1610.06	PASS	AC=2;AF=1.00;AN=2;DP=45;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=28.73;SOR=0.741	GT:AD:DP:GQ:PL	1/1:0,41:41:99:1624,123,0
NC_000020.11	23532123	.	G	C	1619.06	PASS	AC=2;AF=1.00;AN=2;DP=39;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=30.97;SOR=1.096	GT:AD:DP:GQ:PL	1/1:0,39:39:99:1633,117,0
NC_000020.11	23532137	.	C	T	1649.06	PASS	AC=2;AF=1.00;AN=2;DP=41;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=27.24;SOR=1.022	GT:AD:DP:GQ:PL	1/1:0,40:40:99:1663,120,0
NC_000020.11	23532481	.	C	T	2157.06	PASS	AC=2;AF=1.00;AN=2;DP=51;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=28.20;SOR=1.157	GT:AD:DP:GQ:PL	1/1:0,50:50:99:2171,151,0
NC_000020.11	23533472	.	A	T	1003.06	PASS	AC=2;AF=1.
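The FILTER column (field 7) holds `PASS` for records that passed every filter, or the names of the filters that failed, joined by semicolons. To see how many records fall into each case, a short sketch is given below; the function name `filter_tally` is hypothetical and the demo records are made up.

```shell
# Tally the FILTER column (field 7) of a VCF read on stdin.
# Output: one line per distinct FILTER value, as "value<TAB>count".
filter_tally() {
    grep -v '^#' | cut -f7 | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Made-up records: two that pass and one that fails QD_filter.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\nc\t1\t.\tA\tT\t9\tPASS\nc\t2\t.\tA\tT\t9\tQD_filter\nc\t3\t.\tA\tT\t9\tPASS\n' | filter_tally
```

On the real output this would be run as `filter_tally < filtered_snps.vcf`.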