# Hardy-Weinberg Equilibrium

In [1]:
import hail as hl
hl.init()

Running on Apache Spark version 2.4.6
SparkUI available at http://hms-beagle-5466c684ff-2l8nm:4044
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.58-3f304aae6ce2
LOGGING: writing to /home/olavur/experiments/2020-11-13_fargen1_exome_analysis/fargen-1-exome/notebooks/hail-20201203-0848-0.2.58-3f304aae6ce2.log


In [2]:
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot
from bokeh.models.scales import LinearScale, LogScale
output_notebook()

In [3]:
mt = hl.read_matrix_table('/home/olavur/experiments/2020-11-13_fargen1_exome_analysis/data/mt/variants.mt')

In [4]:
distinct_allele_counts = mt.aggregate_rows(hl.agg.counter(hl.len(mt.alleles)))
distinct_allele_counts

{5: 2520, 6: 1015, 2: 902052, 7: 524, 3: 35254, 4: 7184}

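The vast majority of sites are biallelic (902,052 sites have exactly two alleles). Remove the multi-allelic sites, keeping only sites with one reference and one alternate allele.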

**NOTE:** it's possible to perform HWE tests on all alleles in multi-allelic variants by splitting the variants using [split_multi()](https://hail.is/docs/0.2/methods/genetics.html#hail.methods.split_multi).

In [5]:
mt = mt.filter_rows(hl.len(mt.alleles) == 2)

Perform the HWE test, and annotate the rows of the `MatrixTable` with the expected heterozygote frequency and the p-value from the test.

In [6]:
mt = mt.annotate_rows(hwe=hl.agg.hardy_weinberg_test(mt.GT))
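To make the two annotations concrete, here is a minimal pure-Python sketch of an exact HWE test for a single biallelic site, based on the Levene-Haldane distribution of the heterozygote count. The function names are hypothetical and not part of Hail. The p-value here is the plain two-sided exact p-value without any mid-p correction, so it may differ from the value Hail reports; consult the Hail documentation for the details of its computation.

```python
from math import exp, lgamma, log


def log_factorial(k):
    """Natural log of k!, via the log-gamma function."""
    return lgamma(k + 1)


def het_count_distribution(n_samples, n_minor):
    """P(number of heterozygotes = h) under HWE, given the number of
    samples and the minor allele count (Levene-Haldane distribution)."""
    n_major = 2 * n_samples - n_minor  # alleles total twice the samples
    dist = {}
    # h must have the same parity as the minor allele count.
    for h in range(n_minor % 2, n_minor + 1, 2):
        n_hom_minor = (n_minor - h) // 2
        n_hom_major = (n_major - h) // 2
        log_p = (
            log_factorial(n_samples)
            - log_factorial(n_hom_minor)
            - log_factorial(h)
            - log_factorial(n_hom_major)
            + h * log(2)
            + log_factorial(n_minor)
            + log_factorial(n_major)
            - log_factorial(2 * n_samples)
        )
        dist[h] = exp(log_p)
    return dist


def hwe_exact_test(n_hom_ref, n_het, n_hom_var):
    """Return (expected het frequency, two-sided exact p-value).

    Assumes at least one called genotype at the site.
    """
    n_samples = n_hom_ref + n_het + n_hom_var
    n_ref = 2 * n_hom_ref + n_het
    n_alt = 2 * n_hom_var + n_het
    dist = het_count_distribution(n_samples, min(n_ref, n_alt))
    # Mean heterozygote count under HWE, as a fraction of samples.
    expected_het_freq = sum(h * p for h, p in dist.items()) / n_samples
    # Sum the probabilities of all outcomes no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    observed = dist[n_het]
    p_value = sum(p for p in dist.values() if p <= observed * (1 + 1e-9))
    return expected_het_freq, min(p_value, 1.0)


# Example: 25 hom-ref, 50 het, 25 hom-var samples.
expected_het_freq, p_value = hwe_exact_test(25, 50, 25)
```

With only a handful of samples the distribution has very few possible outcomes, so the p-values are coarse and cannot be expected to look uniform.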

Plot a histogram of the HWE p-values. Note that each sample is diploid, so the number of alleles at a site is twice the number of samples.

In [7]:
p = hl.plot.histogram(mt.hwe.p_value, range=(0,1), legend='')
show(p)

## Q-Q plot

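Plot the p-values from the HWE test as a Q-Q plot. The observed p-values do not follow the expected distribution at all, which suggests something is wrong. One possible explanation is that there are simply too few samples.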

In [11]:
p = hl.plot.qq(mt.hwe.p_value, title='Q-Q plot of HWE p-values', hover_fields={'AC': mt.info.AC[0]})
p.plot_height = 500
p.plot_width = 500
show(p)

2020-12-03 09:10:37 Hail: INFO: Ordering unsorted dataset with network shuffle
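As a rough illustration of what a Q-Q plot of p-values compares, the sketch below computes the plotted coordinates by hand. The function name `qq_points` is hypothetical, and the `(rank - 0.5) / n` plotting position is an assumed convention that is not necessarily the one `hl.plot.qq` uses.

```python
from math import log10


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, smallest p-value first.

    Missing and non-positive p-values are dropped, since log10(0) is undefined.
    """
    observed = sorted(p for p in p_values if p is not None and p > 0)
    n = len(observed)
    # Under the null hypothesis the p-values are Uniform(0, 1), so the
    # rank-th smallest of n is expected near (rank - 0.5) / n.
    return [
        (-log10((rank - 0.5) / n), -log10(p))
        for rank, p in enumerate(observed, start=1)
    ]


points = qq_points([0.1, 0.01, None])
```

If the p-values were well behaved, the points would fall close to the diagonal where expected equals observed.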


For each biallelic site, calculate the observed proportion of heterozygotes, then plot it against the heterozygote frequency expected under HWE.

In [9]:
mt = mt.annotate_rows(het_freq=hl.agg.fraction(mt.GT.is_het()))

In [10]:
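# Observed heterozygote fraction (x) against the HWE expectation (y).
p = hl.plot.scatter(mt.het_freq, mt.hwe.het_freq_hwe,
                    xlabel='Observed heterozygote frequency',
                    ylabel='Expected heterozygote frequency under HWE')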
show(p)