# Kinship coefficient

Suppose $\mathbf{X}$ is an $n \times m$ genotype matrix with entries $x_{ik} \in \{0, 1, 2\}$, where $n$ is the number of individuals and $m$ is the number of genetic markers. [Lange et al.](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4062304/) derived a method-of-moments estimator of the $n \times n$ kinship matrix $\Phi$:

$$
\widehat \Phi_{ij} = \frac{e_{ij} - \sum_{k=1}^m [p_k^2 + (1 - p_k)^2]}{m - \sum_{k=1}^m [p_k^2 + (1 - p_k)^2]}, \quad 1 \le i, j \le n,
$$

where

$$
\begin{eqnarray*}
    e_{ij} &=& \frac{1}{4} \sum_{k=1}^m [x_{ik} x_{jk} + (2 - x_{ik})(2 - x_{jk})] \\
    p_k &=& \frac {1}{2n} \sum_{i=1}^n x_{ik}.
\end{eqnarray*}
$$

### Implementation
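
The function below appears to rely on two simplifications that follow from expanding the definitions above. They are written out here as a worked step, not as part of the cited derivation:

$$
\begin{eqnarray*}
    e_{ij} &=& \frac{1}{4} \sum_{k=1}^m [2 x_{ik} x_{jk} - 2 x_{ik} - 2 x_{jk} + 4] \\
           &=& \frac{1}{2} \sum_{k=1}^m x_{ik} x_{jk} - \frac{1}{2} \left( \sum_{k=1}^m x_{ik} + \sum_{k=1}^m x_{jk} \right) + m, \\
    \sum_{k=1}^m [p_k^2 + (1 - p_k)^2] &=& 2 \sum_{k=1}^m p_k^2 - 2 \sum_{k=1}^m p_k + m.
\end{eqnarray*}
$$

The first term of $e_{ij}$ is the $(i, j)$ entry of $\frac{1}{2} \mathbf{X} \mathbf{X}^T$, which is what the `BLAS.syrk!` call computes. The remaining terms need only the row sums of $\mathbf{X}$ (the vector `x`), and the second identity is the quantity stored in `ss`.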

In [1]:
using LinearAlgebra  # provides dot and BLAS.syrk!

function kinship(X::Matrix{Float64})
    # dimensions: n individuals, m markers
    n, m = size(X)
    # pre-allocate memory
    Φ = Matrix{Float64}(undef, n, n)
    # allele frequencies p_k
    p = vec(sum(X, dims = 1) ./ (2 * n))
    # ss = sum_k [p_k^2 + (1 - p_k)^2], in expanded form
    ss = 2 * dot(p, p) - 2 * sum(p) + m
    # row sums of X
    x = vec(sum(X, dims = 2))
    # lower triangle of Φ = 0.5 * X * X'
    BLAS.syrk!('L', 'N', 0.5, X, 0.0, Φ)
    # complete e_ij, normalize, and mirror into the upper triangle
    for i in 1:n
        for j in i:n
            Φ[j, i] = (Φ[j, i] + m - 0.5 * (x[i] + x[j]) - ss) / (m - ss)
            Φ[i, j] = Φ[j, i]
        end
    end
    return Φ
end

kinship (generic function with 1 method)

In [2]:
using BenchmarkTools, LinearAlgebra, Random
Random.seed!(1234)
X = rand(0.0:2.0, 1000, 10000);

In [3]:
@time kinship(X)
@benchmark kinship(X)

  0.054336 seconds (15 allocations: 7.790 MiB)


BenchmarkTools.Trial: 
  memory estimate:  7.79 MiB
  allocs estimate:  15
  --------------
  minimum time:     48.264 ms (0.00% GC)
  median time:      52.353 ms (0.00% GC)
  mean time:        53.114 ms (1.61% GC)
  maximum time:     63.663 ms (12.17% GC)
  --------------
  samples:          95
  evals/sample:     1
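
As a sanity check on the optimized routine, the estimator can also be evaluated term by term, straight from the formula. The sketch below is a slow reference version. The name `kinship_naive` is hypothetical and not part of the original notebook, and the code assumes at least one marker has $0 < p_k < 1$ so that the denominator is nonzero.

```julia
# Reference implementation: follows the definition of the estimator literally.
# `kinship_naive` is an illustrative helper name, not from the original notebook.
function kinship_naive(X::AbstractMatrix{<:Real})
    n, m = size(X)
    # allele frequency of each marker: p_k = (1 / 2n) * sum_i x_ik
    p = vec(sum(X, dims = 1)) ./ (2n)
    # s = sum_k [p_k^2 + (1 - p_k)^2]
    s = sum(pk^2 + (1 - pk)^2 for pk in p)
    Φ = Matrix{Float64}(undef, n, n)
    for j in 1:n, i in 1:n
        # e_ij = (1/4) * sum_k [x_ik * x_jk + (2 - x_ik) * (2 - x_jk)]
        e = 0.0
        for k in 1:m
            e += X[i, k] * X[j, k] + (2 - X[i, k]) * (2 - X[j, k])
        end
        Φ[i, j] = (e / 4 - s) / (m - s)
    end
    return Φ
end
```

With `kinship` from the cell above in scope, comparing `kinship(Xs) ≈ kinship_naive(Xs)` on a small genotype matrix `Xs` would be a reasonable check that the algebraic shortcuts preserve the result.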

### Alternative kinship matrix

An alternative estimator of the kinship matrix, based on the genomic relationship matrix (GRM), is

$$
\widehat \Phi_{ij} = \frac{1}{2m} \sum_{k=1}^m \frac{(x_{ik} - 2p_k)(x_{jk} - 2p_k)}{2p_k(1-p_k)}
$$
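
A minimal sketch of this estimator, in the same style as the function above, is given here. The name `grm_kinship` is hypothetical, and the code assumes every marker is polymorphic ($0 < p_k < 1$), since a monomorphic marker would make the scaling factor $2p_k(1-p_k)$ zero.

```julia
# GRM-based kinship estimate: center each marker by 2p_k, scale by
# sqrt(2 p_k (1 - p_k)), then form the cross-product divided by 2m.
# `grm_kinship` is an illustrative name, not from the original notebook.
function grm_kinship(X::AbstractMatrix{<:Real})
    n, m = size(X)
    # allele frequency of each marker: p_k = (1 / 2n) * sum_i x_ik
    p = vec(sum(X, dims = 1)) ./ (2n)
    # Assumption: 0 < p_k < 1 for every marker, so the scale below is nonzero.
    # p' is a 1 × m row, so broadcasting applies it column by column.
    Z = (X .- 2 .* p') ./ sqrt.(2 .* p' .* (1 .- p'))
    # (Z * Z')[i, j] = sum_k (x_ik - 2p_k)(x_jk - 2p_k) / (2 p_k (1 - p_k))
    return (Z * Z') ./ (2m)
end
```

In practice, monomorphic markers would need to be filtered out before calling such a function. Forming `Z` costs an extra $n \times m$ allocation, which the moment estimator above avoids.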