# Theory

Let the centered genotype matrix $X = \left( \begin{array}{c} X_U \\ X_T \end{array} \right)$
where $X$ has dimension $M \times N$,
the untyped SNP matrix $X_U$ has dimension $M_U \times N$
and the typed SNP matrix $X_T$ has dimension $M_T \times N$, with $M=M_U+M_T$.

Let $S$ be the $M \times M$ LD matrix defined as $S=\lim_{N \rightarrow \infty} N^{-1}XX^T$.
$S$ can be partitioned into the LD within and between the untyped and typed SNPs as
$S=\left( \begin{array}{cc} S_{UU} & S_{UT} \\ S_{TU} & S_{TT} \end{array} \right)$.

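We can now relate the true causal per-allele effect sizes $b$ to the joint per-allele effect sizes $g$ of the typed SNPs.
The subscript $A$ denotes all SNPs, so $X_A = X$ and $S_{TA} = \left( \begin{array}{cc} S_{TU} & S_{TT} \end{array} \right)$ is the $M_T \times M$ block of $S$.
Because $S_{TA}$ is not square, $S_{TA}^{-1}$ denotes its pseudoinverse, built from the thin SVD.

$$
\begin{align}
Y & = X_A^T b + e_b \\
  & = X_T^T g + e_g \\
\hat{g} & = \left( X_T X_T^T \right)^{-1} X_T X_A^T b \\
g & = \lim_{N \rightarrow \infty} \hat{g} = S_{TT}^{-1} S_{TA} b \\
b & = S_{TA}^{-1} S_{TT} g \\ \\
S_{TA} & = UDV^T \\
S_{TA}^{-1} & = VD^{-1}U^T \\
S_{TA}^{-1}S_{TA} & = VV^T
\end{align}
$$

$VV^T$ equals $I$ only if $S_{TA}$ has full column rank. With $M_T < M$ it is a projection, so the expression for $b$ returns the minimum-norm solution of $S_{TT} g = S_{TA} b$.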
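
The large-sample limit of $\hat{g}$ can be checked numerically. Dividing both factors of $\hat{g}$ by $N$ gives $(N^{-1}X_TX_T^T)^{-1}(N^{-1}X_TX_A^T)b$, which tends to $S_{TT}^{-1}S_{TA}b$. The sketch below is a toy check under assumptions that are not part of the notebook: a hypothetical 6-SNP LD matrix with geometric decay, three arbitrarily chosen typed SNPs, and noise-free phenotypes. It also maps $g$ back through `np.linalg.pinv`, which yields the minimum-norm solution of $S_{TA} b = S_{TT} g$ rather than $b$ itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy LD matrix: correlation 0.5**|i-j| between 6 SNPs
M = 6
idx = np.arange(M)
S = 0.5 ** np.abs(np.subtract.outer(idx, idx))

typed = np.array([1, 3, 4])        # assumed typed SNP indices
b = rng.normal(size=M)             # causal effects on all SNPs

S_TA = S[typed]                    # M_T x M block (typed vs. all)
S_TT = S[np.ix_(typed, typed)]     # M_T x M_T block

# Large-sample joint effects: g = S_TT^{-1} S_TA b
g = np.linalg.solve(S_TT, S_TA @ b)

# OLS estimate on simulated genotypes (M x N layout, as in the text)
N = 200_000
X = rng.multivariate_normal(np.zeros(M), S, size=N).T
Y = X.T @ b
X_T = X[typed]
g_hat = np.linalg.solve(X_T @ X_T.T, X_T @ Y)

# Minimum-norm solution of S_TA b = S_TT g via the pseudoinverse
b_min = np.linalg.pinv(S_TA) @ (S_TT @ g)

print(np.max(np.abs(g_hat - g)))             # shrinks as N grows
print(np.allclose(S_TA @ b_min, S_TA @ b))   # b_min reproduces S_TA b
print(np.allclose(b_min, b))                 # but need not equal b
```

`b_min` matches `b` only in the directions that the typed SNPs can see through LD, which is why the last comparison generally fails when $M_T < M$.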

# Simulation

1. Simulate $S_1$ and $S_2$ from a $W_M(\Sigma,n)/n$ distribution
    * $M$ is the total number of SNPs
    * $\Sigma$ is the target LD matrix: a banded matrix whose LD falls off gradually with distance
    * $n$ (the degrees of freedom) is tricky to choose: as $n$ increases, $S_q$ concentrates more tightly around $\Sigma$.
1. Simulate $X_q$ from a $N\left(0,S_q\right)$
1. Separate $X_{qT}$ and $X_{qU}$
1. Simulate $b_1$ and $b_2$ from a $N\left[0, \left( \begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array} \right)\right]$ for causal alleles, 0 otherwise
1. Calculate $g_q=S_{qTT}^{-1} S_{qTA} b_q$
1. Calculate $Y_q=X_q^Tb_q$
1. Calculate $\hat{g}_q=(X_{qT} X_{qT}^T)^{-1}X_{qT}Y_q$
1. Calculate $\hat{b}_q=S_{qTA}^{-1} S_{qTT} \hat{g}_q$, where $S_{qTA}^{-1}$ is the pseudoinverse of $S_{qTA}$
1. Compare $g_q$ and $\hat{g}_q$
1. Compare $\rho$, $\rho\left(b_1,b_2\right)$ and $\rho\left(\hat{b}_1,\hat{b}_2\right)$
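
The effect of $n$ in step 1 can be illustrated with a short sketch. It assumes a hypothetical 20-SNP target matrix with geometric decay (not the banded matrix used later) and averages the Frobenius distance between $\Sigma$ and $W_M(\Sigma,n)/n$ draws over 20 replicates per value of $n$. The values of $n$ are arbitrary choices for illustration.

```python
import numpy as np
import scipy.stats as ss

M = 20
idx = np.arange(M)
# Hypothetical target LD matrix: correlation 0.2**|i-j|
Sigma = 0.2 ** np.abs(np.subtract.outer(idx, idx))

errors = {}
for n in (100, 1000, 10000):
    # 20 draws from W_M(Sigma, n), scaled by 1/n so the mean is Sigma
    draws = ss.wishart.rvs(df=n, scale=Sigma, size=20, random_state=0) / n
    # Mean Frobenius distance between each draw and the target
    errors[n] = np.mean([np.linalg.norm(Sigma - S) for S in draws])

for n, err in errors.items():
    print(n, err)
```

The distance should fall as $n$ grows, so $n$ controls how different $S_1$ and $S_2$ are from $\Sigma$ and from each other.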

In [1]:
import numpy as np
import numpy.random as nr
import scipy.stats as ss

In [7]:
Mu = 50        # untyped SNPs
Mt = 50        # typed SNPs
M  = Mu + Mt   # total SNPs

N = 1000       # samples

Mc = 10        # causal SNPs
rho = 0.7      # effect size correlation

## 1. Simulate $S_q$s

In [8]:
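def generate_sigma(M=100, r=0.2, k=5):
    # Banded LD matrix: 1 on the diagonal, r**i on the i-th off-diagonal (i <= k)
    Sigma = np.diag([0.5]*M)   # half the diagonal; symmetrizing below doubles it
    rk = 1
    for i in range(1,k+1):
        rk = rk*r
        Sigma += np.diag([rk]*(M-i), i)
    # Build a new array rather than adding a transposed view in place
    Sigma = Sigma + Sigma.T
    return Sigma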

In [9]:
Sigma = generate_sigma(M)

In [10]:
n = 1000
S = ss.wishart.rvs(n, Sigma, 2) / n

In [11]:
print(np.linalg.norm(Sigma - S[0]),
      np.linalg.norm(Sigma - S[1]),
      np.linalg.norm(S[0] - S[1]))

3.13864754977 3.19548895367 4.49800347204


## 2. Simulate $X_q$s

In [12]:
mu = np.zeros(M)

In [13]:
# Draw N samples per LD matrix in one call; X has shape (2, N, M)
X = np.array([nr.multivariate_normal(mu, s, N) for s in S])

In [14]:
X.shape

(2, 1000, 100)

## 3. Partition typed and untyped SNPs

In [15]:
SNPs = nr.permutation(M)
untyped = np.sort(SNPs[:Mu])
typed   = np.sort(SNPs[Mu:])

## 4. Simulate causal effect sizes

In [16]:
causal = np.sort(nr.permutation(M)[:Mc])

In [42]:
# Correlated effect pairs (b_1, b_2) at causal SNPs; all other effects stay 0
B = np.zeros((2,M))
for i in causal:
    B[:,i] = nr.multivariate_normal([0,0], [[1,rho],[rho,1]])

## 5. Calculate true joint effect sizes

In [43]:
G = np.array([np.linalg.inv(s[typed][:,typed]).dot(s[typed].dot(b))
              for b, s in zip(B, S)])

### 5b. Go back from joint to causal effect sizes

In [47]:
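def g2b (b, s, g):
    # Invert g = S_TT^{-1} S_TA b: apply the pseudoinverse of S_TA to S_TT g.
    # `b` is unused; it is kept so existing calls keep working.
    # np.linalg.svd returns V^T, so the right singular vectors are its rows.
    (u, d, vt) = np.linalg.svd(s[typed])
    stt = s[typed][:,typed]
    return(vt[:len(d)].T.dot(u.T.dot(stt.dot(g))/d))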

In [53]:
B2 = np.array([g2b(b, s, g) for b, s, g in zip(B, S, G)])

In [52]:
np.corrcoef(B)

array([[ 1.        ,  0.78683807],
       [ 0.78683807,  1.        ]])

In [54]:
np.corrcoef(B2)

array([[ 1.        ,  0.08453754],
       [ 0.08453754,  1.        ]])

In [24]:
(u, d, v) = np.linalg.svd(s[typed])

In [35]:
stt = s[typed][:,typed]

In [40]:
v[:,:50].dot(u.T.dot(np.linalg.inv(stt).dot(g))/d)

array([  2.57836488e-01,  -5.03353016e-01,  -6.74171646e-01,
         9.67312671e-01,   5.39995439e-01,  -1.44419267e-01,
         3.59253998e-01,  -4.40325784e-01,   1.24375478e-01,
        -4.62865664e-01,  -1.04344681e+00,  -5.42789639e-01,
         6.99869292e-01,   3.75447955e-01,   5.62582552e-01,
         4.91451043e-01,  -2.68739176e-01,   1.05391450e+00,
         1.93367905e-01,  -4.60093562e-01,   8.23876205e-01,
        -1.48639871e+00,   4.51321505e-01,  -1.01277307e-02,
         1.92588355e-01,   1.14356705e-01,  -2.15369403e-01,
         9.47621536e-03,   2.48750261e-01,  -6.68955089e-02,
        -1.90201964e-03,   4.22963212e-02,  -5.21775674e-01,
        -4.07094638e-01,   3.01536692e-01,   3.23549652e-01,
        -6.90348987e-01,  -1.14263164e+00,   3.86987963e-01,
        -2.70725956e-01,   6.81852305e-01,  -5.07300971e-01,
        -6.07754343e-03,   5.19482471e-01,   1.10973252e-01,
        -2.12222578e-01,   9.03953278e-01,   2.17125636e-01,
         2.96827193e-01,

## 6. Calculate phenotypes

In [93]:
# Pair each genotype matrix (N x M) with its own effect vector (length M)
Y = np.array([x.dot(b) for x, b in zip(X, B)])

In [101]:
Y -= Y.mean(axis=1)[:,np.newaxis]

## 7. Calculate joint effect estimates

In [114]:
Ghat = [np.linalg.inv(x.T[typed].dot(x[:,typed])).dot(x.T[typed].dot(y))
        for x, y in zip(X, Y)]

In [116]:
np.corrcoef(Ghat, G)

array([[ 1.        ,  0.30893579,  0.96279323,  0.45669473],
       [ 0.30893579,  1.        ,  0.45626239,  0.89669926],
       [ 0.96279323,  0.45626239,  1.        ,  0.58283572],
       [ 0.45669473,  0.89669926,  0.58283572,  1.        ]])
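
The joint estimates in step 7 invert $X_T X_T^T$ explicitly. As a hedged alternative, the same least-squares problem can be handed to `np.linalg.lstsq`, which avoids forming the inverse. The sketch below uses a made-up design matrix in samples-by-SNPs layout (like `x[:,typed]`) and a hypothetical noise level. It only checks that the two routes agree on a well-conditioned problem.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 500 samples, 5 typed SNPs (samples x SNPs layout)
N, M_T = 500, 5
X_T = rng.normal(size=(N, M_T))
g_true = rng.normal(size=M_T)
y = X_T @ g_true + rng.normal(scale=0.1, size=N)

# Normal equations with an explicit inverse, as in step 7
g_inv = np.linalg.inv(X_T.T @ X_T) @ (X_T.T @ y)

# Same least-squares fit without forming the inverse
g_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X_T, y, rcond=None)

print(np.allclose(g_inv, g_lstsq))
```

With strongly correlated typed SNPs the normal-equations matrix can be poorly conditioned, and in that case the `lstsq` route is usually the more numerically stable of the two.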

In [None]:
P = nr.uniform(0.05, 0.5, M)
V = np.diag(2*P*(1-P))

In [None]:
def LD_matrix(M=100, r=0.2, k=5):
    Sigma = np.diag([0.5]*M)
    rk = 1
    for i in range(1,k+1):
        rk = rk*r
        Sigma += np.diag([rk]*(M-i), i)
    Sigma += Sigma.T
    return(Sigma)

In [None]:
Sigma = LD_matrix()

In [None]:
S   = np.sqrt(V).dot(Sigma).dot(np.sqrt(V))

In [None]:
# S (built above from V and Sigma) is the covariance of the simulated genotypes
X = nr.multivariate_normal(np.zeros(M), S, 10000).T

In [None]:
Covhat = X.dot(X.T) / X.shape[1]

In [None]:
X.shape

In [None]:
Sigmahat = np.corrcoef(X)

In [None]:
b = nr.normal(size=M)

In [None]:
Y = X.T.dot(b)
Y -= Y.mean()

In [None]:
a = S.dot(b)/np.diag(V)

In [None]:
ahat = X.dot(Y) / (X**2).sum(axis=1)

In [None]:
np.vstack([a, S.dot(b)/np.diag(S), ahat, Covhat.dot(b)/np.diag(Covhat), Sigmahat.dot(b)/np.diag(Sigmahat)]).T

In [None]:
Sigma

In [None]:
Ma = 270
Mb = 30
Mc = 970

In [None]:
rho = 