# Proof

There is an untyped causal SNP with causal per-SNP effects $b_1$ and $b_2$ in populations 1 and 2. It is in LD with a tagged SNP, whose marginal effects in the two populations are $a_1$ and $a_2$. The LD between the causal and the tagged SNP in the two populations is $r_1$ and $r_2$. This leads to the following:

$$
\begin{aligned}
a_1 & = r_1 \times b_1 \\
a_2 & = r_2 \times b_2 \\
\\
\rho \left( a_1, a_2 \right)
  & = \rho \left( r_1 b_1, r_2 b_2 \right) \\
  & = \frac
    { \text{cov} \left( r_1 b_1, r_2 b_2 \right) }
    {
      \sqrt{
        \text{Var} \left( r_1 b_1 \right)
        \text{Var} \left( r_2 b_2 \right)
      }
    } \\
\\
\text{Var} \left( r_i b_i \right)
  & = \text{E} \left[ \left( r_i b_i \right)^2 \right] 
    - \left[ \text{E} \left( r_i b_i \right) \right]^2 \\
  & = \text{E} \left( r_i^2 \right) \text{E} \left( b_i^2 \right)
  ,\; r_i \perp b_i,\, \text{E}(b_i) = 0 \text{ or } \text{E}(r_i) = 0 \\
  & = \text{Var} \left( r_i \right) \text{Var} \left( b_i \right )
  ,\; \text{E}(b_i) = 0 \textbf{ and } \text{E}(r_i) = 0 \\
\\
\text{cov} \left( r_1 b_1, r_2 b_2 \right)
  & = \text{E} \left\{
      \left[ r_1 b_1 - \text{E} \left( r_1 b_1 \right) \right]
      \left[ r_2 b_2 - \text{E} \left( r_2 b_2 \right) \right]
    \right\} \\
  & = \text{E} \left( r_1 r_2 b_1 b_2 \right)
  ,\; \left[ \text{E}(b_1) = 0 \text{ or } \text{E}(r_1) = 0 \right] \text{ and }
    \left[ \text{E}(b_2) = 0 \text{ or } \text{E}(r_2) = 0 \right] \\
  & = \text{E} \left( r_1 r_2 \right) \text{E} \left( b_1 b_2 \right)
  ,\; \left( r_1, r_2 \right) \perp \left( b_1, b_2 \right) \\
  & = \text{cov} \left( r_1, r_2 \right) \text{cov} \left( b_1, b_2 \right)
  ,\; \text{E}(b_1) = \text{E}(b_2) = \text{E}(r_1) = \text{E}(r_2) = 0 \\
\\
\rho \left( a_1, a_2 \right)
  & = \frac
    { \text{cov} \left( r_1, r_2 \right) \text{cov} \left( b_1, b_2 \right) }
    {
      \sqrt{
        \text{Var} \left( r_1 \right)
        \text{Var} \left( r_2 \right)
        \text{Var} \left( b_1 \right)
        \text{Var} \left( b_2 \right)
      }
    } \\
 & = \rho \left( r_1, r_2 \right) \rho \left( b_1, b_2 \right )
\end{aligned}
$$

# Test

First simulate the $b$s.

In [1]:
import numpy as np
import numpy.random as nr

import scipy.stats as ss

In [2]:
M = 1000
N = 1000000

In [3]:
rho_b = 0.7

In [4]:
b = nr.multivariate_normal((0,0), ((1,rho_b),(rho_b, 1)), M)
np.corrcoef(b.T)[1,0]

0.6998293274673173

Simulate the $r$s. This is a bit trickier because $r \in \left[ -1,1 \right]$.

In [5]:
rho_r = 0.7

In [6]:
r = nr.multivariate_normal((0,0), ((1,rho_r),(rho_r, 1)), M)
r[r > 1] = 1
r[r < -1] = 1  # NB: sets values below -1 to +1 rather than -1; the output below reflects this
np.corrcoef(r.T)[1,0]

0.25007696668038587
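The cell above sets values below $-1$ to $+1$, which flips their sign and likely accounts for much of the drop. A minimal sketch of symmetric clipping with `np.clip` follows; the seed, the larger sample size, and the names `rng`, `r_raw` and `r_clipped` are assumptions for illustration. Even when done symmetrically, clipping is a nonlinear transform of the normal draws, so the Pearson correlation of the clipped values is expected to fall somewhat below `rho_r`.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
rho_r = 0.7
M = 100_000  # assumed sample size, larger than the notebook's M to reduce noise

# draw correlated standard normals, then clip both tails symmetrically
r_raw = rng.multivariate_normal((0, 0), ((1, rho_r), (rho_r, 1)), M)
r_clipped = np.clip(r_raw, -1, 1)

corr_clipped = np.corrcoef(r_clipped.T)[0, 1]
print(r_clipped.min(), r_clipped.max())
print(corr_clipped)
```

Clipping also piles probability mass at exactly $\pm 1$, i.e. at perfect LD.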

That doesn't work so well. Instead, simulate $r$ from the multivariate normal as usual, and then map it (via the normal CDF and the beta quantile function) to a beta distribution that is stretched to span $\left[-1,1\right]$ and has expectation 0 (so $2 \times \text{Beta}\left( \alpha, \alpha \right) - 1$, here with $\alpha = 2$).

In [7]:
r = nr.multivariate_normal((0,0), ((1,rho_r),(rho_r, 1)), M)
r = ss.norm.cdf(r)
r = 2*ss.beta.ppf(r, 2, 2) - 1

print(np.min(r), np.max(r))
print(r.mean(axis=0))
print(np.corrcoef(r.T)[0,1])

-0.963817585762 0.990170323728
[ 0.00031517  0.01309775]
0.685678007048
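The proof needs $\text{E}(r_i) = 0$, which the stretched beta satisfies by symmetry. As a quick check, the sketch below uses `scipy.stats.beta.stats` to get the mean and variance of $\text{Beta}(2, 2)$ and rescales them for $2 \times \text{Beta}(2, 2) - 1$; the variable names are illustrative.

```python
import scipy.stats as ss

# mean and variance of Beta(2, 2)
mean_beta, var_beta = ss.beta.stats(2, 2, moments='mv')

# r = 2*Beta - 1, so E(r) = 2*E(Beta) - 1 and Var(r) = 4*Var(Beta)
mean_r = 2 * float(mean_beta) - 1
var_r = 4 * float(var_beta)
print(mean_r, var_r)
```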


Simulate the true marginal effect sizes.

In [8]:
a = r * b

In [9]:
np.corrcoef(a.T)[0,1]

0.49479662832080706

In [10]:
rho_b*rho_r

0.48999999999999994
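The identity involves the realised $\rho(r_1, r_2)$, which after the beta mapping sits slightly below the nominal `rho_r` (0.686 above versus 0.7). A sketch of the like-for-like comparison, re-simulating with an assumed seed and a larger assumed `M`; the names `corr_a` and `predicted` are illustrative:

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(1)  # assumed seed
M = 200_000                     # assumed; larger than the notebook's M to reduce noise
rho_b = rho_r = 0.7

b = rng.multivariate_normal((0, 0), ((1, rho_b), (rho_b, 1)), M)
# normal -> uniform -> Beta(2, 2) -> stretch to [-1, 1]
r = rng.multivariate_normal((0, 0), ((1, rho_r), (rho_r, 1)), M)
r = 2 * ss.beta.ppf(ss.norm.cdf(r), 2, 2) - 1

a = r * b
corr_a = np.corrcoef(a.T)[0, 1]
# product of the realised (not nominal) correlations
predicted = np.corrcoef(r.T)[0, 1] * np.corrcoef(b.T)[0, 1]
print(corr_a, predicted)
```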

Simulate the causal SNPs.

In [11]:
p = nr.uniform(0.05, 0.95, (M, 2))

In [12]:
X = np.zeros((M, 2, N), dtype='uint8')
for m in range(M):
    for i in range(2):
        X[m,i,:] = nr.binomial(2, p[m,i], N)

Simulate the tagged SNPs.

In [13]:
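Y = np.zeros((M, 2, N), dtype='uint8')
for m in range(M):
    for i in range(2):
        # with probability 1-|r|, swap the causal genotype for an independent draw
        # at the same allele frequency, so that corr(X, Y) = |r|
        replace   = nr.binomial(1, 1-np.abs(r[m,i]), N)
        candidate = nr.binomial(2, p[m,i], N)
        Y[m,i,:]  = replace*candidate + (1-replace)*X[m,i,:]
        # flip the allele coding to make the correlation negative
        if r[m,i] < 0: Y[m,i,:] = 2 - Y[m,i,:]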

In [14]:
rhat = np.zeros(r.shape)
for m in range(M):
    for i in range(2):
        rhat[m,i] = np.corrcoef(X[m,i], Y[m,i])[0,1]
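The replacement scheme gives $\text{corr}(X, Y) = |r|$ because $Y$ copies $X$ with probability $|r|$ and is otherwise an independent draw with the same marginal distribution; recoding $Y$ as $2 - Y$ then flips the sign. A single-SNP sketch of this, where the helper `simulate_tag`, the seed, and the chosen `p`, `r` and `N` are all hypothetical:

```python
import numpy as np

def simulate_tag(x, p, r, rng):
    """Return a tagged genotype vector with correlation about r to x (hypothetical helper)."""
    n = x.shape[0]
    replace = rng.binomial(1, 1 - abs(r), n)   # 1 = swap in an independent draw
    candidate = rng.binomial(2, p, n)
    y = replace * candidate + (1 - replace) * x
    return 2 - y if r < 0 else y               # flip allele coding for negative LD

rng = np.random.default_rng(2)  # assumed seed
N, p, r = 200_000, 0.3, -0.4    # assumed values for illustration
x = rng.binomial(2, p, N)
y = simulate_tag(x, p, r, rng)
rhat = np.corrcoef(x, y)[0, 1]
print(rhat)
```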

Compute the phenotype in each population (the centred genetic value, with no environmental noise).

In [15]:
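# genetic value per individual: sum over SNPs of genotype * causal effect;
# the index (0,1),(0,1) pairs population i's genotypes with population i's effects
Z = X.T.dot(b)[:,(0,1),(0,1)]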
Z -= Z.mean(axis=0)[np.newaxis,:]
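The indexing `[:,(0,1),(0,1)]` keeps only the terms that pair population $i$'s genotypes with population $i$'s effects. The sketch below checks on a toy array (assumed sizes and seed) that this matches an explicit `np.einsum`, which avoids forming the cross-population terms:

```python
import numpy as np

rng = np.random.default_rng(3)  # assumed seed
M, N = 5, 7                     # toy sizes, assumed
X = rng.binomial(2, 0.4, (M, 2, N)).astype(float)
b = rng.normal(size=(M, 2))

Z_index = X.T.dot(b)[:, (0, 1), (0, 1)]    # as in the notebook: shape (N, 2)
Z_einsum = np.einsum('min,mi->ni', X, b)   # sum over SNPs m, per population i
print(np.allclose(Z_index, Z_einsum))
```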

Estimate the marginal effect sizes.

In [16]:
ahat = np.zeros((M,2))
# per-SNP OLS slope of the phenotype on the centred tagged genotype: cov(y,z)/var(y)
for i in range(2):
    y = Y[:,i,:]
    y = y - y.mean(axis=1)[:,np.newaxis]
    z = Z[:,i]
    ahat[:,i] = y.dot(z) / (y**2).sum(axis=1)
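Each entry of `ahat` is the ordinary least-squares slope from regressing the phenotype on one centred tagged SNP. A toy check of that formula against `np.polyfit`, with assumed data, seed and names:

```python
import numpy as np

rng = np.random.default_rng(4)  # assumed seed
N = 1000                        # toy sample size, assumed
y = rng.binomial(2, 0.3, N).astype(float)   # one tagged SNP
z = 0.5 * y + rng.normal(size=N)            # toy phenotype
z -= z.mean()

yc = y - y.mean()                           # centre the genotype
slope = yc.dot(z) / (yc**2).sum()           # same formula as the cell above
slope_polyfit = np.polyfit(y, z, 1)[0]      # slope from a degree-1 least-squares fit
print(slope, slope_polyfit)
```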

The correlation of estimated marginal effect sizes.

In [17]:
np.corrcoef(ahat.T)[0,1]

0.49220275457945828