# LD calculation between top eQTL in GTEx V6 data

Here we compute the LD between the 16K top eQTL we analyzed in GTEx data using simple correlation between 

We expect to see very weak LD between these SNPs. We use genotypes from the Whole Blood samples because that tissue has a large sample size.
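
As a minimal sketch of what LD as simple correlation can look like, assume genotypes are coded as 0/1/2 allele dosages in a samples-by-SNPs matrix. The simulated matrix `geno` and its SNP names are made up for illustration; they are not GTEx data, and the actual pipeline that produced the feather files may differ.

```r
set.seed(1)
n_samples = 200
# Simulate three SNPs as 0/1/2 dosages. snp_b is a noisy copy of snp_a,
# so those two are in strong LD; snp_c is drawn independently of both.
snp_a = rbinom(n_samples, 2, 0.3)
flip = runif(n_samples) < 0.05
snp_b = ifelse(flip, rbinom(n_samples, 2, 0.3), snp_a)
snp_c = rbinom(n_samples, 2, 0.3)
geno = cbind(snp_a, snp_b, snp_c)
# Pairwise Pearson correlation between SNP columns is used as the LD measure
ld = cor(geno)
round(ld, 2)
```

The result is a symmetric SNP-by-SNP matrix with ones on the diagonal, which is the shape of matrix the rest of this notebook works with.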

## A survey of strong LD patterns

We first look at the sparse method result for chromosome 1.

In [1]:
dat = as.matrix(feather::read_feather("LD_chrom_1.feather"))
rownames(dat) = colnames(dat)
dim(dat)[1]

We set the upper triangle and the diagonal of the matrix to zero, because the matrix is symmetric with diagonal elements equal to 1. This way each SNP pair is counted once, using only the values in the lower triangle.

In [2]:
dat0 = dat
diag(dat) = 0
dat[upper.tri(dat)] = 0
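
To see the effect of this masking on something small, here is a sketch on a 3 x 3 toy matrix. The values and the names `toy` and `low` are made up for illustration.

```r
# Toy symmetric "LD" matrix with unit diagonal (values are made up)
toy = matrix(c(1.0,  0.8,  0.1,
               0.8,  1.0, -0.6,
               0.1, -0.6,  1.0), 3, 3,
             dimnames = list(c("s1", "s2", "s3"), c("s1", "s2", "s3")))
low = toy
diag(low) = 0
low[upper.tri(low)] = 0
# Only the strictly lower triangle survives, so each pair appears exactly once
low
sum(low != 0)  # 3 pairs for 3 SNPs
```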

Now we count the number of SNP pairs with absolute LD greater than 0.5, and their proportion among all pairs:

In [3]:
m = abs(dat) > 0.5
sum(m)
sum(m) / ((dim(dat)[1]^2 - dim(dat)[1])/ 2)
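
The denominator above is the number of distinct SNP pairs, n(n-1)/2. A small check, using a hypothetical `n = 5`, shows that it agrees with `choose(n, 2)` and with the number of cells in the strictly lower triangle:

```r
n = 5  # hypothetical number of SNPs
pairs_formula = (n^2 - n) / 2                 # expression used in the cell above
pairs_choose = choose(n, 2)                   # "n choose 2"
pairs_mask = sum(lower.tri(matrix(0, n, n)))  # cells below the diagonal
c(pairs_formula, pairs_choose, pairs_mask)
```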

The SNPs involved are (**unique row and column names of TRUE entries in matrix `m`**):

In [4]:
snps <- c(rownames(m)[row(m)[which(m)]], colnames(m)[col(m)[which(m)]])
length(unique(snps))
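
An equivalent way to pull out the pairs themselves, sketched on a made-up 4-SNP matrix, is `which(..., arr.ind = TRUE)`, which returns the row and column index of every `TRUE` entry. The names `toy`, `m_toy` and `pairs` are for illustration only.

```r
snp_names = c("s1", "s2", "s3", "s4")
# Lower-triangular toy LD matrix with two strong pairs (values are made up)
toy = matrix(0, 4, 4, dimnames = list(snp_names, snp_names))
toy["s2", "s1"] = 0.9
toy["s4", "s1"] = -0.7
m_toy = abs(toy) > 0.5
# One row of indices per TRUE entry of the logical matrix
idx = which(m_toy, arr.ind = TRUE)
pairs = data.frame(snp1 = rownames(m_toy)[idx[, "row"]],
                   snp2 = colnames(m_toy)[idx[, "col"]],
                   ld = toy[idx],
                   stringsAsFactors = FALSE)
pairs
involved = unique(c(pairs$snp1, pairs$snp2))
length(involved)  # s1 takes part in both pairs, so 3 SNPs are involved
```

Keeping the pairs in a data frame also makes it easy to inspect which SNPs are in LD with which, not only how many there are.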

## Putting together all chromosomes

In [5]:
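# LD thresholds to evaluate
grid = c(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99)
# One row per threshold, one column per chromosome
prop = matrix(0, length(grid), 22)
snps = matrix(0, length(grid), 22)
n_snps = 0
for (i in 1:22) {
    tmp = as.matrix(feather::read_feather(paste0("LD_chrom_", i, ".feather")))
    rownames(tmp) = colnames(tmp)
    # Keep only the lower triangle so each pair is counted once
    diag(tmp) = 0
    tmp[upper.tri(tmp)] = 0
    # Running total of SNPs across chromosomes
    n_snps = n_snps + dim(tmp)[1]
    for (j in seq_along(grid)) {
        m = abs(tmp) > grid[j]
        # Proportion of all SNP pairs on this chromosome above the threshold
        prop[j,i] = sum(m) / ((dim(tmp)[1]^2 - dim(tmp)[1])/ 2)
        # Number of unique SNPs taking part in at least one such pair
        ss = c(rownames(m)[row(m)[which(m)]], colnames(m)[col(m)[which(m)]])
        snps[j,i] = length(unique(ss))
    }
}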
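
The per-matrix part of this loop can also be written as a small function, which makes it easy to try on a toy matrix. This is only a sketch: `summarize_ld` is a hypothetical helper name, not something defined elsewhere in this notebook, and it assumes a square matrix whose rows and columns are in the same SNP order.

```r
# Hypothetical helper: summarize one LD matrix over a grid of thresholds
summarize_ld = function(ld, grid) {
    ld[upper.tri(ld, diag = TRUE)] = 0   # keep each pair once
    n = nrow(ld)
    n_pairs = n * (n - 1) / 2
    res = sapply(grid, function(cutoff) {
        idx = which(abs(ld) > cutoff, arr.ind = TRUE)
        c(prop = nrow(idx) / n_pairs,          # share of pairs above cutoff
          n_snps = length(unique(c(idx))))     # unique SNP indices involved
    })
    colnames(res) = grid
    res
}

# Toy 4-SNP matrix with two strong pairs (values are made up)
toy = diag(4)
toy[2, 1] = toy[1, 2] = 0.9
toy[4, 3] = toy[3, 4] = -0.65
res = summarize_ld(toy, c(0.5, 0.8))
res
```

On this toy input, 2 of the 6 pairs pass the 0.5 threshold and 1 of 6 passes 0.8, involving 4 and 2 SNPs respectively.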

Here is a summary of the proportion of SNP pairs on each chromosome (rows) with absolute LD greater than each threshold (columns):

In [6]:
prop = data.frame(t(prop))
colnames(prop) = grid
rownames(prop) = paste('chr', 1:22)
prop
apply(prop, 2, mean)

| Chromosome | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.95 | 0.99 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| chr 1 | 0.0003818616 | 0.0002871086 | 0.0002037545 | 0.0001332241 | 6.839312e-05 | 4.488298e-05 | 1.211128e-05 |
| chr 2 | 0.0005124475 | 0.0003765295 | 0.0002755094 | 0.0001891831 | 8.816301e-05 | 4.224478e-05 | 9.183647e-06 |
| chr 3 | 0.0007963595 | 0.0005352156 | 0.0003981797 | 0.000297342 | 0.000186162 | 8.790981e-05 | 1.292791e-05 |
| chr 4 | 0.0007034919 | 0.0005724493 | 0.0004138188 | 0.0002758792 | 0.0001655275 | 7.586678e-05 | 2.069094e-05 |
| chr 5 | 0.0005975534 | 0.0004134017 | 0.0003081722 | 0.0002029427 | 0.0001352951 | 6.764756e-05 | 1.503279e-05 |
| chr 6 | 0.0005323771 | 0.0004161108 | 0.0003273813 | 0.0002508904 | 0.0001621608 | 8.872952e-05 | 1.835783e-05 |
| chr 7 | 0.0010071014 | 0.000762224 | 0.0005552854 | 0.0003621426 | 0.0002000407 | 7.932648e-05 | 1.724489e-05 |
| chr 8 | 0.0010113156 | 0.0007399871 | 0.0005179909 | 0.0003268276 | 0.0001233312 | 7.399871e-05 | 6.166559e-06 |
| chr 9 | 0.0009929209 | 0.0006790962 | 0.0004938881 | 0.0003858501 | 0.0001440507 | 6.173602e-05 | 3.601268e-05 |
| chr 10 | 0.0009643907 | 0.0006901564 | 0.0005119041 | 0.00031994 | 0.0001416877 | 8.227029e-05 | 2.285286e-05 |


Here is a summary of the number of unique SNPs on each chromosome (rows) involved in at least one pair with absolute LD greater than each threshold (columns):

In [7]:
snps = data.frame(t(snps))
colnames(snps) = grid
rownames(snps) = paste('chr', 1:22)
snps

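| Chromosome | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.95 | 0.99 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| chr 1 | 587 | 505 | 405 | 296 | 170 | 113 | 33 |
| chr 2 | 306 | 243 | 214 | 157 | 86 | 43 | 10 |
| chr 3 | 298 | 238 | 187 | 153 | 106 | 53 | 10 |
| chr 4 | 132 | 118 | 96 | 66 | 42 | 18 | 6 |
| chr 5 | 207 | 166 | 135 | 95 | 63 | 36 | 8 |
| chr 6 | 216 | 173 | 140 | 117 | 81 | 51 | 12 |
| chr 7 | 275 | 225 | 173 | 137 | 86 | 42 | 10 |
| chr 8 | 154 | 123 | 102 | 76 | 40 | 24 | 2 |
| chr 9 | 183 | 157 | 124 | 100 | 42 | 23 | 14 |
| chr 10 | 201 | 146 | 115 | 94 | 43 | 30 | 7 |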


The proportion of all SNPs involved at each threshold, summed over chromosomes, is:

In [8]:
apply(snps, 2, sum) / n_snps

## Analysis for the entire data matrix

In [None]:
dat = as.matrix(feather::read_feather("LD_all.feather"))
rownames(dat) = colnames(dat)
dat0 = dat
diag(dat) = 0
dat[upper.tri(dat)] = 0
grid = c(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99)
prop = matrix(0, length(grid), 1)
snps = matrix(0, length(grid), 1)
for (j in 1:length(grid)) {
  m = abs(dat) > grid[j]
  prop[j, 1] = sum(m) / ((dim(dat)[1]^2 - dim(dat)[1])/ 2)
  ss = c(rownames(m)[row(m)[which(m)]], colnames(m)[col(m)[which(m)]])
  snps[j, 1] = length(unique(ss))
}
rownames(prop) = grid
rownames(snps) = grid

```
> prop
             [,1]
0.5  4.056535e-05
0.6  3.053422e-05
0.7  2.245775e-05
0.8  1.531286e-05
0.9  8.068154e-06
0.95 4.258654e-06
0.99 9.482160e-07

> prop * ((dim(dat)[1]^2 - dim(dat)[1])/ 2)
     [,1]
0.5  4877
0.6  3671
0.7  2700
0.8  1841
0.9   970
0.95  512
0.99  114

> snps
     [,1]
0.5  5000
0.6  4134
0.7  3335
0.8  2509
0.9  1474
0.95  835
0.99  223

> snps / dim(dat)[1]
           [,1]
0.5  0.32243503
0.6  0.26658928
0.7  0.21506416
0.8  0.16179790
0.9  0.09505385
0.95 0.05384665
0.99 0.01438060
```
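
A quick arithmetic check ties the printed tables together. The values below are copied from the output above at the 0.5 threshold. The total number of SNPs is not printed there, so this sketch infers it from `snps / dim(dat)[1]`; that inference is an assumption, and the variable names are for illustration only.

```r
# Values copied from the printed output at the 0.5 threshold
snps_05 = 5000          # unique SNPs involved
frac_05 = 0.32243503    # snps / dim(dat)[1]
prop_05 = 4.056535e-05  # proportion of pairs
# Infer the number of SNPs in the full matrix from count / fraction
n_all = round(snps_05 / frac_05)
n_all
# Total number of distinct SNP pairs in the full matrix
n_pairs = choose(n_all, 2)
# Recover the number of pairs with |LD| > 0.5; should match the 4877 printed above
round(prop_05 * n_pairs)
```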