# LD calculation between top eQTL in GTEx V6 data

Here we compute the linkage disequilibrium (LD) between the 16K top eQTLs we analyzed in the GTEx data. We use two methods:

1. Simple correlation between genotypes
2. A sparse method (Wen & Stephens 2010)

We expect to see very weak LD between the genotypes of these eQTLs. We use genotypes from the Whole Blood samples because this tissue has a large sample size.
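
As a rough illustration of the first method (simple correlation between genotypes), the sketch below computes pairwise Pearson correlations on simulated data. Everything in it is an assumption made for the example: the sample size, the number of SNPs, the allele frequency, and the names `geno` and `ld_toy` are hypothetical, and the genotypes are drawn independently, so it is not the pipeline that produced the `LD_Whole_Blood_*.RDS` files.

```r
# Toy illustration only: simulated genotypes, not the GTEx data.
set.seed(1)
n_samples <- 300   # hypothetical sample size
n_snp <- 5         # hypothetical number of SNPs
maf <- 0.3         # hypothetical minor allele frequency

# Genotypes coded as minor-allele counts (0, 1, 2), drawn independently per SNP
geno <- matrix(rbinom(n_samples * n_snp, size = 2, prob = maf),
               nrow = n_samples, ncol = n_snp,
               dimnames = list(NULL, paste0("snp", seq_len(n_snp))))

# Pairwise Pearson correlation between SNP columns
ld_toy <- cor(geno)

# Largest absolute correlation among distinct SNP pairs
max_off_diag <- max(abs(ld_toy[lower.tri(ld_toy)]))
print(round(ld_toy, 2))
```

Because the simulated SNPs are independent, the off-diagonal entries should be close to zero, which is the pattern we expect for weak LD.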

## A survey of strong LD patterns

We take a look at the sparse method result for chromosome 1.

In [1]:
dat = as.matrix(readRDS("LD_Whole_Blood_1.RDS"))
dim(dat)[1]

We strip the first underscore-delimited field from the row and column names:

In [None]:
rownames(dat) = unlist(lapply(rownames(dat), function(x) (paste(strsplit(x, '_')[[1]][-1], collapse = '_'))))
colnames(dat) = unlist(lapply(colnames(dat), function(x) (paste(strsplit(x, '_')[[1]][-1], collapse = '_'))))

We set the upper triangle and the diagonal of the matrix to zero, because the matrix is symmetric with diagonal elements equal to 1. We want to focus on the values in the lower triangle.

In [2]:
dat0 = dat
diag(dat) = 0
dat[upper.tri(dat)] = 0
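
To see what the masking does, here is a small sketch on a made-up 3 x 3 matrix (the values and the names `r`, `masked` are hypothetical). After masking, each unordered pair appears exactly once, so a count over the whole masked matrix matches a count over the lower triangle of the original, and the number of distinct pairs is n(n-1)/2.

```r
# Toy symmetric correlation-like matrix; the values are made up
r <- matrix(c(1.0,  0.8,  0.1,
              0.8,  1.0, -0.6,
              0.1, -0.6,  1.0), nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Same masking as in the cell above
masked <- r
diag(masked) <- 0
masked[upper.tri(masked)] <- 0

# Number of distinct unordered pairs among n items
n_pairs <- nrow(r) * (nrow(r) - 1) / 2

# Counting on the masked matrix agrees with counting on the lower triangle
n_strong_masked <- sum(abs(masked) > 0.5)
n_strong_lower <- sum(abs(r[lower.tri(r)]) > 0.5)
```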

Now we count the total number and proportion of SNP pairs with absolute LD greater than 0.5:

In [1]:
m = abs(dat) > 0.5
sum(m)
sum(m) / ((dim(dat)[1]^2 - dim(dat)[1])/ 2)

The SNPs involved are (**unique row and column names of TRUE entries in matrix `m`**):

In [4]:
snps <- c(rownames(m)[row(m)[which(m)]], colnames(m)[col(m)[which(m)]])
length(unique(snps))
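
An equivalent way to recover the pairs, shown here only as a sketch on a made-up logical matrix, is `which(..., arr.ind = TRUE)`, which returns the row and column index of every `TRUE` entry. The names `m_toy`, `pairs` and `involved` are hypothetical.

```r
# Toy logical matrix standing in for `m`; TRUE marks a pair above the threshold
m_toy <- matrix(c(FALSE, FALSE, FALSE,
                  TRUE,  FALSE, FALSE,
                  FALSE, TRUE,  FALSE), nrow = 3, byrow = TRUE,
                dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# One row per TRUE entry, with columns "row" and "col"
idx <- which(m_toy, arr.ind = TRUE)

# Table of the pairs themselves
pairs <- data.frame(snp1 = rownames(m_toy)[idx[, "row"]],
                    snp2 = colnames(m_toy)[idx[, "col"]],
                    stringsAsFactors = FALSE)

# Unique SNPs that take part in at least one pair
involved <- unique(c(pairs$snp1, pairs$snp2))
print(pairs)
```

Keeping the pairs in a data frame makes it easy to inspect which SNP is linked to which, not only how many SNPs are involved.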

## A survey of overall LD patterns

In [None]:
colors = c(
"#ffffff",
"#ffe6e6",
"#ffcccc",
"#ffb3b3",
"#ff9999",
"#ff8080",
"#ff6666",
"#ff4d4d",
"#ff3333",
"#ff1a1a",
"#ff0000",
"#e60000",
"#cc0000",
"#b30000",
"#990000",
"#800000",
"#660000",
"#4d0000",
"#330000",
"#1a0000",
"#000000"
)
LDheatmap::LDheatmap(dat0, color = colors, flip = TRUE, title = "")
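
The 21-step palette above runs from white through pure red to black. A shorter way to build a similar ramp, offered as an alternative rather than what this notebook ran, is `grDevices::colorRampPalette`. The name `palette_sketch` is hypothetical, and the intermediate shades may differ from the hand-written hex codes by a rounding step.

```r
# Interpolate 21 colours from white through red to black
palette_sketch <- grDevices::colorRampPalette(c("#ffffff", "#ff0000", "#000000"))(21)
print(palette_sketch)
```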

## Putting together all chromosomes

In [5]:
grid = c(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99)
prop = matrix(0, length(grid), 22)
snps = matrix(0, length(grid), 22)
plot_widgets = list()
n_snps = 0
for (i in 1:22) {
    tmp = as.matrix(readRDS(paste0("LD_Whole_Blood_", i, ".RDS")))
    diag(tmp) = 0
    tmp[upper.tri(tmp)] = 0
    n_snps = n_snps + dim(tmp)[1]
    plot_widgets[[i]] = list()  # a list, because each plotly object is itself a list
    for (j in seq_along(grid)) {
        m = abs(tmp) > grid[j]
        prop[j,i] = sum(m) / ((dim(tmp)[1]^2 - dim(tmp)[1])/ 2)
        ss = c(rownames(m)[row(m)[which(m)]], colnames(m)[col(m)[which(m)]])
        snps[j,i] = length(unique(ss))
        # grid is increasing, so zeroing entries at or below the current threshold does not affect later thresholds
        tmp[!m] = 0
        plot_widgets[[i]][[j]] = plotly::plot_ly(z = tmp, colors = colorRamp(c("#00356B", "#A4DDED", "white", "#FFC0CB", "#722F37")), type = "heatmap")
    }
}
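
The per-threshold bookkeeping in the loop can also be packaged as a small function and checked on a toy matrix. This is only a sketch: `summarize_ld` and `ld_toy4` are hypothetical names, the matrix values are made up, and the function counts SNPs by index rather than by name, so it does not depend on the matrix having dimnames.

```r
# For each threshold, return the proportion of pairs with |LD| above it
# and the number of distinct SNPs taking part in such pairs.
summarize_ld <- function(ld, thresholds) {
    stopifnot(nrow(ld) == ncol(ld))
    lower <- lower.tri(ld)
    n_pairs <- sum(lower)                       # n * (n - 1) / 2
    idx_all <- which(lower, arr.ind = TRUE)     # row/col index of each pair
    vals <- abs(ld[lower])                      # same column-major order as idx_all
    out <- lapply(thresholds, function(th) {
        keep <- vals > th
        data.frame(threshold = th,
                   prop = sum(keep) / n_pairs,
                   n_snps = length(unique(c(idx_all[keep, "row"],
                                            idx_all[keep, "col"]))))
    })
    do.call(rbind, out)
}

# Toy symmetric 4 x 4 matrix with unit diagonal
ld_toy4 <- diag(4)
ld_toy4[lower.tri(ld_toy4)] <- c(0.95, 0.1, 0.0, 0.65, -0.2, -0.55)
ld_toy4 <- ld_toy4 + t(ld_toy4) - diag(4)

res <- summarize_ld(ld_toy4, c(0.5, 0.9))
print(res)
```

Working on the lower-triangle values directly avoids modifying the matrix in place, so the thresholds can be given in any order.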

Here is a summary of the proportion of SNP pairs on each chromosome (row) with absolute LD greater than each threshold (column):

In [6]:
prop = data.frame(t(prop))
colnames(prop) = grid
rownames(prop) = paste('chr', 1:22)
prop
apply(prop, 2, mean)

| Chromosome | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.95 | 0.99 |
|---|---|---|---|---|---|---|---|
| chr 1 | 0.0009402058 | 0.0006015942 | 0.0003953333 | 0.0002621232 | 0.0001418044 | 7.648841e-05 | 2.578261e-05 |
| chr 2 | 0.001106003 | 0.000768501 | 0.0005358983 | 0.0003443432 | 0.0001983964 | 0.0001003384 | 1.368251e-05 |
| chr 3 | 0.0017750274 | 0.0012965136 | 0.0008244663 | 0.000465581 | 0.0002942213 | 0.0001907589 | 2.586561e-05 |
| chr 4 | 0.0012087494 | 0.0009189807 | 0.0006788866 | 0.0004801881 | 0.0002318149 | 0.000157303 | 4.139553e-05 |
| chr 5 | 0.0007513954 | 0.000536711 | 0.0003220266 | 0.0003220266 | 0.0001073422 | 0.0 | 0.0 |
| chr 6 | 0.0012832255 | 0.0007871086 | 0.0006010647 | 0.0004484134 | 0.0003005324 | 0.0001860438 | 3.339249e-05 |
| chr 7 | 0.0019135913 | 0.0013430796 | 0.0009865098 | 0.000665597 | 0.0003684555 | 0.00020998 | 4.358075e-05 |
| chr 8 | 0.0018098755 | 0.0014449812 | 0.0010727891 | 0.0006568097 | 0.0003284048 | 0.000124064 | 2.919154e-05 |
| chr 9 | 0.001987232 | 0.0014469533 | 0.0010184564 | 0.000558909 | 0.0003663959 | 0.0001552525 | 4.96808e-05 |
| chr 10 | 0.0019800333 | 0.0014642263 | 0.001120355 | 0.0007321131 | 0.0003327787 | 0.0001719357 | 5.546312e-05 |


Here is a summary of the number of unique SNPs on each chromosome (row) involved in pairs with absolute LD greater than each threshold (column):

In [7]:
snps = data.frame(t(snps))
colnames(snps) = grid
rownames(snps) = paste('chr', 1:22)
snps

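| Chromosome | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.95 | 0.99 |
|---|---|---|---|---|---|---|---|
| chr 1 | 804 | 668 | 525 | 416 | 260 | 158 | 54 |
| chr 2 | 422 | 346 | 268 | 204 | 136 | 79 | 12 |
| chr 3 | 366 | 309 | 255 | 176 | 124 | 87 | 16 |
| chr 4 | 176 | 139 | 116 | 90 | 52 | 35 | 10 |
| chr 5 | 14 | 10 | 6 | 6 | 2 | 0 | 0 |
| chr 6 | 258 | 192 | 154 | 116 | 88 | 58 | 14 |
| chr 7 | 345 | 309 | 255 | 188 | 127 | 79 | 22 |
| chr 8 | 219 | 186 | 146 | 103 | 68 | 34 | 8 |
| chr 9 | 256 | 206 | 172 | 124 | 86 | 36 | 16 |
| chr 10 | 279 | 224 | 185 | 127 | 82 | 43 | 17 |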


The proportion of SNPs involved at each threshold is:

In [8]:
apply(snps, 2, sum) / n_snps