# mr-ash example analysis
Here we analyze [GTEx-based simulation data](MR-ASH-Simulation.html) using mr-ash as implemented in `varbvs`. The simulated data are stored in the same format as the [processed GTEx](../pipeline/Pipelines.html#Preprocessing) data, so the same procedure can be applied directly to real data. The only difference is that the simulated data do not have covariates.

All files are saved in HDF5 format on midway under `project/compbio/internal_public_supp/GTEx7Toy`. For the toy dataset I selected 3 genes:

```
/chr4/ENSG00000145214
/chr18/ENSG00000264247
/chr19/ENSG00000267508
```
Use one of these as the `geno_table` variable in the code below.

The real toy data file contains 53 groups in HDF5 format. Their names are available via:

```bash
h5ls TY.expr.h5 
```
The simulated data file has only one group, called `simulated`, which is the data I use in the example below.

To load data:

```r
# Install rhdf5 from Bioconductor if needed:
# install.packages("BiocManager")
# BiocManager::install("rhdf5")
library(rhdf5)
fpath = '/home/gaow/Documents/GTEx/ToyExample'
genotype_file = paste0(fpath, '/TY.genotype.h5')
expr_file = paste0(fpath, '/TY.expr_simulated.h5')
gene = 'ENSG00000145214'
geno_table = '/chr4/ENSG00000145214'
expr_table = '/simulated'

# Read one genotype table and the matching expression column.
# Each HDF5 group stores the data matrix in `block0_values` and its
# dimension labels in `axis0` and `axis1`.
load_data = function(genotype_file, expr_file, geno_table, expr_table) {
    geno = h5read(genotype_file, geno_table)
    gdata = geno$block0_values
    colnames(gdata) = geno$axis1
    rownames(gdata) = geno$axis0
    expr = h5read(expr_file, expr_table)
    edata = expr$block0_values
    colnames(edata) = expr$axis1
    rownames(edata) = expr$axis0
    # Keep only the expression column for the gene named in geno_table.
    edata = edata[, basename(geno_table)]
    return(list(X = gdata, y = edata))
}

dat = load_data(genotype_file, expr_file, geno_table, expr_table)
```
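
Before fitting, it can help to confirm that the genotype matrix and the expression vector describe the same samples in the same order. The following is a minimal sketch in base R that uses small made-up inputs in place of `dat`. The helper name `check_alignment` is hypothetical (it is not part of `rhdf5` or `varbvs`), and the sketch assumes that samples are in the rows of `X` and that both objects carry sample names:

```r
# Hypothetical helper: verify that X (samples x variants) and y (one value
# per sample) refer to the same samples in the same order.
check_alignment <- function(X, y) {
    stopifnot(is.matrix(X), nrow(X) == length(y))
    # Only compare names when both objects carry them.
    if (!is.null(rownames(X)) && !is.null(names(y))) {
        stopifnot(identical(rownames(X), names(y)))
    }
    invisible(TRUE)
}

# Toy stand-in for `dat`: 3 samples, 2 variants (values are made up).
toy <- list(
    X = matrix(c(0, 1, 2, 1, 0, 1), nrow = 3,
               dimnames = list(c("s1", "s2", "s3"), c("v1", "v2"))),
    y = c(s1 = 0.2, s2 = -1.1, s3 = 0.7)
)
ok <- check_alignment(toy$X, toy$y)
```

With the real data the call would be `check_alignment(dat$X, dat$y)`, which stops with an error if the dimensions or sample names disagree.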

To analyze:

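```r
# library(devtools)
# install_github("pcarbo/varbvs",subdir = "varbvs-R")
X = as.matrix(dat$X)
storage.mode(X) <- "double"
y = as.vector(dat$y)
# No covariates (second argument is NULL). `sa` gives the variances of the
# mixture components; the first is 0, and the log reports them as sd's.
res = varbvs::varbvsmix(X, NULL, y, sa = c(0, 1, 0.4, 3))
```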

```
Fitting variational approximation for linear regression model with
mixture-of-normals priors.
samples:      635    mixture component sd's:    0.63..1.7
variables:    7258   fit mixture variances:     no
covariates:   0      fit mixture weights:       yes
mixture size: 4      fit residual var. (sigma): yes
intercept:    yes    convergence tolerance      1.0e-04
       variational    max. --------- hyperparameters ---------
iter   lower bound  change   sigma  mixture sd's  mix. weights
0001 -1.146383e+04 8.4e-01 9.5e+01       [0.6,2] [0.072,0.625]
0002 -6.222954e+03 9.0e-01 1.7e+02       [0.6,2] [0.013,0.868]
0003 -3.873344e+03 9.7e-01 2.2e+02       [0.6,2] [0.002,0.949]
```

To extract results from analysis:

```r
# Combine the columns of alpha and mu, weighting by the fitted mixture weights w.
res$pip = res$alpha %*% c(res$w)
res$beta = res$mu %*% c(res$w)
```
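
The two lines above combine the columns of `alpha` and `mu` using the fitted prior weights `w`. A common alternative weights each component by its posterior probability instead. The sketch below assumes that `alpha[j, k]` is the posterior probability that variable `j` belongs to mixture component `k`, and that `mu[j, k]` is its posterior mean given that component; this is worth checking against `?varbvsmix`. The matrices are made up for illustration and do not come from the fit:

```r
# Toy posterior quantities for 3 variables and 3 mixture components.
# Component 1 is the point mass at zero (the "spike"), so its mean is 0.
alpha <- rbind(c(0.90, 0.08, 0.02),
               c(0.10, 0.60, 0.30),
               c(0.99, 0.01, 0.00))
mu <- rbind(c(0, 0.5, 1.0),
            c(0, 2.0, 3.0),
            c(0, 0.1, 0.2))

# Posterior mean of each coefficient: average the component means,
# weighting by the posterior assignment probabilities.
beta_hat <- rowSums(alpha * mu)

# Posterior inclusion probability: probability of any non-null component.
pip <- 1 - alpha[, 1]
```

Applied to the fit, this would read `rowSums(res$alpha * res$mu)` and `1 - res$alpha[, 1]`. Whether that matches the intended summary depends on how `varbvsmix` defines these outputs, so treat it as an assumption to verify rather than a drop-in replacement.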

## Compare with simulation parameters

```r
meta_file = paste0(fpath, '/TY.meta_simulation.json')
# install.packages('rjson', repos = 'http://cloud.r-project.org')
meta <- rjson::fromJSON(file = meta_file)
str(meta)
```

```
List of 4
 $ pi   : num [1:3] 0.25 0.3 0.45
 $ pi0  : num 0.98
 $ sigma: num [1:3] 1 0.4 3
 $ beta :List of 3
  ..$ ENSG00000264247: num [1:9871] -5.36 0 0 0 0 ...
  ..$ ENSG00000267508: num [1:8911] -4.83 0 0 0 0 ...
  ..$ ENSG00000145214: num [1:7258] 4.07 0 0 0 0 ...
```

Compare mixture proportion estimates:


```r
# True weights: pi0 for the null component, then pi scaled by (1 - pi0).
truth = c(meta$pi0, meta$pi * (1 - meta$pi0))
est = res$w
cbind(truth, est)
```

| truth | est |
|---|---|
| 0.98 | 0.9922627 |
| 0.005 | 0.007641773 |
| 0.006 | 9.553623e-05 |
| 0.009 | 1.814362e-09 |


Compare effect size estimates:

```r
# Column 1: estimated effects; column 2: true effects. Sort by the truth.
beta = cbind(res$beta, meta$beta[[gene]])
beta = beta[order(beta[,2]),]
beta
```

| variant | estimate (`res$beta`) | truth (`meta$beta`) |
|---|---|---|
| 4:289272:C:T | -4.629785e-03 | -2.9899070 |
| 4:322080:A:G | -9.845190e-04 | -2.8569581 |
| 4:525147:A:T | -1.466576e-02 | -2.3781926 |
| 4:572249:T:G | -3.039666e-03 | -2.3617605 |
| 4:811215:A:G | -1.542018e-02 | -2.1958673 |
| 4:1084335:G:C | -2.937208e-03 | -2.0941702 |
| 4:1730254:A:G | -5.409377e-03 | -1.8410226 |
| 4:999595:G:A | -2.051137e-02 | -1.6770177 |
| 4:586981:G:T | -2.378742e-03 | -1.6481972 |
| 4:1134766:C:T | -2.238047e-03 | -1.5740744 |
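
Sorting by the true effect shows only the extremes. A numeric summary over all variants can complement the table. The sketch below uses made-up vectors in place of `res$beta` and `meta$beta[[gene]]`, and the helper name `compare_effects` is hypothetical:

```r
# Hypothetical helper: summarize agreement between estimated and true effects.
compare_effects <- function(est, truth) {
    stopifnot(length(est) == length(truth))
    nonzero <- truth != 0
    list(
        # Root mean squared error over all variants.
        rmse = sqrt(mean((est - truth)^2)),
        # Root mean squared error restricted to variants with a true effect.
        rmse_nonzero = sqrt(mean((est[nonzero] - truth[nonzero])^2)),
        n_nonzero = sum(nonzero)
    )
}

# Toy example (values are illustrative only).
truth <- c(-2, 0, 0, 1, 0)
est <- c(-1, 0, 0, 0, 0)
summary_stats <- compare_effects(est, truth)
```

With the objects from this analysis the call would be `compare_effects(as.vector(res$beta), meta$beta[[gene]])`.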
