# Fine mapping on FMO2 data in GTEx

This is an updated, combined version of the [m&m](20180415_MNMASH_FMO2.html) and [susie](20180416_SingleCondition_FMO2.html) notebooks on multi-tissue fine mapping.

Major updates:

- Fixed the lfsr computation: lfsr now measures how "mappable" the SNP set identified by the $l$-th fit is.
- In addition to a modified version of the effect-level plots (posterior mean plots similar to the ones shown before), we summarize for each $l$:
  - The size of the 95% HPD interval
  - Its purity (defined as the smallest pairwise LD among the SNPs in the interval)
  - The minimum lfsr across conditions

  We expect small size, high purity and low lfsr to be highly correlated.
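The three per-effect summaries can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the DSC modules: the function name `summarize_effects` is hypothetical, the membership matrix is assumed to be laid out as effects by SNPs (like `in_CI` below), `r2` is assumed to be a SNP-by-SNP LD matrix with ones on the diagonal, and `lfsr` is assumed to be effects by conditions.

```python
import numpy as np

def summarize_effects(in_ci, r2, lfsr):
    """Per-effect summary: set size, purity, and minimum lfsr.

    in_ci : (L, J) 0/1 array; in_ci[l, j] > 0 if SNP j is in the set of effect l
    r2    : (J, J) pairwise LD matrix (assumed to have ones on the diagonal)
    lfsr  : (L, R) lfsr of each effect in each of R conditions
    """
    in_ci = np.asarray(in_ci) > 0
    r2 = np.asarray(r2, dtype=float)
    lfsr = np.asarray(lfsr, dtype=float)
    summary = []
    for l in range(in_ci.shape[0]):
        members = np.flatnonzero(in_ci[l])
        if members.size == 0:
            # An empty set has no pairwise LD to speak of
            purity = float('nan')
        else:
            # Purity: smallest pairwise LD among SNPs in the set
            purity = float(r2[np.ix_(members, members)].min())
        summary.append({'effect': l + 1,
                        'size': int(members.size),
                        'purity': purity,
                        'min_lfsr': float(lfsr[l].min())})
    return summary

# Toy example (made-up numbers, not FMO2 results): 3 effects, 4 SNPs, 2 conditions
toy_in_ci = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 1, 1, 1]]
toy_r2 = [[1.0, 0.9, 0.8, 0.01],
          [0.9, 1.0, 0.7, 0.02],
          [0.8, 0.7, 1.0, 0.03],
          [0.01, 0.02, 0.03, 1.0]]
toy_lfsr = [[1e-8, 1e-3], [0.02, 0.01], [0.7, 0.8]]
for row in summarize_effects(toy_in_ci, toy_r2, toy_lfsr):
    print(row)
```

In the toy data the third effect covers every SNP, so its purity is driven down by the weakest pairwise LD and its lfsr is high, which is the pattern we expect for an effect that cannot be mapped.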

## The DSC prototype

As previously explained, we use DSC to prototype these methods.

In [1]:
%cd ../src/model_dsc

/home/gaow/Documents/GIT/github/mvarbvs/src/model_dsc

In [2]:
./model.dsc -h


INFO: MODULES
+------------+---------------------+--------------------------------+-------------------+------+
|            |      parameters     |             input              |       output      | type |
+------------+---------------------+--------------------------------+-------------------+------+
|  get_data  |      data_file      |                                |        data       |  R   |
| original_Y |                     |              data              |        data       |  PY  |
|  init_mnm  | Sigma, (U, grid, p) |              data              |    data, model    |  R   |
|  fit_mnm   |      maxL, maxI     |          data, model           | fitted, posterior |  R   |
| fit_varbvs |    sa, maxL, maxI   |              data              | posterior, fitted |  R   |
|  diagnose  |                     | data, model, fitted, posterior |     diagnosed     |  R   |
| fit_susie  |    sa, maxL, maxI   |              data              | posterior, fitted |  R   |
+------------+---------------------+--------------------------------+-------------------+------+

As a first pass, I extracted data for FMO2 in Thyroid and Lung. I use a maximum of 5 effects and 10 iterations of variational updates:

In [3]:
./model.dsc

INFO: Checking R library mashr ...
INFO: Checking R library abind ...
INFO: Checking R library varbvs@pcarbo/varbvs/varbvs-R ...
INFO: Checking R library susieR@stephenslab/susieR ...
INFO: Checking R library dscrutils@stephenslab/dsc/dscrutils ...
Downloading GitHub repo stephenslab/dsc@master
from URL https://api.github.com/repos/stephenslab/dsc/zipball/master
Installing dscrutils
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet  \
  CMD INSTALL  \
  '/tmp/Rtmpcofya8/devtools2bee1e057a13/stephenslab-dsc-0564d6e/dscrutils'  \
  --library='/home/gaow/R/x86_64-pc-linux-gnu-library/3.4' --install-tests 

* installing *source* package ‘dscrutils’ ...
** R
** inst
** tests
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (dscrutils)
Reloading installed dscrutils
INFO: DSC script exported to mnm_mod

Scripts for the DSC can be found [here](20180424_mnm_model.html).

## Plot utilities

In [4]:
import seaborn as sns
import matplotlib.pyplot as plt

def plot_beta(yaxis, zaxis, ld, ci, conf):
    """Scatter per-SNP effect estimates, colored by zaxis and annotated by LD.

    ci is a pair (n_in_CI, in_CI): the set size for each effect, and an
    effects-by-SNPs 0/1 membership matrix.
    """
    xaxis = [x+1 for x in range(len(yaxis))]
    cmap = sns.cubehelix_palette(start=2.8, rot=.1, as_cmap=True)
    f, ax = plt.subplots(figsize=(18,5))
    points = ax.scatter(xaxis, yaxis, c=zaxis, cmap=cmap)
    f.colorbar(points, label=conf['zlabel'])
    if 'CIMax' in conf:
        # Gray circles: SNPs in any set with fewer than CIMax members
        for idx in range(ci[1].shape[0]):
            if ci[0][idx] < conf['CIMax']:
                for ii in range(len(yaxis)):
                    if ci[1][idx, ii] > 0:
                        ax.scatter(xaxis[ii], yaxis[ii], s=80,
                                   facecolors='none', edgecolors='#D3D3D3')
    if 'pip_cutoff' in conf:
        for idx, item in enumerate(zaxis):
            if item > conf['pip_cutoff']:
                # Red circle: top signal
                ax.scatter(xaxis[idx], yaxis[idx], s=80,
                           facecolors='none', edgecolors='r')
                # Yellow '+': SNPs in LD with the signal above ld_cutoff1
                for ii, xx in enumerate(ld[idx,:]):
                    if xx > conf['ld_cutoff1'] and xx < 1.0:
                        ax.scatter(xaxis[ii], yaxis[ii],
                                   color='y', marker='+')
                # Green 'x' (drawn on top): SNPs in LD above ld_cutoff2
                for ii, xx in enumerate(ld[idx,:]):
                    if xx > conf['ld_cutoff2'] and xx < 1.0:
                        ax.scatter(xaxis[ii], yaxis[ii],
                                   color='g', marker='x')
    ax.set_title(conf['title'])
    ax.set_ylabel(conf['ylabel'])
    plt.show()

## Results from M&M

In [5]:
res = readRDS('mnm_model/fit_mnm/get_data_1_original_Y_1_init_mnm_1_fit_mnm_1.rds')$posterior
dat = readRDS('mnm_model/init_mnm/get_data_1_original_Y_1_init_mnm_1.rds')$data

In [15]:
r2 = dat$r2

In [21]:
%get r2 res --from R

In [23]:
res['alpha']

array([[5.02270854e-16, 5.28579835e-13, 2.41333343e-06, 2.32107453e-10,
        7.02121251e-05],
       [7.10181301e-16, 1.63280769e-12, 4.89671641e-06, 4.74065873e-10,
        1.53345627e-04],
       [9.74334382e-16, 1.47910641e-12, 4.62195429e-06, 4.03392300e-10,
        1.33824069e-04],
       ...,
       [5.23380415e-16, 5.16879272e-13, 1.94544887e-06, 2.30002450e-10,
        6.79727553e-05],
       [6.67798587e-16, 3.15949653e-13, 1.48179937e-06, 3.68383617e-10,
        5.50379340e-05],
       [5.62294006e-16, 5.23157669e-13, 1.96230912e-06, 2.30986910e-10,
        6.93099179e-05]])

In [24]:
res['n_in_CI']

[1, 4, 16, 12, 6248]

In [25]:
res['in_CI']

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [1., 1., 1., ..., 1., 1., 1.]])

In [26]:
res['lfsr']

array([[7.69767063e-12, 1.53965009e-07],
       [2.84015127e-08, 1.19111949e-07],
       [2.28957405e-02, 2.37296034e-02],
       [4.61633617e-03, 3.63572877e-06],
       [7.28444095e-01, 7.45542232e-01]])

### Purity plot

See the beginning of this document. Here I call this type of plot the "purity plot".

We focus on the inferred posterior mean. We plot this quantity in both Thyroid and Lung tissues, annotated by local false sign rate (lfsr) and LD structure.

### Thyroid results

Top signals ($\text{lfsr} < 0.05$) are circled in red. SNPs in LD with a top signal are marked in yellow ($r^2 > 0.1$) or green ($r^2 > 0.6$). SNPs belonging to a set with fewer than 50 members are circled in gray.

In [None]:
conf = {'title': 'FMO2, Thyroid, M&M ASH', 
        'ylabel': 'betahat', 
        'zlabel': 'PIP (1 - lfsr)',
        'pip_cutoff': 0.95,
        'ld_cutoff1': 0.1,
        'ld_cutoff2': 0.6,
        'CIMax': 50}
plot_beta(post_mean[:,0], 1 - lfsr[:,0], r2, (n_in_CI, in_CI), conf)

### Lung results

In [None]:
conf = {'title': 'FMO2, Lung, M&M ASH', 
        'ylabel': 'betahat', 
        'zlabel': 'PIP (1 - lfsr)',
        'pip_cutoff': 0.95,
        'ld_cutoff1': 0.1,
        'ld_cutoff2': 0.6, 
        'CIMax': 50}
plot_beta(post_mean[:,1], 1 - lfsr[:,1], r2, (n_in_CI, in_CI), conf)