# Analysis with precomputed quantities and missing data in Y

Here we analyze [an example GTEx V7 cis-gene dataset](http://shiny.stephenslab.uchicago.edu/gaow/GTExV7/Multi_Tissues.ENSG00000145214.RDS).

In [1]:
raw_data = readRDS('Multi_Tissues.ENSG00000145214.RDS')

In [2]:
names(raw_data)

In [3]:
dim(raw_data$X)

In [4]:
dim(raw_data$y_res)

In [5]:
head(raw_data$y_res)

| | Adipose_Subcutaneous | Adipose_Visceral_Omentum | Adrenal_Gland | Artery_Aorta | Artery_Coronary | Artery_Tibial | Brain_Amygdala | Brain_Anterior_cingulate_cortex_BA24 | Brain_Caudate_basal_ganglia | Brain_Cerebellar_Hemisphere | ⋯ | Skin_Not_Sun_Exposed_Suprapubic | Skin_Sun_Exposed_Lower_leg | Small_Intestine_Terminal_Ileum | Spleen | Stomach | Testis | Thyroid | Uterus | Vagina | Whole_Blood |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GTEX-1117F | 0.02125546 | 0.31798809 | NA | NA | 0.2393113 | -1.04494665 | NA | NA | NA | NA | ⋯ | 0.1506737 | NA | NA | NA | NA | NA | NA | 0.01039438 | 0.722842 | NA |
| GTEX-111CU | -0.39699365 | -0.64860112 | -0.005019028 | NA | NA | NA | NA | NA | NA | NA | ⋯ | 0.63489211 | NA | 0.001705089 | -0.2423771 | -0.23110084 | -0.8254563 | -0.72159107 | NA | NA | NA |
| GTEX-111FC | 0.12702099 | NA | NA | NA | NA | -0.11075103 | NA | NA | NA | NA | ⋯ | -0.27617936 | 0.003240822 | NA | -0.1753999 | NA | 0.3182093 | -0.44989562 | NA | NA | NA |
| GTEX-111VG | -0.56557561 | NA | NA | NA | NA | NA | NA | NA | NA | NA | ⋯ | -0.18524402 | -0.370778712 | NA | NA | NA | -0.5285808 | -0.47352474 | NA | NA | NA |
| GTEX-111YS | -0.70396973 | -0.1960398 | 0.245463191 | 0.04715705 | NA | -0.07033531 | NA | NA | NA | NA | ⋯ | -0.3067258 | NA | 0.492145996 | NA | -0.08577711 | -0.4095227 | -0.47541942 | NA | NA | -1.06802847 |
| GTEX-1122O | 0.04918042 | -0.08174026 | 0.365081493 | -0.07085276 | -0.1343889 | 0.40367406 | NA | NA | NA | NA | ⋯ | -0.09075013 | 0.394240061 | 0.404058046 | -0.2285992 | 0.14804767 | NA | -0.02374288 | NA | NA | 0.08137622 |
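Because the point of this analysis is missing data in Y, it helps to quantify how many samples each condition actually has. A minimal sketch on a small simulated matrix standing in for `raw_data$y_res` (the toy values and the missingness pattern are made up for illustration; on the real data the same calls apply to `raw_data$y_res`):

```r
# Toy stand-in for raw_data$y_res: rows are samples, columns are conditions,
# NA marks a tissue that was not measured for that sample.
set.seed(1)
y <- matrix(rnorm(6 * 4), nrow = 6,
            dimnames = list(paste0("sample", 1:6), paste0("tissue", 1:4)))
y[cbind(c(1, 2, 2, 4, 5, 6), c(3, 3, 4, 2, 4, 4))] <- NA

# Number of observed samples and fraction missing per condition
n_observed <- colSums(!is.na(y))
frac_missing <- colMeans(is.na(y))
print(rbind(n_observed, frac_missing))

# Samples observed in every condition (complete cases)
n_complete <- sum(complete.cases(y))
n_complete
```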


## Initialize data object

In [6]:
data = mmbr:::DenseData$new(raw_data$X, raw_data$y_res)

## Set up the MASH object

In [7]:
# Diagonal residual covariance: per-condition variance over observed (non-NA) samples
residual_covar = diag(apply(raw_data$y_res, 2, function(x) var(x, na.rm = TRUE)))
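`residual_covar` is diagonal: each condition's residual variance is estimated from its observed samples only, and cross-condition residual correlation is set to zero. For comparison, base R can also estimate a full covariance from pairwise-complete observations. This toy sketch (simulated data; the pairwise alternative is not what this notebook uses) shows that both give the same variances and that only the full version keeps the cross-condition term:

```r
set.seed(2)
# Toy residual matrix: columns a and b share a common factor z, c is independent
z <- rnorm(50)
y <- cbind(a = z + rnorm(50, sd = 0.5), b = z + rnorm(50, sd = 0.5), c = rnorm(50))
y[sample(length(y), 30)] <- NA

# Diagonal estimate, as in the notebook: per-column variance ignoring NAs
resid_diag <- diag(apply(y, 2, var, na.rm = TRUE))

# Alternative (not used above): full covariance from pairwise-complete pairs
resid_full <- cov(y, use = "pairwise.complete.obs")

all.equal(unname(diag(resid_diag)), unname(diag(resid_full)))  # same variances
resid_full["a", "b"]  # cross-condition covariance that the diagonal form drops
```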

In [8]:
prior_mats = mmbr:::create_cov_canonical(ncol(raw_data$y_res), singletons = FALSE)

In [9]:
scaling = c(0.05,0.15,0.25,0.4) # FIXME: use auto-grid

In [13]:
str(prior_mats)

List of 5
 $ : num [1:49, 1:49] 1 0 0 0 0 0 0 0 0 0 ...
 $ : num [1:49, 1:49] 1 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 ...
 $ : num [1:49, 1:49] 1 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
 $ : num [1:49, 1:49] 1 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 0.75 ...
 $ : num [1:49, 1:49] 1 1 1 1 1 1 1 1 1 1 ...
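Judging from the `str()` output, each canonical matrix appears to have 1 on the diagonal and one shared correlation (0, 0.25, 0.5, 0.75 or 1) off the diagonal. That reading is an inference from the first printed entries, not from the source of `create_cov_canonical()`. Under that assumption, and assuming every scaling value is crossed with every matrix, the sketch below rebuilds the matrices and counts the mixture components: one null component plus 5 × 4 = 20 scaled ones. The helper `canonical_cov()` is hypothetical, and applying each grid value squared (treating it as a prior standard deviation) is also an assumption:

```r
# Hypothetical re-creation of the 5 canonical matrices (not mmbr's code):
# unit diagonal, constant off-diagonal correlation rho.
canonical_cov <- function(R, rhos = c(0, 0.25, 0.5, 0.75, 1)) {
  lapply(rhos, function(rho) {
    m <- matrix(rho, R, R)
    diag(m) <- 1
    m
  })
}

R <- 49
mats <- canonical_cov(R)
scaling <- c(0.05, 0.15, 0.25, 0.4)

# First column of the second matrix starts 1, 0.25, 0.25, ... as in str()
head(mats[[2]][, 1])

# One non-null component per (scaling, matrix) pair; s^2 * U assumes the grid
# values are prior standard deviations.
components <- unlist(lapply(scaling, function(s) lapply(mats, function(U) s^2 * U)),
                     recursive = FALSE)
P <- 1 + length(components)  # + 1 for the null component
P
```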


In [15]:
str(data)

Classes 'DenseData', 'R6' <DenseData>
  Public:
    add_to_residual: function (value) 
    clone: function (deep = FALSE) 
    compute_MXt: function (M) 
    compute_residual: function (fitted) 
    compute_Xb: function (b) 
    initialize: function (X, Y, center = TRUE, scale = TRUE) 
    n_condition: active binding
    n_effect: active binding
    n_sample: active binding
    remove_from_residual: function (value) 
    rescale_coef: function (b) 
    X: active binding
    X_has_missing: active binding
    X2_sum: active binding
    XtR: active binding
    XtX: active binding
    XtY: active binding
    Y: active binding
    Y_has_missing: active binding
  Private:
    .X: -0.156460235018997 -0.156460235018997 -0.156460235018997 ...
    .Y: 0.0212554552618386 -0.396993654855127 0.127020987443375  ...
    .Y_has_missing: TRUE
    cm: 0.0238948630831407 0.03125 0.0323353769671092 0.21771036 ...
    csd: 0.152721636141217 0.173472166622178 0.176677642549131 0. ...
    d: 708.232129614059
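The private fields `cm` and `csd` look like the column means and standard deviations of `X`, consistent with the `center = TRUE, scale = TRUE` defaults of `initialize()`, and `rescale_coef()` presumably maps coefficients fitted on standardized `X` back to the original scale. The base-R sketch below illustrates that standard relationship; it is an assumption about what `DenseData` does, not its code. If $\tilde X = (X - \text{cm}) / \text{csd}$ column-wise, a coefficient $\tilde b_j$ on $\tilde x_j$ corresponds to $b_j = \tilde b_j / \text{csd}_j$ on the original $x_j$:

```r
set.seed(3)
n <- 200
X <- cbind(x1 = rbinom(n, 2, 0.3), x2 = rbinom(n, 2, 0.1))  # genotype-like 0/1/2
y <- 0.5 * X[, "x1"] - 0.3 * X[, "x2"] + rnorm(n, sd = 0.1)

# Column means and SDs, analogous to the cm and csd fields (assumed meaning)
cm  <- colMeans(X)
csd <- apply(X, 2, sd)
X_std <- scale(X, center = cm, scale = csd)

# Fit on standardized X, then map coefficients back to the original scale
b_std  <- coef(lm(y ~ X_std))[-1]
b_orig <- b_std / csd  # hypothetical equivalent of rescale_coef()

# Same coefficients as fitting on the original X directly
b_direct <- coef(lm(y ~ X))[-1]
all.equal(unname(b_orig), unname(b_direct))
```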

Now we use the EE (exchangeable effects) model, i.e. `alpha = 0`:

In [16]:
mash_init = mmbr:::MashInitializer$new(prior_mats, scaling, alpha = 0)
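For reference, mashr's `mash_set_data()` documents `alpha = 0` as the exchangeable effects (EE) model and `alpha = 1` as exchangeable z-scores (EZ). Assuming `MashInitializer`'s `alpha` has the same meaning, the prior being set up is roughly the following, where $U_k$ are the $K = 5$ canonical matrices, $\omega_g$ are $G = 4$ weights derived from `scaling` (the exact mapping is not shown here), and $\hat S_j$ is the diagonal matrix of standard errors for variable $j$:

$$
\mathbf b_j = \hat S_j^{\alpha}\,\boldsymbol\theta_j, \qquad
\boldsymbol\theta_j \sim \pi_0\,\delta_{\mathbf 0} + \sum_{k=1}^{K}\sum_{g=1}^{G}\pi_{kg}\,N_R\!\left(\mathbf 0,\ \omega_g U_k\right).
$$

With $\alpha = 0$ the factor $\hat S_j^{\alpha}$ is the identity, so the mixture is placed directly on the effects $\mathbf b_j$, and the number of components is $P = 1 + KG = 21$.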

In [17]:
mash_init$precompute_cov_matrices(data, residual_covar)

The line above currently takes 3m40s. It computes for $R = 49$ conditions, $J = 7962$ variables and $P = 21$ mixture components: **$P$ is 21 for the null weight plus at most 20 other components** (5 canonical matrices times 4 scaling factors). **The result is 2.5GB on disk in RDS format.** I saved it to disk:

In [19]:
saveRDS(mash_init, 'mash_init.rds')

```
-rw-r--r-- 1 gaow gaow  2.5G May 12 07:41 mash_init.rds
```

Loading it back takes 7.42GB of memory (peak virtual memory; 7.23GB resident), measured with a memory-monitoring script:

```bash
python ~/GIT/github/misc/monitor/monitor.py Rscript -e "mash_init = readRDS('mash_init.rds')"
```

```
time elapsed: 25.18s
peak first occurred: 15.40s
peak last occurred: 24.65s
max vms_memory: 7.42GB
max rss_memory: 7.23GB
memory check interval: 1s
return code: 0
```
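For scale, a back-of-envelope estimate: if one dense $R \times R$ double-precision matrix were stored for every (variable, component) pair, the precomputed object would need about $J \cdot P \cdot R^2 \cdot 8$ bytes. Whether `precompute_cov_matrices()` stores exactly one such matrix per pair is an assumption; the sketch only shows how the footprint grows (quadratically in $R$, linearly in $J$ and $P$):

```r
# Bytes for J * P dense R x R double matrices (8 bytes per double).
# Assumes one matrix per (variable, component) pair; not mmbr's actual layout.
precomp_bytes <- function(R, J, P) J * P * R^2 * 8

gb <- precomp_bytes(R = 49, J = 7962, P = 21) / 1e9
round(gb, 2)  # about 3.21 GB

# Doubling the number of conditions quadruples the footprint
precomp_bytes(98, 7962, 21) / precomp_bytes(49, 7962, 21)
```

The measured in-memory size (7.42GB) is larger than this estimate, so the real object evidently holds more than one such matrix per pair or carries extra overhead; the formula is only meant as a guide to scaling.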

## Fit one MASH regression model

In [20]:
mmbr_obj = mmbr:::MashRegression$new(ncol(raw_data$X), residual_covar, mash_init)

In [None]:
mmbr_obj$fit(data)

The step above now takes 2 min. This is the cost of a single fit (one iteration); the total `MV-SuSiE` computation time depends on how many iterations are needed, as we find out next.

## Fit MV-SuSiE model

In [None]:
#mash_init = readRDS('mash_init.rds')

In [None]:
res = mmbr::susie(raw_data$X, raw_data$y_res,
                  L = 10, prior_variance = mash_init,
                  compute_objective = FALSE)

The code above takes 25 minutes to complete with `L=5`, and over 1 hour with `L=10`.
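One rough way to read these timings: assume each pass of the fitting loop refits all $L$ single-effect models once, and each fit costs about as much as the 2-minute `MashRegression` fit above. Both are assumptions (fits inside `susie()` may be cheaper or costlier), giving total time ≈ passes × $L$ × 2 min. The hypothetical helper below inverts that to get the implied number of passes:

```r
# Hypothetical back-of-envelope helper: implied number of full passes over
# the L single effects, given total run time and per-fit cost (in minutes).
implied_passes <- function(total_min, L, fit_min = 2) total_min / (L * fit_min)

implied_passes(25, L = 5)   # 25 minutes with L = 5
implied_passes(60, L = 10)  # a lower bound, since L = 10 took over an hour
```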

In [None]:
saveRDS(res, "mmbr_res_10.rds")

## Visualize results

In [None]:
p = mmbr::mmbr_plot(res)

In [None]:
pdf('mmbr_plot_ENSG00000145214.pdf', width = 20, height = 20)
print(p$plot)
dev.off()

In [None]:
pdf('susie_plot_ENSG00000145214.pdf', width=9, height=5)
susieR::susie_plot(res, y = 'PIP', main = 'Default SuSiE plot for cross-condition PIP', xlab = 'SNP positions', add_legend = TRUE)
dev.off()

In [None]:
%preview susie_plot_ENSG00000145214.pdf -s png --dpi 150

In [None]:
%preview mmbr_plot_ENSG00000145214.pdf -s png --dpi 150