# Analysis with precomputed matrix quantities + missing data in Y

Here we analyze an example cis-gene data set from GTEx V7.

In [1]:
raw_data = readRDS('Multi_Tissues.ENSG00000145214.RDS')
# raw_data$y_res = as.matrix(raw_data$y_res[,-1])
names(raw_data)

In [2]:
dim(raw_data$X)

In [3]:
dim(raw_data$y_res)

## Initialize data object

In [4]:
data = mmbr:::DenseData$new(raw_data$X, raw_data$y_res)

## Setting up MASH object

In [5]:
residual_covar = diag(apply(raw_data$y_res, 2, function(x) var(x, na.rm=TRUE)))
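
The residual covariance above is a diagonal matrix built from the per-column variances of `raw_data$y_res`, with missing entries ignored. As a minimal sketch of what that expression does, here is the same construction on a toy matrix (the names `y`, `v` and `toy_covar` and the toy data are made up for illustration and are not part of the analysis):

```r
# Toy response matrix: 5 samples x 3 columns, with two missing entries
set.seed(1)
y <- matrix(rnorm(15), nrow = 5, ncol = 3)
y[2, 1] <- NA
y[4, 3] <- NA

# Per-column variance, computed from the observed entries of each column only
v <- apply(y, 2, function(x) var(x, na.rm = TRUE))

# Diagonal matrix: variances on the diagonal, zeros elsewhere
toy_covar <- diag(v)
print(dim(toy_covar))
print(anyNA(toy_covar))
```

Each column's variance uses a different number of observations when missingness differs across columns, and the off-diagonal entries are fixed at zero, so no covariance between columns is estimated here.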

In [6]:
prior_mats = mmbr:::create_cov_canonical(ncol(raw_data$y_res), singletons=FALSE)

In [7]:
scaling = c(0.05,0.15,0.25,0.4) # FIXME: use auto-grid

In [8]:
mash_init = mmbr:::MashInitializer$new(prior_mats, scaling, alpha = 1)

In [None]:
mash_init$precompute_cov_matrices(data, residual_covar)

The line above currently takes 3m40s, and **the result is 2.5GB on disk in RDS format**. This is the cost for $R = 49, J = 7962, P = 21$, where **$P = 21$ counts the null weight plus at most 20 other components**. I saved the object to disk:

In [None]:
saveRDS(mash_init, 'mash_init.rds')

```
-rw-r--r-- 1 gaow gaow  2.5G May 12 07:41 mash_init.rds
```
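
The on-disk size can also be checked from within R. The sketch below uses a small stand-in object and a temporary file (both hypothetical, not the real `mash_init`) to show `file.size()` together with the `compress` argument of `saveRDS()`, which is on by default and affects both file size and write time:

```r
# Small stand-in object; the real mash_init is far larger
obj <- list(a = matrix(0, 200, 200))

path_gz  <- tempfile(fileext = ".rds")
path_raw <- tempfile(fileext = ".rds")

saveRDS(obj, path_gz)                     # default: gzip-compressed
saveRDS(obj, path_raw, compress = FALSE)  # uncompressed

size_gz  <- file.size(path_gz)   # bytes
size_raw <- file.size(path_raw)  # bytes
print(c(compressed = size_gz, uncompressed = size_raw))

# Both files restore the same object
restored <- readRDS(path_gz)
print(identical(obj, restored))

unlink(c(path_gz, path_raw))
```

How much compression helps depends on the content. A matrix of zeros, as here, compresses far better than real numerical output would.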

I then tested how much memory it takes to hold the object once loaded: 7.42GB at peak (`max vms_memory` below).

```bash
python ~/GIT/github/misc/monitor/monitor.py Rscript -e "mash_init = readRDS('mash_init.rds')"
```

```
time elapsed: 25.18s
peak first occurred: 15.40s
peak last occurred: 24.65s
max vms_memory: 7.42GB
max rss_memory: 7.23GB
memory check interval: 1s
return code: 0
```
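
For a sense of scale, here is a rough back-of-envelope estimate from the dimensions quoted above. It assumes, purely for illustration, that one dense $R \times R$ double-precision matrix is stored per variable and per component. The actual layout inside `MashInitializer` may differ, so this is an order-of-magnitude check rather than an account of the measured figures.

```r
# Dimensions quoted in the text
R <- 49    # dimension of each (assumed) R x R matrix
J <- 7962
P <- 21

bytes_per_double <- 8

# Assumption: one dense R x R matrix of doubles per (J, P) pair
n_doubles <- P * J * R * R
est_bytes <- n_doubles * bytes_per_double
est_gb    <- est_bytes / 1e9

print(n_doubles)
print(round(est_gb, 2))
```

Under this assumption a single such array is on the order of 3GB, the same order of magnitude as the 2.5GB RDS file and the 7.42GB peak, though it does not by itself account for either number.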

In [8]:
mash_init = readRDS('mash_init.rds')

In [10]:
mmbr_obj = mmbr:::MashMultipleRegression$new(ncol(raw_data$X), residual_covar, mash_init)

In [None]:
mmbr_obj$fit(data)

The step above now takes only 2 min.

In [None]:
saveRDS(mmbr_obj, 'mmbr_obj.rds')