# Create prior mixture for simulation studies

This notebook contains scripts to create mixture priors for use in DSC simulations.

## Artificial structure

Here I construct three kinds of covariance patterns for 50 conditions:
1. canonical
2. paired sharing, e.g. a pair of tissues
3. block sharing, e.g. brain tissues

Canonical patterns of sharing:

In [1]:
R=50
prior = mmbr:::create_cov_canonical(R)

Paired sharing:

In [2]:
paired = matrix(0,R,R)
paired[1:2,1:2] = 1
prior[['paired_1']] = paired

Block sharing:

In [3]:
block = matrix(0,R,R)
block[1:(R/2), 1:(R/2)] = 1  # parenthesized because 1:R/2 parses as (1:R)/2
block[(R/2+1):R, (R/2+1):R] = 1
prior[['blocked_1']] = block

In [4]:
names(prior)
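
Each entry of `prior` is meant to be a covariance matrix, so it should be symmetric and positive semi-definite. Below is a minimal sketch of that check in base R. It rebuilds the paired and block patterns by hand for a small stand-in dimension (`R_small = 6`) so that it runs without `mmbr`. The helper name `is_psd` and the `_small` variables are hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE if x is symmetric and has no eigenvalue below -tol.
is_psd = function(x, tol = 1e-8) {
  if (!isSymmetric(x)) return(FALSE)
  ev = eigen(x, symmetric = TRUE, only.values = TRUE)$values
  all(ev >= -tol)
}

R_small = 6  # small stand-in for R = 50, for illustration only
half = R_small / 2

# Paired sharing: the first two conditions share an effect.
paired_small = matrix(0, R_small, R_small)
paired_small[1:2, 1:2] = 1

# Block sharing: two equally sized blocks of conditions.
block_small = matrix(0, R_small, R_small)
block_small[1:half, 1:half] = 1
block_small[(half + 1):R_small, (half + 1):R_small] = 1

sapply(list(paired = paired_small, block = block_small), is_psd)
```

The same helper could be applied to the full list with `sapply(prior, is_psd)`.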

Now assign some weights:

1. singleton: 35% in total
    - singleton_1 gets 10% (to be picked up later by ED methods)
    - singleton_2 to singleton_26 get 1% each (25% in total)
    - the remaining 24 singletons get zero weight
2. shared: 25% in total (5% for each of the 5 components)
3. paired: 20% (hopefully picked up by ED methods)
4. blocked: 20% (hopefully picked up by ED methods)

In [5]:
w = c(0.1, rep(0.01,25), rep(0, 24), rep(0.25/5,5), 0.2, 0.2)

In [6]:
sum(w)
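
Beyond checking the grand total, it can help to confirm the per-group totals listed above. The sketch below attaches names to the weight vector and sums by group. The `shared_*` names are stand-ins chosen for illustration, and the ordering of all names is an assumption. In practice the names would come from `names(prior)`.

```r
# Stand-in component names; the real ones come from names(prior).
comp_names = c(paste0("singleton_", 1:50),
               paste0("shared_", 1:5),   # hypothetical names for the shared components
               "paired_1", "blocked_1")

w_named = setNames(
  c(0.1, rep(0.01, 25), rep(0, 24), rep(0.25 / 5, 5), 0.2, 0.2),
  comp_names)

# Group label = component name with its trailing "_<number>" removed.
group = sub("_[0-9]+$", "", names(w_named))

# Total weight per group.
tapply(w_named, group, sum)
```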

In [7]:
prior = prior[which(w>0)]

In [8]:
names(prior)

In [9]:
artificial_mixture_50 = list(U=prior, w=w[which(w>0)])  # keep only the weights of the retained components
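
A mixture stored as `list(U=..., w=...)` is only usable if the components and weights line up one-to-one. Here is a minimal sketch of a validator, shown on a toy two-component mixture. The function name `check_mixture` and the toy object are hypothetical.

```r
# Hypothetical validator for a mixture stored as list(U = <list of matrices>, w = <weights>).
check_mixture = function(mix, tol = 1e-8) {
  stopifnot(is.list(mix$U), is.numeric(mix$w))
  if (length(mix$U) != length(mix$w))
    stop("U has ", length(mix$U), " components but w has ", length(mix$w), " weights")
  if (any(mix$w < 0)) stop("weights must be non-negative")
  if (abs(sum(mix$w) - 1) > tol) stop("weights must sum to 1")
  invisible(TRUE)
}

# Toy example with two 2 x 2 components.
toy = list(U = list(a = diag(2), b = matrix(1, 2, 2)), w = c(0.3, 0.7))
check_mixture(toy)
```

Calling `check_mixture(artificial_mixture_50)` before saving would flag any length mismatch between `U` and `w`.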

## Mixture from GTEx

The prior loaded below was produced using [this workflow](https://github.com/stephenslab/gtexresults/blob/master/workflows/gtex6_mash_analysis.ipynb):

In [10]:
prior = readRDS('../data/FastQTLSumStats.mash.FL_PC3.rds')

In [11]:
names(prior)[1:30]

In [12]:
names(prior)[31:79]

In [13]:
names(prior)[80:length(names(prior))]

Here I remove the singleton components (entries 31 to 79) because, in the MASH paper, they receive essentially no weight once the other components are included.

In [14]:
prior = prior[-c(31:79)]

In [15]:
names(prior)
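
Removing components by position depends on the ordering of the list. As an alternative, singleton matrices could be identified by their structure. The sketch below assumes that a singleton covariance has exactly one non-zero entry, located on the diagonal. The helper `is_singleton` and the toy list (including its names) are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: TRUE if U has exactly one non-zero entry and it lies on the diagonal.
is_singleton = function(U) sum(U != 0) == 1 && sum(diag(U) != 0) == 1

# Toy stand-in for a prior list; names and matrices are illustrative only.
toy_prior = list(identity      = diag(3),
                 tissue_a      = diag(c(1, 0, 0)),
                 tissue_b      = diag(c(0, 1, 0)),
                 equal_effects = matrix(1, 3, 3))

# Keep every component that is not a singleton.
keep = !sapply(toy_prior, is_singleton)
names(toy_prior[keep])
```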

In [16]:
length(prior)

I'll assign them equal weights so each component has about 3%. 

In [17]:
w = rep(1/length(prior), length(prior))
gtex_mixture = list(U=prior,w=w)

In [18]:
sum(w)

In [19]:
saveRDS(list(gtex_mixture=gtex_mixture, artificial_mixture_50=artificial_mixture_50), '../data/prior_simulation.rds')
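
To illustrate how such a mixture might be used in a simulation, here is a minimal sketch that draws effect vectors from a mixture of zero-mean multivariate normals: pick a component with probability `w`, then draw from the normal with that component's covariance. This is an assumption about the downstream use, not the DSC code itself. The function name `sample_mixture` and the toy mixture are hypothetical. An eigen-decomposition is used instead of a Cholesky factor because patterns like the paired and singleton matrices are rank-deficient.

```r
# Hypothetical helper: draw n effect vectors from a mixture list(U = ..., w = ...).
sample_mixture = function(mix, n) {
  R = nrow(mix$U[[1]])
  # Choose a component for each draw according to the mixture weights.
  k = sample(length(mix$w), n, replace = TRUE, prob = mix$w)
  out = matrix(0, n, R)
  for (i in seq_len(n)) {
    e = eigen(mix$U[[k[i]]], symmetric = TRUE)
    # Matrix square root; tiny negative eigenvalues are clipped to zero.
    A = e$vectors %*% diag(sqrt(pmax(e$values, 0)), R)
    out[i, ] = as.vector(A %*% rnorm(R))
  }
  list(effects = out, component = k)
}

set.seed(1)
toy = list(U = list(paired = matrix(c(1, 1, 0,
                                      1, 1, 0,
                                      0, 0, 0), 3, 3),
                    third  = diag(c(0, 0, 1))),
           w = c(0.5, 0.5))
draws = sample_mixture(toy, 5)
```

In this toy example, draws from the `paired` component have equal effects in the first two conditions and a zero effect in the third, up to floating-point error.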

## Using weights learned from simulated data

See [this notebook](https://gaow.github.io/mvarbvs/analysis/20200502_Prepare_ED_prior.html) for the workflow.

In [17]:
dat = readRDS('../data/prior_simulation.rds')

In [18]:
names(dat$gtex_mixture)

Load the ED mixtures:

In [30]:
gtex_ed_mixture = readRDS('~/tmp/07-May-2020/gtex_mixture_identity.FLASH_PC3.ED.rds')

In [31]:
names(gtex_ed_mixture)

In [32]:
artificial_ed_mixture_50 = readRDS('~/tmp/07-May-2020/artificial_mixture_identity.FLASH_PC3.ED.rds')

In [33]:
names(artificial_ed_mixture_50)

Keep only the components whose weight exceeds the cutoff `0.00001`:

In [34]:
tol = 0.00001
artificial_ed_mixture_50$U = artificial_ed_mixture_50$U[which(artificial_ed_mixture_50$w > tol)]
artificial_ed_mixture_50$w = artificial_ed_mixture_50$w[which(artificial_ed_mixture_50$w > tol)]
gtex_ed_mixture$U = gtex_ed_mixture$U[which(gtex_ed_mixture$w > tol)]
gtex_ed_mixture$w = gtex_ed_mixture$w[which(gtex_ed_mixture$w > tol)]
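
The same filtering step can be wrapped in a small function so that `U` and `w` are always subset together. The sketch below is one way to do that. The name `filter_mixture` is hypothetical, and the `renormalize` option is an addition for illustration only: the cell above does not rescale the remaining weights.

```r
# Hypothetical helper: drop components with weight <= tol, keeping U and w aligned.
filter_mixture = function(mix, tol = 1e-5, renormalize = FALSE) {
  keep = which(mix$w > tol)
  mix$U = mix$U[keep]
  mix$w = mix$w[keep]
  # Optional step, not done in the cell above: rescale weights to sum to 1.
  if (renormalize) mix$w = mix$w / sum(mix$w)
  mix
}

# Toy mixture in which the third component has negligible weight.
toy = list(U = list(a = diag(2), b = matrix(1, 2, 2), c = diag(c(1, 0))),
           w = c(0.6, 0.4 - 1e-6, 1e-6))
filtered = filter_mixture(toy, renormalize = TRUE)
names(filtered$U)
sum(filtered$w)
```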

In [35]:
names(artificial_ed_mixture_50$U)

In [36]:
artificial_ed_mixture_50$w

In [37]:
names(gtex_ed_mixture$U)

In [38]:
gtex_ed_mixture$w

In [39]:
dat$artificial_mixture_50$ED = artificial_ed_mixture_50
dat$gtex_mixture$ED = gtex_ed_mixture

In [40]:
saveRDS(dat, '../data/prior_simulation.rds')