# Create prior mixture for simulation studies

This notebook contains scripts to create mixture priors for use in simulations with DSC.

## Artificial structure

Here I construct covariance patterns for 50 conditions, of three kinds:
1. canonical
2. paired sharing, e.g. a pair of tissues
3. block sharing, e.g. brain tissues

### With 50 conditions

Canonical patterns of sharing:

In [1]:
R=50
prior = mmbr:::create_cov_canonical(R)

Paired sharing:

In [2]:
paired = matrix(0,R,R)
paired[1:2,1:2] = 1
prior[['paired_1']] = paired

Block sharing:

In [3]:
block = matrix(0,R,R)
block[1:(R/2), 1:(R/2)] = 1
block[(R/2+1):R, (R/2+1):R] = 1
prior[['blocked_1']] = block
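
When writing block indices of this kind, note that in R the `:` operator binds tighter than `/`, so `1:R/2` parses as `(1:R)/2` rather than `1:(R/2)`. The sketch below uses a small illustrative `R = 6` (not the 50 used here) to show the difference. The fractional form still fills the intended block only because R truncates fractional subscripts and drops zero subscripts, so the parenthesised form is the safer way to write it.

```r
R = 6  # small illustrative size

# `:` has higher precedence than `/`
idx_frac = 1:R/2      # same as (1:R)/2 -> 0.5 1.0 1.5 2.0 2.5 3.0
idx_int  = 1:(R/2)    # 1 2 3

# Fractional subscripts are truncated toward zero and zeros are dropped,
# so both forms end up filling the same upper-left block.
block_frac = matrix(0, R, R)
block_frac[idx_frac, idx_frac] = 1
block_int = matrix(0, R, R)
block_int[idx_int, idx_int] = 1
same_block = identical(block_frac, block_int)
print(same_block)
```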

In [4]:
names(prior)

Now assign some weights:

1. singleton total 35%
    - singleton_1 has 10% (to be picked up later by ED methods)
    - singleton_2 to singleton_26 have 25% in total (1% each)
2. shared total 25%
3. paired 20% (hopefully picked up by ED methods)
4. blocked 20% (hopefully picked up by ED methods)

In [5]:
w = c(0.1, rep(0.01,25), rep(0, 24), rep(0.25/5,5), 0.2, 0.2)

In [6]:
sum(w)
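
As a cross-check on the percentages listed above, the weight vector can be named and summed by pattern type. This is a minimal sketch that assumes the components are ordered as 50 singletons, 5 shared matrices, then `paired_1` and `blocked_1`, which is what the weight vector implies. The `singleton_i` and `shared_j` labels are written out by hand here rather than taken from `names(prior)`.

```r
R = 50
w = c(0.1, rep(0.01, 25), rep(0, 24), rep(0.25/5, 5), 0.2, 0.2)

# Assumed component order: R singletons, 5 shared, paired, blocked
labels = c(paste0('singleton_', 1:R), paste0('shared_', 1:5),
           'paired_1', 'blocked_1')
stopifnot(length(w) == length(labels))
names(w) = labels

# Strip the numeric suffix to get the pattern type, then total by type
type = sub('_[0-9]+$', '', names(w))
totals = tapply(w, type, sum)
print(round(totals, 2))
```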

In [7]:
# Drop zero-weight components from both U and w so they stay aligned
keep = which(w>0)
prior = prior[keep]
w = w[keep]

In [8]:
names(prior)

In [9]:
artificial_mixture_50 = list(U=prior,w=w)
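
To make the meaning of the weights concrete: in a mixture of this form, an effect vector would typically be simulated by first picking a component with probability `w` and then drawing from a zero-mean multivariate normal with that component's covariance. The sketch below illustrates this with a hypothetical helper `sample_mixture` on a toy 4-condition mixture. It is not the code used in the DSC simulations, and the zero-mean normal form is an assumption about how the prior is consumed.

```r
# Hypothetical helper: draw n effect vectors from a mixture of
# zero-mean multivariate normals with covariances U and weights w.
sample_mixture = function(n, U, w) {
  stopifnot(length(U) == length(w))
  R = nrow(U[[1]])
  # Pick a component for each draw with probability proportional to w
  k = sample(length(U), n, replace = TRUE, prob = w)
  b = matrix(0, n, R)
  for (i in seq_len(n)) {
    # Eigen-decomposition copes with rank-deficient covariances
    e = eigen(U[[k[i]]], symmetric = TRUE)
    A = e$vectors %*% diag(sqrt(pmax(e$values, 0)), R)
    b[i, ] = as.vector(A %*% rnorm(R))
  }
  list(b = b, component = k)
}

# Toy mixture with R = 4: one singleton and one paired component
R = 4
singleton = matrix(0, R, R); singleton[1, 1] = 1
paired = matrix(0, R, R); paired[1:2, 1:2] = 1
set.seed(1)
draws = sample_mixture(200, list(singleton_1 = singleton, paired_1 = paired),
                       c(0.5, 0.5))
```

Draws from the paired component have equal effects in conditions 1 and 2 and are numerically zero elsewhere, while draws from the singleton component are non-zero only in condition 1.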

### With 6 conditions

In [10]:
R=6
prior = mmbr:::create_cov_canonical(R)
paired = matrix(0,R,R)
paired[1:2,1:2] = 1
prior[['paired_1']] = paired
block = matrix(0,R,R)
block[1:(R/2), 1:(R/2)] = 1
block[(R/2+1):R, (R/2+1):R] = 1
prior[['blocked_1']] = block

In [11]:
names(prior)

Now assign some weights:

1. singleton total 30%
    - singleton_1, singleton_3, singleton_5 each has 10%
2. shared total 30%
3. paired 20%
4. blocked 20%

In [12]:
w = c(0.1, 0, 0.1, 0, 0.1, 0, rep(0.3/5, 5), 0.2, 0.2)

In [13]:
sum(w)

In [14]:
# Drop zero-weight components from both U and w so they stay aligned
keep = which(w>0)
prior = prior[keep]
w = w[keep]
names(prior)
artificial_mixture_6 = list(U=prior,w=w)

## Mixture from GTEx

The mixture is obtained by running [this workflow](20200502_Prepare_ED_prior.html):

```
sos run ~/GIT/gtexresults/workflows/mashr_flashr_workflow.ipynb flash \
    --cwd /project2/compbio/GTEx_eQTL/mashr_flashr_workflow_output \
    --data /project2/compbio/GTEx_eQTL/mashr_flashr_workflow_output/FastQTLSumStats.mash.rds \
    --effect-model EE -c midway2.yml -q midway2
sos run analysis/20200502_Prepare_ED_prior.ipynb ed \
    --cwd /project2/compbio/GTEx_eQTL/mashr_flashr_workflow_output \
    --model FastQTLSumStats.mash -c midway2.yml -q midway2
```

In [15]:
prior = readRDS('../data/FastQTLSumStats.mash.FL_PC3.ED.rds')

In [16]:
names(prior$U)

In [17]:
length(prior$U)

Most weights are in `tFLASH` and `XX`:

In [18]:
names(prior$U)[prior$w>0.1]

But many other weights are also non-trivial:

In [19]:
names(prior$U)[prior$w>0.001]
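
One way to see both the dominant and the smaller components at a glance is to pair each weight with its component name and sort. The following is a minimal sketch on a made-up mixture (the names and weights are placeholders, not the GTEx estimates), assuming, as the cells above do, that `w` is in the same order as `U`.

```r
# Toy stand-in for `prior`; names and weights are illustrative only
toy = list(U = list(comp_a = diag(2), comp_b = diag(2), comp_c = diag(2)),
           w = c(0.05, 0.9, 0.05))

# Attach component names to the weights and sort from largest to smallest
w_named = setNames(toy$w, names(toy$U))
w_sorted = sort(w_named, decreasing = TRUE)
print(w_sorted)

# Cumulative share of the total weight covered by the top components
print(cumsum(w_sorted) / sum(w_sorted))
```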

In [20]:
tol = 1E-12
U = prior$U[which(prior$w>tol)]
w = prior$w[which(prior$w>tol)]
names(U)
gtex_mixture = list(U=U,w=w)

In [21]:
saveRDS(list(gtex_mixture=gtex_mixture,
             artificial_mixture_50=artificial_mixture_50,
             artificial_mixture_6=artificial_mixture_6),
        '../data/prior_simulation.rds')

## Using weights learned from simulated data

For the workflow, see [this notebook](https://gaow.github.io/mvarbvs/analysis/20200502_Prepare_ED_prior.html).

In [40]:
dat = readRDS('../data/prior_simulation.rds')

In [41]:
names(dat$gtex_mixture)

Load the ED mixtures:

In [42]:
gtex_ed_mixture = readRDS('~/tmp/07-May-2020/gtex_mixture_identity.FLASH_PC3.ED.rds')

In [43]:
names(gtex_ed_mixture)

In [44]:
artificial_ed_mixture_50 = readRDS('~/tmp/07-May-2020/artificial_mixture_identity.FLASH_PC3.ED.rds')

In [45]:
names(artificial_ed_mixture_50)

Filter the components by weight cutoff `1E-12`,

In [46]:
tol = 1E-12
artificial_ed_mixture_50$U = artificial_ed_mixture_50$U[which(artificial_ed_mixture_50$w > tol)]
artificial_ed_mixture_50$w = artificial_ed_mixture_50$w[which(artificial_ed_mixture_50$w > tol)]
gtex_ed_mixture$U = gtex_ed_mixture$U[which(gtex_ed_mixture$w > tol)]
gtex_ed_mixture$w = gtex_ed_mixture$w[which(gtex_ed_mixture$w > tol)]
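
The same two-step filter is applied to each mixture, so it could be wrapped in a small helper. `filter_mixture` below is a hypothetical function, not part of the original workflow; it computes the index once so that `U` and `w` cannot fall out of step.

```r
# Hypothetical helper: keep only components whose weight exceeds `tol`
filter_mixture = function(mixture, tol = 1E-12) {
  stopifnot(length(mixture$U) == length(mixture$w))
  keep = which(mixture$w > tol)
  mixture$U = mixture$U[keep]
  mixture$w = mixture$w[keep]
  mixture
}

# Toy example: the middle component has negligible weight and is dropped
toy = list(U = list(a = diag(2), b = diag(2), c = diag(2)),
           w = c(0.6, 1E-15, 0.4))
toy_filtered = filter_mixture(toy)
print(names(toy_filtered$U))
```

Like the cell above, this sketch does not renormalise the remaining weights.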

In [47]:
names(artificial_ed_mixture_50$U)

In [48]:
artificial_ed_mixture_50$w

In [49]:
names(gtex_ed_mixture$U)

In [50]:
gtex_ed_mixture$w

In [51]:
dat$artificial_mixture_50$ED = artificial_ed_mixture_50
dat$gtex_mixture$ED = gtex_ed_mixture

In [52]:
saveRDS(dat, '../data/prior_simulation.rds')