## Updated mvSuSiE benchmark

Over the past few months we have implemented several fixes with input from Yuxin, who ran mvSuSiE analyses in a GWAS context and helped iron out some corner cases. Progress in the `udr` package also gives us better estimates of the mixture prior. We now rerun all the previously developed benchmarks and look at the updated results.

## Benchmark execution

Run the following steps, in order, from the `dsc/mnm_prototype` directory:

```
sos run 20210224_MNM_Benchmark simulation_only
sos run 20210224_MNM_Benchmark extract_sumstats
sos run 20210224_MNM_Benchmark mixture_model
sos run 20210224_MNM_Benchmark mvSuSiE
sos run 20210224_MNM_Benchmark mthess
```
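
The steps above are listed in the order they should be run. As a minimal sketch, the Python helper below builds the same `sos run` commands and, by default, only prints them. The names `build_commands` and `run_all` are hypothetical and not part of the benchmark. Each step submits a batch job (see the `#SBATCH` headers in the workflow cells), so a `sos run` call may return before the job has finished. The sketch is therefore meant for previewing the commands or running them one at a time, not for unattended chaining.

```python
import subprocess

NOTEBOOK = "20210224_MNM_Benchmark"
# Step names exactly as listed above, in the order they should be run.
STEPS = ["simulation_only", "extract_sumstats", "mixture_model", "mvSuSiE", "mthess"]


def build_commands(notebook=NOTEBOOK, steps=STEPS):
    """Return one `sos run <notebook> <step>` argument list per step."""
    return [["sos", "run", notebook, step] for step in steps]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first step that exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

With `dry_run=True` nothing is submitted; the five command lines are only echoed.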


## Workflow implementations

In [None]:
[simulation_only]
script: interpreter= 'qsub', expand = True
#!/bin/bash

#SBATCH --time=36:00:00
#SBATCH --partition=broadwl
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=16000
#SBATCH --job-name={step_name}
#SBATCH --mail-type=BEGIN,END,FAIL

module load R
./gtex_qtl.dsc --host dsc_mnm.yml --target simulate_only --n_dataset 25000 -o mnm_sumstats -s existing -e ignore &> mnm_sumstats.log

In [None]:
#[extract_sumstats_1, mixture_model_1]
#download: dest_file = 'mixture_prior.ipynb'
#    https://raw.githubusercontent.com/cumc/bioworkflows/master/multivariate-fine-mapping/mixture_prior.ipynb

[extract_sumstats_2]
def get_cmd(m):
    return f'''
    cd {m} && ls *.rds | sed 's/\.rds//g' > analysis_units.txt && cd -
    sos run mixture_prior.ipynb extract_effects \
        --analysis-units {m}/analysis_units.txt \
        --datadir {m} --name {m:b} \
        -c ../../midway2.yml -q midway2 -s build &> extract_sumstats_{m:b}.log
    '''
cmds = [get_cmd(path(m)) for m in ["mnm_sumstats/artificial_mixture_identity", "mnm_sumstats/gtex_mixture_identity"]]
input: for_each = 'cmds'
script: interpreter= 'qsub', expand = True
#!/bin/bash
  
#SBATCH --time=36:00:00
#SBATCH --partition=broadwl
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=2000
#SBATCH --job-name={step_name}
#SBATCH --mail-type=BEGIN,END,FAIL

module load R
{_cmds}
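
Assuming that the SoS `:b` format specifier in `{m:b}` renders the base name of the path `m`, the sketch below reproduces in plain Python the `--name` value and log file that each of the two data directories maps to. The function `describe_unit` is hypothetical; it only previews names and does not run anything.

```python
from pathlib import Path

DATA_DIRS = [
    "mnm_sumstats/artificial_mixture_identity",
    "mnm_sumstats/gtex_mixture_identity",
]


def describe_unit(datadir):
    """Map a data directory to the names used in the extract_sumstats step."""
    # Assumption: SoS `{m:b}` is equivalent to the final path component.
    name = Path(datadir).name
    return {
        "datadir": datadir,
        "analysis_units": f"{datadir}/analysis_units.txt",
        "name": name,
        "log": f"extract_sumstats_{name}.log",
    }


for d in DATA_DIRS:
    print(describe_unit(d))
```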

In [None]:
[mixture_model_2]
def get_cmd(m):
    c1 = f'''
    sos run mixture_prior.ipynb ud --name {m} \
        -c ../../midway2.yml -q midway2 -s build &> ed_{m}.log
    '''
    c2 = f'''
    sos run mixture_prior.ipynb ud --ud-method teem --name {m} \
        -c ../../midway2.yml -q midway2 -s build &> teem_{m}.log
    '''
    c3 = f'''
    sos run mixture_prior.ipynb ed --name {m} \
        -c ../../midway2.yml -q midway2 -s build &> bovy_{m}.log
    '''
    return [c1,c2,c3]
cmds = sum([get_cmd(m) for m in ["artificial_mixture_identity", "gtex_mixture_identity"]], [])
input: for_each = 'cmds'
script: interpreter= 'qsub', expand = True
#!/bin/bash
  
#SBATCH --time=36:00:00
#SBATCH --partition=broadwl
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2000
#SBATCH --job-name={step_name}
#SBATCH --mail-type=BEGIN,END,FAIL

module load R
{_cmds}
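
In the cell above, `sum([...], [])` flattens the per-dataset lists, so `for_each = 'cmds'` iterates over six commands: three prior-fitting runs for each of the two datasets. The sketch below lists the (dataset, arguments, log file) combinations, with values copied from `c1`, `c2` and `c3`. The helpers `jobs_for` and `list_jobs` are hypothetical and only enumerate names.

```python
from itertools import chain

DATASETS = ["artificial_mixture_identity", "gtex_mixture_identity"]
# (arguments passed to mixture_prior.ipynb, log-file prefix), as in c1, c2, c3.
RUNS = [("ud", "ed"), ("ud --ud-method teem", "teem"), ("ed", "bovy")]


def jobs_for(dataset):
    """Return the three (dataset, arguments, log file) entries for one dataset."""
    return [(dataset, args, f"{prefix}_{dataset}.log") for args, prefix in RUNS]


def list_jobs():
    """Flatten the per-dataset lists, giving the same order as sum(lists, [])."""
    return list(chain.from_iterable(jobs_for(d) for d in DATASETS))


for job in list_jobs():
    print(job)
```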

In [None]:
[mvSuSiE]
script: interpreter= 'qsub', expand = True
#!/bin/bash
  
#SBATCH --time=36:00:00
#SBATCH --partition=broadwl
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=2000
#SBATCH --job-name={step_name}
#SBATCH --mail-type=BEGIN,END,FAIL

module load R
./fixed_mix.dsc --host dsc_mnm.yml -o mnm_20210228 -s existing -e ignore &> mnm_20210228.log

In [None]:
[mthess]
script: interpreter= 'qsub', expand = True
#!/bin/bash
  
#SBATCH --time=36:00:00
#SBATCH --partition=broadwl
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=2000
#SBATCH --job-name={step_name}
#SBATCH --mail-type=BEGIN,END,FAIL

module load R
./fixed_mix.dsc --host dsc_mnm.yml --target mthess -o mthess_20210228 -s existing -e ignore --n_dataset 200 &> mthess_20210228.log