# GTEx pipeline execution interface

## Preprocessing
See [this page](https://gaow.github.io/mvarbvs/doc/writeup/GTEx7_Analysis_Plan.html#Preprocessing) for details. Imputation was done with the [Michigan Imputation Server](https://imputationserver.sph.umich.edu) because it uses the Haplotype Reference Consortium reference panel (32,914 samples), which is not publicly available otherwise. The genotype input to the steps below is therefore the genotype data after imputation. Here is how to [prepare data](https://imputationserver.sph.umich.edu/start.html#!pages/help) for this service.

In [2]:
%sossave prep.sos -f -x
#!/usr/bin/env sos-runner
#fileformat=SOS1.0

# Usage:
# ./prep.sos download
# ./prep.sos 

%include ResourceManagement as RM
%include Misc as MC
%include DataWrestling as DW

[global]

#
# Auxiliary steps
#

[download]
# Resource preparation
sos_run('RM.plink', workdir = CONFIG['wd'])
sos_run('RM.minimac3', workdir = CONFIG['wd'])
sos_run('RM.vcftools', workdir = CONFIG['wd'])

#
# Workhorse
#

[data_summary]
input: CONFIG['genotype']
sos_run("MC.genotype_stats", workdir = CONFIG['wd'])

[genotype_preprocessing]
input: CONFIG['genotype']
sos_run("DW.vcf_by_chrom", workdir = CONFIG['wd'])

[rna_preprocessing]
input: CONFIG['rna_rpkm'], CONFIG['rna_cnts'], CONFIG['genotype'], CONFIG['sample_attr']
sos_run("MC.rnaseq:1", workdir = CONFIG['wd'])

Workflow saved to prep.sos
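
The workflow above reads five entries from `CONFIG`: `wd`, `genotype`, `rna_rpkm`, `rna_cnts` and `sample_attr`. As a minimal sketch, assuming the configuration has already been loaded into a plain Python dictionary, a check like the following could catch a missing entry before a step is launched. The helper name `missing_config_keys` and the example paths are hypothetical and not part of the pipeline.

```python
# Keys that prep.sos looks up in CONFIG (taken from the script above).
REQUIRED_KEYS = ("wd", "genotype", "rna_rpkm", "rna_cnts", "sample_attr")


def missing_config_keys(config):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Hypothetical example values; real paths come from the configuration file.
example_config = {
    "wd": "/path/to/workdir",
    "genotype": "/path/to/genotype.vcf.gz",
    "rna_rpkm": "/path/to/rna_rpkm.gct.gz",
}

# Lists the entries that still need to be filled in.
print(missing_config_keys(example_config))
```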


### Prepare computational resource

In [None]:
!./prep.sos download -c conf/20170507.conf -b ~/Documents/GTEx/bin -J 8 -j 1

### Data summary

In [None]:
!./prep.sos data_summary -c conf/20170507.conf -b ~/Documents/GTEx/bin -J 8 -j 1

### Genotype QC / imputation

In [None]:
!./prep.sos genotype_preprocessing -c conf/20170507.conf -b ~/Documents/GTEx/bin -J 8 -j 1

[Here is the configuration](https://gaow.github.io/mvarbvs/img/UMichImputation.png) of the imputation job on the UMich server.
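
The `genotype_preprocessing` step delegates to `DW.vcf_by_chrom`, whose implementation is not shown in this notebook. As an illustration only, and assuming that it splits a VCF into one file per chromosome (the layout the imputation server's help page asks for on upload), the idea can be sketched in plain Python. The function name `split_vcf_lines_by_chrom` and the toy records are hypothetical; a real run would normally use a dedicated tool rather than this code.

```python
def split_vcf_lines_by_chrom(lines):
    """Group VCF records by chromosome, repeating the header in each group."""
    header = []
    records = {}  # chromosome -> list of record lines, in first-seen order
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Meta-information and column header lines go into every output.
            header.append(line)
            continue
        chrom = line.split("\t", 1)[0]
        records.setdefault(chrom, []).append(line)
    return {chrom: header + recs for chrom, recs in records.items()}


# Tiny made-up example, not GTEx data.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t.\tPASS\t.",
    "2\t200\trs2\tC\tT\t.\tPASS\t.",
    "1\t150\trs3\tG\tA\t.\tPASS\t.",
]

by_chrom = split_vcf_lines_by_chrom(toy_vcf)
for chrom, chunk in by_chrom.items():
    # Subtract the two header lines to count records only.
    print(chrom, len(chunk) - 2, "record(s)")
```

The sketch keeps records in their input order within each chromosome; it does not sort by position or compress the output.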

### RNA-seq preprocessing

In [None]:
!./prep.sos rna_preprocessing -c conf/20170507.conf -b ~/Documents/GTEx/bin -J 8 -j 1
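
The four invocations above share the same options and differ only in the workflow name. Below is a minimal sketch of generating them programmatically, assuming the same configuration file and binary directory apply to every step. The helper `prep_command` is hypothetical; the option values are copied from the cells above.

```python
# Workflow names defined in prep.sos, in the order they are run above.
STEPS = ["download", "data_summary", "genotype_preprocessing", "rna_preprocessing"]


def prep_command(step, conf="conf/20170507.conf", bindir="~/Documents/GTEx/bin"):
    """Build the argument list for one ./prep.sos invocation."""
    if step not in STEPS:
        raise ValueError("unknown workflow: %s" % step)
    return ["./prep.sos", step, "-c", conf, "-b", bindir, "-J", "8", "-j", "1"]


for step in STEPS:
    # Print the command line as it would be typed in a shell.
    print(" ".join(prep_command(step)))
```

If the list is passed to `subprocess.run` instead of being printed, no shell is involved, so `~` would need to be expanded first, for example with `os.path.expanduser`.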