# *pygenesig*: a package for generating, validating, and manipulating signatures
http://www.gregor-sturm.de/pygenesig/

![flow](img/information_flow.svg)

## Signature Generators
`SignatureGenerator` is an abstract class that can be extended to support different signature generation methods, for example:
* Gini
* DESeq2
* Rank
* ...
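
The extension pattern can be sketched as follows. This is a minimal illustration only: the names `ExampleSignatureGenerator`, `TopMeanGenerator`, and `make_signatures` are hypothetical and are not pygenesig's actual interface, and the "highest mean expression" rule is a toy stand-in for methods such as Gini or Rank.

```python
from abc import ABC, abstractmethod
from statistics import mean


class ExampleSignatureGenerator(ABC):
    """Hypothetical base class (not pygenesig's real API)."""

    def __init__(self, expr, target):
        # expr: list of rows, one row per gene, one value per sample
        # target: one tissue label per sample (same order as the columns)
        self.expr = expr
        self.target = target

    @abstractmethod
    def make_signatures(self):
        """Return a dict mapping each label to a set of gene (row) indices."""


class TopMeanGenerator(ExampleSignatureGenerator):
    """Toy method: pick the genes with the highest mean expression per label."""

    def __init__(self, expr, target, n_genes=2):
        super().__init__(expr, target)
        self.n_genes = n_genes

    def make_signatures(self):
        signatures = {}
        for label in sorted(set(self.target)):
            # columns (samples) that belong to this label
            cols = [i for i, t in enumerate(self.target) if t == label]
            mean_expr = [mean(row[i] for i in cols) for row in self.expr]
            # rank gene indices by decreasing mean expression
            ranked = sorted(range(len(mean_expr)),
                            key=lambda g: mean_expr[g], reverse=True)
            signatures[label] = set(ranked[: self.n_genes])
        return signatures


# Made-up toy data: 5 genes x 4 samples
expr = [
    [9.0, 8.0, 1.0, 1.0],
    [7.0, 9.0, 0.0, 2.0],
    [1.0, 0.0, 8.0, 9.0],
    [0.0, 1.0, 9.0, 7.0],
    [2.0, 2.0, 2.0, 2.0],
]
target = ["liver", "liver", "brain", "brain"]
signatures = TopMeanGenerator(expr, target, n_genes=2).make_signatures()
print(signatures)
```

A new method would only need to subclass the base class and implement the one abstract method; the rest of a pipeline can then treat all generators alike.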

## Signature Testers
An implementation based on BioQC is available.

## pygenesig for manipulating signatures
Load GMT files and compute the Jaccard index between their signatures.

In [None]:
from pygenesig.file_formats import *
from pygenesig.tools import jaccard_mat
import seaborn as sns
from pylab import * 
%matplotlib inline

In [None]:
gtex_v3 = load_gmt("data/gtex_v3.gmt")
gtex_v6 = load_gmt("data/gtex_v6.gmt")
jm = jaccard_mat(gtex_v3, gtex_v6, as_matrix=True)

In [None]:
fig, ax = subplots(figsize=(15, 10))
sns.heatmap(jm, ax=ax, linewidths=.2)
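
For reference, the Jaccard index of two gene sets is the size of their intersection divided by the size of their union. The sketch below computes a pairwise matrix in plain Python; `jaccard_index` and `jaccard_matrix` are hypothetical helper names, not the pygenesig functions used above, and the gene lists are made-up toy data.

```python
def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of gene identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def jaccard_matrix(sigs_a, sigs_b):
    """Rows follow the order of sigs_a, columns the order of sigs_b."""
    return [[jaccard_index(genes_a, genes_b) for genes_b in sigs_b.values()]
            for genes_a in sigs_a.values()]


# Toy signatures standing in for two versions of the same signature set
sigs_old = {"liver": ["ALB", "APOA1", "TF"], "brain": ["GFAP", "SNAP25"]}
sigs_new = {"liver": ["ALB", "APOA1", "CYP3A4"], "brain": ["GFAP", "SNAP25"]}

matrix = jaccard_matrix(sigs_old, sigs_new)
print(matrix)
```

A strong diagonal in such a matrix indicates that signatures with the same name share most of their genes across the two versions.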

# Signature Validation Pipeline

## Input data

In [None]:
exprs = read_expr('/pstore/home/sturmg/projects/gtex-signatures/data_processed/v3/exprs_processed.npy')
exprs

In [None]:
!head /pstore/home/sturmg/projects/gtex-signatures/data_processed/v3/target.csv

In [None]:
!head /pstore/home/sturmg/projects/gtex-signatures/data_processed/v3/rosetta_processed.csv

## Configuration file
**`config.py`**

In [None]:
!cat data/gtexv3_config.py

In [None]:
!validation crossvalidation data/gtexv3_config.py

## Inspect results

In [None]:
!ls results/gtexv3_config/

http://rkalbhpc014:8895/2017-03-30_signature_validation/results/gtexv3_config/

# Applying Signatures to GEO

* GMT file, ExpressionSets -> BioQC -> Scores for each signature

## What data? 
A subset of 81k samples from 2900 studies in GEO (human, rat, and mouse) that have:
* tissue and gene symbol annotated
* tissue in list of controlled vocabulary
* housekeeping genes expressed

Details at http://www.gregor-sturm.de/BioQC_GEO_analysis/sample-selection.html

## How to apply to a .gmt file?

In [None]:
!validation run_bioqc results/gtexv3_config/signatures.gmt

In [None]:
!validation merge_bioqc results/gtexv3_config/signatures_bioqc

In [None]:
!ls results/gtexv3_config/
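
As a reminder of what a `.gmt` file such as `signatures.gmt` contains: in the standard GMT layout, each line holds one gene set as tab-separated fields (name, description, then the genes). The sketch below reads and writes that layout; `parse_gmt` and `format_gmt` are hypothetical helpers for illustration, not pygenesig functions, and the example content is made up.

```python
def parse_gmt(text):
    """Parse GMT text into a dict: signature name -> list of genes."""
    signatures = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        # fields[0] = name, fields[1] = description, fields[2:] = genes
        signatures[fields[0]] = [gene for gene in fields[2:] if gene]
    return signatures


def format_gmt(signatures, description="na"):
    """Serialize a dict of signatures back to GMT text."""
    lines = ["\t".join([name, description, *genes])
             for name, genes in signatures.items()]
    return "\n".join(lines) + "\n"


# Made-up example content
gmt_text = "liver\tna\tALB\tAPOA1\nbrain\tna\tGFAP\tSNAP25\tSYP\n"
parsed = parse_gmt(gmt_text)
print(parsed)
```

Because the format is line-oriented, a quick `head` on the file is usually enough to check that signature names and gene identifiers look as expected before running BioQC.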

## Exploring BioQC results in Shiny App
Copy the result file to the Shiny data directory:

In [None]:
!cp results/gtexv3_config/signatures_bioqc.uniq.tsv \
/pstore/data/bioinfo/users/sturmg/gmt_geo_query/bioqc_res/gtexv3_presentation.uniq.tsv

### Server at: 
http://shiny-server.marathon.bahpc.roche.com:3015/users/sturmg/gmtquery/