# Inferring gene regulatory networks and transcription factor activities with GIRAFFE

The goal of GIRAFFE is to jointly estimate a gene regulatory network and transcription factor (TF)
activities using a biologically informed matrix factorization.

![GIRAFFE overview](./giraffe_overview.png)
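For intuition about the factorization, note that the shapes used in the examples below are consistent with expression ≈ R · TFA, where R is (G, TF) and TFA is (TF, n). The sketch below illustrates only that idea: it holds R fixed to a binary prior and recovers TFA by ordinary least squares. This is an assumption-laden toy, not GIRAFFE's estimator (which also learns R and uses the PPI network), and `tfa_least_squares` is a hypothetical helper.

```python
import numpy as np

def tfa_least_squares(expression, R):
    """Hypothetical helper: estimate TF activities with the regulatory matrix R held fixed.

    Solves min_TFA ||expression - R @ TFA||_F^2 by ordinary least squares.
    expression: (G, n) genes x samples; R: (G, TF) genes x TFs.
    Returns TFA of shape (TF, n).
    """
    tfa, *_ = np.linalg.lstsq(R, expression, rcond=None)
    return tfa

rng = np.random.default_rng(0)
G, TF, n = 100, 20, 10
R = rng.integers(0, 2, size=(G, TF)).astype(float)  # stand-in binary prior
true_tfa = rng.normal(size=(TF, n))
expression = R @ true_tfa  # noiseless toy data that follows the factorization exactly

tfa = tfa_least_squares(expression, R)
print(tfa.shape)  # (20, 10)
```

Because the toy data are noiseless and R has full column rank, the least-squares estimate matches `true_tfa` up to floating-point error; with real data the fit is only approximate.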


## Quick Start

This is a simple introductory example of GIRAFFE on a small, artificially generated dataset. GIRAFFE requires three inputs:

1. a normalized expression matrix, with rows as genes and columns as samples;
2. a prior network, with rows as genes and columns as transcription factors (TFs); we often use a binary, motif-based prior network, but alternatives are possible;
3. a symmetric protein-protein interaction (PPI) network, with rows and columns as TFs.
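Before running GIRAFFE it can help to confirm that the three inputs follow these conventions. The sketch below checks only shapes and PPI symmetry (not normalization); `check_giraffe_inputs` is a hypothetical helper, not part of the `giraffe` package.

```python
import numpy as np

def check_giraffe_inputs(expression, prior, ppi):
    """Hypothetical helper: check the input conventions listed above.

    expression: (G, n) genes x samples
    prior:      (G, TF) genes x TFs
    ppi:        (TF, TF) symmetric TF x TF
    Returns a list of problems; an empty list means the shapes are consistent.
    """
    expression, prior, ppi = (np.asarray(a) for a in (expression, prior, ppi))
    if expression.ndim != 2 or prior.ndim != 2 or ppi.ndim != 2:
        return ["all three inputs must be 2-D matrices"]
    problems = []
    G, n = expression.shape
    TF = prior.shape[1]
    if prior.shape[0] != G:
        problems.append(f"prior has {prior.shape[0]} rows but expression has {G} genes")
    if ppi.shape != (TF, TF):
        problems.append(f"ppi has shape {ppi.shape}, expected ({TF}, {TF})")
    elif not np.allclose(ppi, ppi.T):
        problems.append("ppi is not symmetric")
    return problems

rng = np.random.default_rng(0)
expression = rng.random((100, 10))
prior = rng.integers(0, 2, size=(100, 20))
ppi = np.eye(20)
print(check_giraffe_inputs(expression, prior, ppi))  # []
print(check_giraffe_inputs(expression, prior, np.triu(np.ones((20, 20)))))  # ['ppi is not symmetric']
```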

```python
import giraffe
import numpy as np

# Generate toy data
G = 100  # Genes
TF = 20  # Transcription factors (proteins)
n = 10  # Samples (e.g. individuals)

# expression of size (G, n); prior of size (G, TF); PPI of size (TF, TF)
expression = np.random.random((G, n))
prior = np.random.randint(0, 2, size=(G, TF))
ppi = np.random.randint(0, 2, size=(TF, TF))
ppi = ppi ^ ppi.T  # XOR with the transpose makes the PPI matrix symmetric
np.fill_diagonal(ppi, 1)  # set self-interactions to 1

# Run GIRAFFE
giraffe_model = giraffe.Giraffe(expression, prior, ppi)

R_hat = giraffe_model.get_regulation()  # Size (G, TF)
TFA_hat = giraffe_model.get_tfa()  # Size (TF, n)
```
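A common next step is to list the strongest gene-TF edges in the inferred network. The sketch below assumes the regulation matrix is a NumPy-compatible (G, TF) array, as the shape comments above suggest; `top_edges` is a hypothetical helper, and a small hand-made matrix stands in for `R_hat` so the example runs without `giraffe`.

```python
import numpy as np

def top_edges(R, k=5):
    """Hypothetical helper: return the k largest-magnitude (gene, TF, weight) entries of R.

    R is a (G, TF) regulation matrix, e.g. the output of get_regulation().
    """
    R = np.asarray(R)
    flat = np.argsort(np.abs(R), axis=None)[::-1][:k]  # flat indices of the k largest |R_ij|
    genes, tfs = np.unravel_index(flat, R.shape)       # convert back to (row, column) pairs
    return [(int(g), int(t), float(R[g, t])) for g, t in zip(genes, tfs)]

# Stand-in for R_hat: a (4 genes x 3 TFs) matrix with two nonzero edges
R_demo = np.zeros((4, 3))
R_demo[2, 1] = -0.9
R_demo[0, 2] = 0.5
print(top_edges(R_demo, k=2))  # [(2, 1, -0.9), (0, 2, 0.5)]
```

Ranking by absolute value keeps both activating and repressing edges; rank by the signed value instead if only one direction matters.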

## Adjusting for variables of interest

Optionally, GIRAFFE can be tailored to a downstream application by adjusting for covariates,
such as potential confounders, passed through the `adjusting` argument.

```python
import giraffe
import numpy as np

# Generate toy data
G = 100  # Genes
TF = 20  # Transcription factors (proteins)
n = 10  # Samples (e.g. individuals)

# expression of size (G, n); prior of size (G, TF); PPI of size (TF, TF)
expression = np.random.random((G, n))
prior = np.random.randint(0, 2, size=(G, TF))
ppi = np.random.randint(0, 2, size=(TF, TF))
ppi = ppi ^ ppi.T  # XOR with the transpose makes the PPI matrix symmetric
np.fill_diagonal(ppi, 1)  # set self-interactions to 1

# Variable we want to adjust for (one value per sample)
possible_confounder = np.random.random(n)

# Run GIRAFFE
giraffe_model = giraffe.Giraffe(expression, prior, ppi, adjusting=possible_confounder)

R_hat = giraffe_model.get_regulation()  # Size (G, TF)
TFA_hat = giraffe_model.get_tfa()  # Size (TF, n)
```
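For comparison with the built-in `adjusting` option, a common generic alternative is to regress a covariate out of the expression data yourself, gene by gene, with ordinary least squares. The sketch below shows that technique only; it is not a description of how GIRAFFE handles `adjusting` internally, and `regress_out` is a hypothetical helper.

```python
import numpy as np

def regress_out(expression, covariate):
    """Hypothetical helper: remove a linear covariate effect from every gene.

    expression: (G, n) genes x samples; covariate: (n,) one value per sample.
    Fits expression[g] ~ intercept + slope * covariate for each gene g and
    returns the residuals, shape (G, n).
    """
    expression = np.asarray(expression, dtype=float)
    covariate = np.asarray(covariate, dtype=float)
    design = np.column_stack([np.ones_like(covariate), covariate])  # (n, 2)
    coef, *_ = np.linalg.lstsq(design, expression.T, rcond=None)    # (2, G)
    return expression - (design @ coef).T

rng = np.random.default_rng(0)
covariate = rng.random(10)  # e.g. a possible confounder
expression = rng.random((5, 10)) + np.outer(rng.random(5), covariate)
residuals = regress_out(expression, covariate)
```

By construction the residuals of each gene sum to zero and are orthogonal to the covariate, so no linear trend in the covariate remains.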

## Promoting sparsity in the regulatory matrix

A second way to customize GIRAFFE is to enforce sparsity in the inferred regulatory network via ℓ1 regularization, controlled by the `regularization` argument. Stability selection <sup>1</sup> is one way to tune the strength of the ℓ1 penalty.
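To see why an ℓ1 penalty yields a sparse network, recall that its proximal operator is soft-thresholding, which sets small coefficients exactly to zero. The sketch below solves a generic lasso problem, 0.5 · ||A w - b||² + λ · ||w||₁, with proximal gradient descent (ISTA). It only illustrates the effect of the penalty; it is not GIRAFFE's optimizer, and `lasso_ista` is a hypothetical name.

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of lam * ||x||_1: shrink toward zero, clip small values to exactly 0
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def lasso_ista(A, b, lam, n_iter=500):
    """Minimize 0.5 * ||A w - b||^2 + lam * ||w||_1 by proximal gradient descent (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the smooth part's gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)            # gradient of the least-squares term
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]  # only the first two features matter
b = A @ w_true
w = lasso_ista(A, b, lam=1.0)
print(np.round(w, 2))
```

The irrelevant coefficients come out exactly zero rather than merely small, which is what makes the resulting network sparse; larger λ zeroes out more entries.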

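```python
import giraffe
import numpy as np
from grn_stability_selection import StabilitySelection

# Reuses expression, prior and ppi from the previous example.

def giraffe_wrapper(expression, prior, ppi, regularization=0):
    # Fit GIRAFFE with a given l1-penalty strength and return the regulatory matrix
    return giraffe.Giraffe(expression, prior, ppi, regularization=regularization).get_regulation()

# Run stability selection over a grid of 10 penalty strengths
stability = StabilitySelection(giraffe_wrapper, lambdas=np.linspace(1e-6, 1e-4, 10), K=100, v=30)
structure = stability(expression=expression, prior=prior, ppi=ppi)

# Fit GIRAFFE on all data and keep only the stably selected edges
R_giraffe = giraffe.Giraffe(expression, prior, ppi).get_regulation()
R_giraffe = np.where(structure, R_giraffe, 0)
```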
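The `grn_stability_selection` module is not shown here. As a hedged illustration of the procedure from reference [1], the sketch below implements generic stability selection: fit a sparse estimator on many random halves of the samples, record how often each entry is nonzero at each λ, and keep entries whose maximum selection frequency over λ reaches a threshold. The names `stability_selection`, `n_subsamples`, `threshold` and the toy estimator `mean_threshold` are hypothetical and need not match the arguments of `StabilitySelection`.

```python
import numpy as np

def stability_selection(estimator, data, lambdas, n_subsamples=100, threshold=0.6, seed=0):
    """Generic stability selection (hypothetical helper, not grn_stability_selection).

    estimator(data_subset, lam) returns an array whose nonzero entries are "selected".
    data has samples in its columns, shape (features, n).
    Returns a boolean mask of stably selected entries.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    half = n // 2
    max_freq = None
    for lam in lambdas:
        counts = 0
        for _ in range(n_subsamples):
            idx = rng.choice(n, size=half, replace=False)  # random half of the samples
            counts = counts + (estimator(data[:, idx], lam) != 0)
        freq = counts / n_subsamples  # selection frequency at this lambda
        max_freq = freq if max_freq is None else np.maximum(max_freq, freq)
    return max_freq >= threshold

def mean_threshold(X, lam):
    # Toy sparse estimator: soft-threshold each gene's mean expression
    m = X.mean(axis=1)
    return np.sign(m) * np.maximum(np.abs(m) - lam, 0.0)

rng = np.random.default_rng(1)
data = rng.normal(scale=0.1, size=(20, 20))
data[:5] += 5.0  # the first 5 "genes" carry a strong signal
mask = stability_selection(mean_threshold, data, lambdas=[1.0, 2.0])
print(mask[:5].all(), mask[5:].any())  # True False
```

Taking the maximum frequency over the λ grid follows [1], whose error bounds assume a threshold between 0.5 and 1; the result is a boolean mask that can be applied with `np.where`, as in the example above.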

## References

[1] Meinshausen, Nicolai, and Peter Bühlmann. "Stability selection." Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72.4 (2010): 417-473.