# TIGER: Transcriptional Inference using Gene Expression and Regulatory data
Chen Chen<sup>1</sup>

<sup>1</sup> Department of Epidemiology and Biostatistics, University of Arizona Mel and Enid Zuckerman College of Public Health, Tucson, AZ, USA

# Introduction
The goal of TIGER<sup>1</sup> is to estimate gene regulatory networks and transcription factor 
activities using Bayesian matrix factorization.
![](TIGER.png)
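
To make the matrix factorization idea concrete, here is a minimal toy sketch in base R. It assumes the usual notation in which an expression matrix is approximated by the product of a gene-by-TF network matrix and a TF-by-sample activity matrix. The names `W`, `Z`, and `X` and all dimensions are illustrative assumptions; this is not TIGER's internal model or code.

```R
# Toy illustration (assumed notation): X is approximately W %*% Z plus noise
set.seed(1)
n_genes <- 6; n_tfs <- 2; n_samples <- 4

# W: signed regulatory weights (genes x TFs); a zero means "no edge"
W <- matrix(c(1.0, 0.8, -0.5, 0,    0,   0,
              0,   0,    0,   0.9, -1.2, 0.7),
            nrow = n_genes, ncol = n_tfs,
            dimnames = list(paste0("gene", 1:n_genes), paste0("TF", 1:n_tfs)))

# Z: transcription factor activities (TFs x samples)
Z <- matrix(rnorm(n_tfs * n_samples), nrow = n_tfs,
            dimnames = list(colnames(W), paste0("sample", 1:n_samples)))

# Simulated expression: the product plus a small amount of measurement noise
X <- W %*% Z + matrix(rnorm(n_genes * n_samples, sd = 0.1), nrow = n_genes)

dim(X)  # genes x samples
```

In this picture, inference runs in the opposite direction: given `X` and a prior guess at which entries of `W` are non-zero, the aim is to recover both `W` and `Z`.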

## Installation

TIGER relies on [cmdstanr](https://mc-stan.org/cmdstanr/) for Bayesian inference. 
You can install the latest beta release of the cmdstanr R package with:


``` R
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
```

Then, you can use cmdstanr to install [CmdStan](https://mc-stan.org/users/interfaces/cmdstan.html), the shell interface to [Stan](https://mc-stan.org/), with:


```R
# Namespace-qualified so the call works without first running library(cmdstanr)
cmdstanr::install_cmdstan()
```

These two steps are usually enough if your C++ toolchain is set up properly. On Windows, for example, the RTools 4.0 toolchain provides a g++ 8 compiler and mingw32-make. If you run into installation problems, see the cmdstanr [installation guide](https://mc-stan.org/cmdstanr/articles/cmdstanr.html) for more information.
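
If you are unsure whether the toolchain and CmdStan are in place, a quick check along the following lines may help. This is a sketch that assumes a reasonably recent cmdstanr; `toolchain_ok` and `cmdstan_ok` are just local variable names chosen for this example.

```R
# Defaults, so the variables exist even when cmdstanr is not installed
toolchain_ok <- FALSE
cmdstan_ok <- FALSE

if (requireNamespace("cmdstanr", quietly = TRUE)) {
  # check_cmdstan_toolchain() raises an error if the C++ toolchain is unusable
  toolchain_ok <- tryCatch({
    cmdstanr::check_cmdstan_toolchain()
    TRUE
  }, error = function(e) { message(conditionMessage(e)); FALSE })

  # cmdstan_version() raises an error if no CmdStan installation is found
  cmdstan_ok <- tryCatch({
    message("CmdStan version: ", cmdstanr::cmdstan_version())
    TRUE
  }, error = function(e) { message(conditionMessage(e)); FALSE })
} else {
  message("cmdstanr is not installed; run the install.packages() step first")
}
```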

After CmdStan is correctly installed, you can install the development version of TIGER with:


``` R
devtools::install_github("cchen22/TIGER")
```

We also need to install the DoRothEA R package from [Bioconductor](https://bioconductor.org/packages/release/data/experiment/html/dorothea.html):

```R
BiocManager::install("dorothea")
```

# Quick start

This is a simple example of TIGER on a small dataset. TIGER requires two inputs:
1. a normalized expression matrix with genes as rows and samples as columns;
2. a prior network with TFs as rows and genes as columns. The network is signed and binarized (entries are -1, 0, or 1).
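
Before running TIGER on your own data, it can help to check that the two inputs have the layout described above. The sketch below uses only base R and toy matrices. `check_inputs()` is a hypothetical helper written for this example and is not part of the TIGER package.

```R
# Hypothetical helper (not part of TIGER): basic sanity checks on the two inputs
check_inputs <- function(expr, prior) {
  # Both inputs need gene names: expression rows and prior columns
  stopifnot(!is.null(rownames(expr)), !is.null(colnames(prior)))
  # Prior entries must be -1, 0, or 1
  stopifnot(all(as.matrix(prior) %in% c(-1, 0, 1)))
  # Genes present in both the expression rows and the prior columns
  shared <- intersect(rownames(expr), colnames(prior))
  if (length(shared) == 0) stop("no genes in common; check gene identifiers")
  shared
}

# Toy expression matrix: 4 genes x 3 samples
expr_toy <- matrix(rnorm(12), nrow = 4,
                   dimnames = list(c("G1", "G2", "G3", "G4"), c("S1", "S2", "S3")))

# Toy prior: 2 TFs x 4 genes, signed and binarized
prior_toy <- matrix(c(1, 0, -1, 0,
                      0, 1,  0, 0), nrow = 2, byrow = TRUE,
                    dimnames = list(c("TF1", "TF2"), c("G1", "G2", "G3", "G5")))

shared_genes <- check_inputs(expr_toy, prior_toy)
shared_genes
```

Only genes that appear in both inputs can link a TF to measured expression, so a very small overlap usually points to mismatched gene identifiers.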


First, we load the library:

In [None]:
library(TIGER)

Then, we load the data:

In [None]:
expr = TIGER::expr
prior = TIGER::prior

We can run TIGER with default parameters as follows:

In [None]:
ss = TIGER(expr,prior)

We can print the transcription factor activity (TFA) scores for the first three samples as follows:

In [None]:
tgres = ss$Z
tgres[,1:3]
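
As a small follow-up, the sketch below shows one way to summarize such a score matrix by naming the most active TF in each sample. It uses a toy matrix, `Z_toy`, and assumes TFs in rows and samples in columns, consistent with the column indexing above. The values are made up for illustration.

```R
# Toy stand-in for a TFA matrix (assumed layout: TFs in rows, samples in columns)
Z_toy <- matrix(c( 2.1, -0.3,  0.5,
                  -1.4,  0.9,  0.2,
                   0.1,  1.7, -2.0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("TF1", "TF2", "TF3"), c("S1", "S2", "S3")))

# For each sample (column), name the TF with the largest absolute activity
top_tf <- apply(Z_toy, 2, function(z) names(z)[which.max(abs(z))])
top_tf
```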

# Working with the DoRothEA prior
TIGER provides some convenient functions to work with the DoRothEA<sup>2</sup> prior database. 

DoRothEA provides regulons for two species: human and mouse. For example, if we have a human cancer expression matrix and want to estimate the TFA in each cancer sample, we can use the following code to prepare the prior network. First, we load the DoRothEA pan-cancer database:

In [None]:
df = dorothea::dorothea_hs_pancancer

Then we convert it to the TIGER prior format (i.e., an adjacency matrix):

In [None]:
prior = el2adj(df[,-2])
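
To illustrate what an edge-list-to-adjacency conversion involves, here is a minimal base R sketch on toy data. `edge_list_to_adj()` is a hypothetical re-implementation written for this example and is not TIGER's `el2adj()`. It assumes a three-column edge list (TF, target gene, sign of regulation), which is how the DoRothEA table is expected to look once its second column is dropped.

```R
# Hypothetical re-implementation for illustration; not TIGER's el2adj()
edge_list_to_adj <- function(el) {
  tf <- as.character(el[[1]])
  gene <- as.character(el[[2]])
  # Start from an all-zero TF x gene matrix
  adj <- matrix(0, nrow = length(unique(tf)), ncol = length(unique(gene)),
                dimnames = list(unique(tf), unique(gene)))
  # Fill in edges by (row name, column name) pairs
  adj[cbind(tf, gene)] <- el[[3]]
  adj
}

# Toy edge list: TF, target gene, mode of regulation (1 = activation, -1 = repression)
el_toy <- data.frame(tf = c("TF1", "TF1", "TF2"),
                     target = c("G1", "G2", "G2"),
                     mor = c(1, -1, 1))

adj_toy <- edge_list_to_adj(el_toy)
adj_toy
```

The result has TFs as rows and genes as columns, matching the prior format described in the Quick start section.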

# References

1- Chen, Chen, and Megha Padi. "Joint inference of transcription factor activity and context-specific regulatory networks." bioRxiv (2022).

2- Garcia-Alonso, Luz, et al. "Benchmark and integration of resources for the estimation of human transcription factor activities." Genome Research 29.8 (2019): 1363-1375.