# Example Notebook: Fibroblast Gene Regulatory Network Inference

In [1]:
from src.GRNinference import inferGRN, crossvalidateGRN

Specify data and library arguments:
- `path_to_data`: path to the CSV file containing gene expression data (formatted as genes x samples)
- `lib_dir`: path to the directory containing one sub-directory per TF-target database (included in the repository as `data`)
- `lib_name`: string specifying the desired library to use for inference and refinement
  - Here, the [CHEA](https://pubmed.ncbi.nlm.nih.gov/20709693/) database of transcription factor targets is used
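As a quick sanity check on the expected layout, the sketch below parses a tiny in-memory CSV in the genes x samples orientation (one row per gene, one column per sample). The gene names, sample names, and values are placeholders. Treating the first column as the gene identifier is an assumption, not something `inferGRN` is confirmed to require.

```python
import csv
import io

# Toy stand-in for an expression file: rows are genes, columns are samples.
# All names and values are placeholders, not taken from GSE133529.
toy_csv = """gene,sample_1,sample_2,sample_3
GENE_A,1.0,2.0,3.0
GENE_B,0.5,0.0,4.5
"""


def read_genes_by_samples(handle):
    """Return (sample_names, {gene: [values]}) from a genes x samples CSV."""
    reader = csv.reader(handle)
    header = next(reader)
    samples = header[1:]  # assumption: first column holds gene identifiers
    expression = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], [float(v) for v in row[1:]]
        if len(values) != len(samples):
            raise ValueError(
                f"{gene}: expected {len(samples)} values, got {len(values)}"
            )
        expression[gene] = values
    return samples, expression


samples, expression = read_genes_by_samples(io.StringIO(toy_csv))
print(len(expression), "genes x", len(samples), "samples")
```

If the row and column counts come out swapped for a real file, the matrix is probably stored as samples x genes and would need transposing before inference.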

In [2]:
# Forward slashes work on Windows, macOS, and Linux alike
path_to_data = "data/expression/GSE133529_ProcessedDataFile.csv.gz"
lib_dir = "data/"
lib_name = "CHEA"

### Run a single network inference:  
Here we pass only the required arguments defined above and leave all other arguments at their defaults. During computation, a link to the Dask dashboard is printed to the cell output.

In [3]:
grn = inferGRN(path_to_data, lib_dir, lib_name)

http://127.0.0.1:8787/status
(211608, 3)
(997, 3)
(75, 4)


In [4]:
grn.head()

| Unnamed: 0 | index | TF | target | importance |
|---|---|---|---|---|
| 803 | 136 | AHR | SERPINE1 | 16.596548 |
| 550 | 91 | TEAD2 | CCND1 | 13.683494 |
| 529 | 112 | SRF | ETV4 | 13.114272 |
| 260 | 201 | CREB1 | BCL6 | 9.740623 |
| 322 | 9 | SMAD3 | ATF3 | 8.491685 |
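The result is an edge list, with one row per TF-target pair and an importance score. A minimal sketch of one way to work with such a list is shown below: it groups targets by TF and keeps only edges at or above a threshold. The helper `targets_by_tf` and the threshold of 10 are hypothetical and chosen only for illustration. The five edges are copied from the output above.

```python
from collections import defaultdict

# The five edges shown in the output above, as (TF, target, importance).
edges = [
    ("AHR", "SERPINE1", 16.596548),
    ("TEAD2", "CCND1", 13.683494),
    ("SRF", "ETV4", 13.114272),
    ("CREB1", "BCL6", 9.740623),
    ("SMAD3", "ATF3", 8.491685),
]


def targets_by_tf(edge_list, min_importance=0.0):
    """Map each TF to its targets, keeping edges at or above a threshold."""
    regulon = defaultdict(list)
    for tf, target, importance in edge_list:
        if importance >= min_importance:
            regulon[tf].append(target)
    return dict(regulon)


# The threshold of 10 is arbitrary and only for illustration.
strong = targets_by_tf(edges, min_importance=10.0)
print(strong)
```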


### Run k-fold cross-validation for the inference pipeline:
With the `crossvalidateGRN` function, we can specify the number of folds `k` and infer one network per training set. After inference and refinement, the networks are concatenated into a single pandas DataFrame, with the fold recorded in a separate column. We can also specify a save directory, in which case each network is written to a CSV file along with the samples used for each training/testing set.

In this case, we have 16 samples of bulk RNA-seq data, so we'll use 8-fold cross-validation: each fold holds out 2 samples for testing and trains on the remaining 14.
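The fold arithmetic can be sketched as follows. This is an illustration of a plain, unshuffled k-fold split over sample indices, not necessarily how `crossvalidateGRN` assigns samples to folds. The helper name `kfold_indices` is hypothetical.

```python
def kfold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k contiguous test folds."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread any remainder
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i not in test]
        folds.append((train, test))
        start += size
    return folds


# 16 samples and 8 folds, as in this notebook
folds = kfold_indices(n_samples=16, k=8)
for fold_id, (train, test) in enumerate(folds):
    print(fold_id, "train:", len(train), "test:", test)
```

With so few samples, each network is inferred from 14 of the 16 samples, so the folds overlap heavily in their training data.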

In [5]:
k = 8
path_to_save = "data/networks/"  # forward slashes work on all platforms

grn_all = crossvalidateGRN(path_to_data, lib_dir, lib_name, k, savedir=path_to_save)

http://127.0.0.1:8787/status
(203732, 3)
(961, 3)
(57, 4)
(205472, 3)
(1000, 3)
(80, 4)
(199778, 3)
(965, 3)
(87, 4)
(199457, 3)
(936, 3)
(45, 4)
(203652, 3)
(980, 3)
(91, 4)
(195636, 3)
(939, 3)
(61, 4)
(198941, 3)
(891, 3)
(62, 4)
(202262, 3)
(969, 3)
(76, 4)


In [6]:
grn_all.head()

| Unnamed: 0 | index | TF | target | importance |
|---|---|---|---|---|
| 813 | 100 | JUN | MMP14 | 29.487101 |
| 493 | 111 | SRF | ETV4 | 25.413927 |
| 814 | 8 | NFATC1 | MMP14 | 15.443401 |
| 815 | 126 | LEF1 | MMP14 | 13.595153 |
| 516 | 91 | TEAD2 | ETV4 | 11.059693 |
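One common use of cross-validated networks is to ask how consistently each edge is recovered across folds. The sketch below counts the number of folds in which each TF-target pair appears. It assumes each row carries a fold label, as described for `crossvalidateGRN`. The rows, the names, and the helper `edge_fold_counts` are all placeholders invented for illustration, not results from this notebook.

```python
from collections import Counter

# Placeholder rows in the shape (fold, TF, target); the names and fold
# assignments are invented for illustration, not results from this notebook.
rows = [
    (0, "TF_A", "GENE_1"),
    (0, "TF_B", "GENE_2"),
    (1, "TF_A", "GENE_1"),
    (1, "TF_C", "GENE_3"),
    (2, "TF_A", "GENE_1"),
    (2, "TF_B", "GENE_2"),
]


def edge_fold_counts(fold_rows):
    """Count the distinct folds in which each (TF, target) edge occurs."""
    # De-duplicate so an edge listed twice in one fold is counted once
    seen = {(fold, tf, target) for fold, tf, target in fold_rows}
    return Counter((tf, target) for _, tf, target in seen)


counts = edge_fold_counts(rows)
n_folds = len({fold for fold, _, _ in rows})
for edge, n in counts.most_common():
    print(edge, f"{n}/{n_folds} folds")
```

Edges that recur in most folds are less likely to depend on any particular pair of held-out samples.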
