# Example Notebook: Fibroblast Gene Regulatory Network Inference

In [None]:
from src.GRNinference import inferGRN, crossvalidateGRN

Specify the data and library arguments:
- `path_to_data`: path to a CSV file of gene expression data formatted as genes × samples (one row per gene, one column per sample); the example below uses a gzip-compressed `.csv.gz` file
- `lib_dir`: path to the directory containing one sub-directory per TF-target database (included in the repository as `data`)
- `lib_name`: name of the library to use for inference and refinement
  - Here, the [ChEA](https://pubmed.ncbi.nlm.nih.gov/20709693/) database of transcription factor targets is used
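
Before launching inference, the expected `path_to_data` layout can be checked with pandas. This is a minimal sketch that assumes the first column holds gene identifiers and every other column is a sample; `load_expression` is a hypothetical helper, not part of `src.GRNinference`, and the toy table is illustrative only.

```python
import io

import pandas as pd


def load_expression(path_or_buffer):
    """Load a genes x samples table: one row per gene, one column per sample.

    Hypothetical helper; assumes gene identifiers are in the first column.
    When given a file path, pandas infers gzip compression from a '.gz' extension.
    """
    expr = pd.read_csv(path_or_buffer, index_col=0)
    if expr.index.has_duplicates:
        raise ValueError("gene identifiers in the first column must be unique")
    return expr


# Toy 3-gene x 2-sample table illustrating the layout (not real data).
toy_csv = io.StringIO(
    "gene,sample_1,sample_2\n"
    "TP53,5.1,4.8\n"
    "MYC,2.0,2.3\n"
    "SOX2,0.0,0.1\n"
)
expr = load_expression(toy_csv)
print(expr.shape)  # (3, 2): 3 genes (rows) x 2 samples (columns)
```

With the real data, `load_expression(path_to_data)` should return one row per gene; if the shape looks transposed, the file is likely samples × genes.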

In [None]:
# Forward slashes are portable: Python accepts them on Windows, macOS, and Linux
path_to_data = "data/expression/GSE133529_ProcessedDataFile.csv.gz"
lib_dir = "data/"
lib_name = "CHEA"
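
It can help to confirm that the chosen library is present before starting a long run. The sketch below assumes each TF-target library sits in a sub-directory of `lib_dir` named exactly `lib_name` (for example `data/CHEA`); that layout and the helper `check_library` are assumptions, so adjust them to match the repository's actual `data` folder.

```python
import tempfile
from pathlib import Path


def check_library(lib_dir, lib_name):
    """Return the library sub-directory, raising if it is missing.

    Assumes (hypothetically) that libraries live at <lib_dir>/<lib_name>.
    """
    lib_path = Path(lib_dir) / lib_name
    if not lib_path.is_dir():
        available = sorted(p.name for p in Path(lib_dir).iterdir() if p.is_dir())
        raise FileNotFoundError(f"{lib_path} not found; available: {available}")
    return lib_path


# Demonstrate on a throwaway directory tree rather than the real data folder.
missing_raised = False
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "CHEA").mkdir()
    found = check_library(tmp, "CHEA")
    print(found.name)  # CHEA
    try:
        check_library(tmp, "MISSING")
    except FileNotFoundError as err:
        missing_raised = True
        print("error:", err)
```

In the notebook itself, the equivalent call would be `check_library(lib_dir, lib_name)`.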

### Run a single network inference:  
Here we pass the three required arguments defined above and leave all other arguments at their defaults. During computation, a link to the Dask dashboard is printed to the output, which can be used to monitor progress.

In [None]:
grn = inferGRN(path_to_data, lib_dir, lib_name)
grn
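
The structure of `grn` is defined by `inferGRN`. If it is an edge list stored in a pandas DataFrame, the strongest predicted targets for each TF can be pulled out as sketched below. The column names `TF`, `target`, and `importance` are assumptions following a common edge-list convention and may need renaming to match the actual output; `top_targets` is a hypothetical helper.

```python
import pandas as pd


def top_targets(edges, n=3, tf_col="TF", target_col="target", score_col="importance"):
    """Return the n highest-scoring targets for each TF, strongest edges first.

    Column names are assumptions about the edge-list format, not confirmed API.
    """
    ranked = edges.sort_values(score_col, ascending=False)
    # head(n) keeps the first n rows of each TF group in the sorted order
    return ranked.groupby(tf_col).head(n).reset_index(drop=True)


# Toy edge list standing in for `grn` (illustrative values only).
toy_grn = pd.DataFrame({
    "TF": ["SMAD3", "SMAD3", "SMAD3", "TWIST1", "TWIST1"],
    "target": ["COL1A1", "ACTA2", "FN1", "COL1A1", "PDGFRA"],
    "importance": [9.0, 4.5, 7.2, 3.1, 6.4],
})
result = top_targets(toy_grn, n=2)
print(result)
```

Sorting once and then taking `head(n)` per group avoids a per-group sort and keeps the output ordered by edge strength.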

### Run k-fold cross validation for the inference pipeline:
Using the `crossvalidateGRN` function, we can specify the number of folds `k` and infer one network from each fold's training set. After inference and refinement, the per-fold networks are concatenated into a single pandas DataFrame, with the fold recorded in a separate column. We can also specify a save directory (`savedir`) to write each network to a CSV file, along with the samples used in each training and testing set.  
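
Conceptually, stacking the per-fold networks with a fold label looks roughly like the sketch below. It mirrors the description above rather than `crossvalidateGRN`'s internals; the `fold` column name, the edge-list columns, and the `combine_folds` helper are hypothetical.

```python
import pandas as pd


def combine_folds(networks):
    """Stack per-fold edge lists into one DataFrame with a 'fold' column.

    `networks` maps a fold index to that fold's network (a DataFrame).
    """
    labeled = [net.assign(fold=fold) for fold, net in networks.items()]
    return pd.concat(labeled, ignore_index=True)


# Two toy folds (illustrative values only).
fold_nets = {
    0: pd.DataFrame({"TF": ["SMAD3"], "target": ["COL1A1"], "importance": [9.0]}),
    1: pd.DataFrame({
        "TF": ["SMAD3", "TWIST1"],
        "target": ["COL1A1", "FN1"],
        "importance": [8.1, 2.5],
    }),
}
combined = combine_folds(fold_nets)
print(combined)
```

Keeping all folds in one long table, rather than a list of separate tables, makes cross-fold comparisons a single `groupby`.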

In this case, we have 16 samples of bulk RNA-seq data, so we'll use 8-fold cross validation: each fold holds out 2 samples for testing and trains on the remaining 14.
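
The fold arithmetic can be sketched as follows. Whether `crossvalidateGRN` shuffles samples or assigns them to contiguous folds is not stated here, so the unshuffled, contiguous assignment, the `kfold_splits` helper, and the placeholder sample names are all assumptions.

```python
def kfold_splits(samples, k):
    """Yield (fold, train, test) lists, spreading any remainder over the first folds."""
    n = len(samples)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = samples[start:start + size]
        train = samples[:start] + samples[start + size:]
        yield fold, train, test
        start += size


samples = [f"S{i:02d}" for i in range(1, 17)]  # 16 placeholder sample names
splits = list(kfold_splits(samples, k=8))
for fold, train, test in splits[:2]:
    print(fold, len(train), test)
# 0 14 ['S01', 'S02']
# 1 14 ['S03', 'S04']
```

Every sample lands in exactly one test set, so each sample is held out once across the 8 folds.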

In [None]:
k = 8
path_to_save = "data/networks/"  # forward slashes work on all platforms

grn_all = crossvalidateGRN(path_to_data, lib_dir, lib_name, k, savedir=path_to_save)
grn_all
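
One way to use `grn_all` is to check how consistently each edge is recovered across folds. The sketch assumes hypothetical `TF`, `target`, and `fold` columns; `edge_stability` is an illustrative helper, not part of the package. Pass `n_folds=k` explicitly if some folds may contain no edges, since otherwise the fold count is inferred from the data.

```python
import pandas as pd


def edge_stability(grn_all, n_folds=None, tf_col="TF", target_col="target", fold_col="fold"):
    """Fraction of folds in which each TF-target edge appears, highest first."""
    if n_folds is None:
        n_folds = grn_all[fold_col].nunique()  # infer from folds present in the table
    counts = grn_all.groupby([tf_col, target_col])[fold_col].nunique()
    return (counts / n_folds).sort_values(ascending=False).rename("fraction_of_folds")


# Toy combined table standing in for `grn_all` (illustrative values only).
toy_all = pd.DataFrame({
    "TF": ["SMAD3", "SMAD3", "TWIST1", "SMAD3"],
    "target": ["COL1A1", "COL1A1", "FN1", "ACTA2"],
    "fold": [0, 1, 1, 2],
})
stability = edge_stability(toy_all)
print(stability)
```

Edges recovered in most folds are less sensitive to which samples were held out.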