This repository contains Python code to reproduce the causal networks inferred from differential gene expression in yeast, as published in Ludl and Michoel (2020). The code relies on Findr, which should be installed according to the instructions provided in the corresponding repository. Releases are archived on Zenodo (doi:10.5281/zenodo.4340600).
The regulatory relationships between yeast genes inferred with this pipeline (for tests P2, P2P3, P2P5 and P) and published in Ludl and Michoel (2020) are available as gzipped csv files in data/predicted_networks. The columns are in the following format:
regulator (name), target (name), weight (posterior probability).
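As a minimal sketch of how such a file could be read, the example below writes a tiny gzipped csv file and parses it with the Python standard library. The file name `example_network.csv.gz`, the gene names, the weights and the absence of a header row are assumptions made for illustration; only the column order (regulator, target, weight) is taken from the description above.

```python
import csv
import gzip
import os
import tempfile

# Hypothetical example rows in the documented column order:
# regulator (name), target (name), weight (posterior probability).
rows = [("GENE_A", "GENE_B", "0.97"), ("GENE_A", "GENE_C", "0.42")]

# Write a small gzipped csv file so that the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "example_network.csv.gz")
with gzip.open(path, "wt", newline="") as handle:
    csv.writer(handle).writerows(rows)


def read_network(filename):
    """Return a list of (regulator, target, weight) tuples from a gzipped csv file.

    Assumes three columns per row and no header line.
    """
    with gzip.open(filename, "rt", newline="") as handle:
        return [(reg, tgt, float(weight)) for reg, tgt, weight in csv.reader(handle)]


edges = read_network(path)
for regulator, target, weight in edges:
    print(regulator, target, weight)
```

With pandas, `pandas.read_csv(path, header=None, names=["regulator", "target", "weight"])` should load the same table as a DataFrame, since gzip compression is inferred from the `.gz` extension.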
The following files are needed and should be put in the
a. Expression data and genotypes from Albert et al. (2018):
The data are taken from the supporting information of Albert et al. (2018), available at https://figshare.com/s/83bddc1ddf3f97108ad4. The following files are used:
|covariates for expression data||SI_Data_02_covariates.xlsx|
b. YEASTRACT ground truth data to compute precision and recall:
Regulation matrices can be obtained from http://www.yeastract.com/formregmatrix.php.
We retrieved from the YEASTRACT website the full ground-truth matrices containing all reported interactions of the following types: DNA binding evidence (“Binding”); expression evidence, including TFs acting as activators and those acting as inhibitors (“Expression”); and combined DNA binding and expression evidence (“Binding & Expression”). Self-regulation was removed from all ground truths. The matrices we retrieved are available as gzipped csv files in
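To illustrate how a ground truth of this kind can be used, here is a minimal sketch that removes self-regulation and computes precision and recall for a set of predicted edges. The gene names and edge sets are hypothetical, and the sketch does not restrict predictions to the regulators and targets present in the ground truth; whether the pipeline applies such a restriction is not stated here.

```python
def remove_self_edges(edges):
    """Drop edges in which a gene regulates itself."""
    return {(reg, tgt) for reg, tgt in edges if reg != tgt}


def precision_recall(predicted, truth):
    """Compare predicted (regulator, target) pairs with a ground-truth set."""
    predicted = remove_self_edges(predicted)
    truth = remove_self_edges(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical ground truth and predictions, each with one self-edge.
truth = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF1", "TF1")}
predicted = {("TF1", "G1"), ("TF2", "G3"), ("TF2", "G4"), ("TF2", "TF2")}

precision, recall = precision_recall(predicted, truth)
print("precision = %.3f, recall = %.3f" % (precision, recall))
```

Representing edges as sets of pairs keeps the comparison independent of the matrix layout in which the ground truth is stored.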
c. Gene annotations from Ensembl:
We use a file listing all genes, pseudogenes, etc. from Ensembl release 83:
The file should be processed with the sed commands given in
The result is a space-separated file containing the gene name, start, end and a few more annotations.
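A minimal sketch of how such a space-separated file could be parsed is given below. The column order (name, start, end, then further annotations), the gene names and the coordinates are assumptions for illustration and should be checked against the actual processed file.

```python
import io

# Hypothetical lines: gene name, start, end, followed by further annotations.
example = io.StringIO(
    "GENE_A 100 400 I + gene\n"
    "GENE_B 1500 2100 II - pseudogene\n"
)


def read_annotations(handle):
    """Parse whitespace-separated annotation lines into a dict keyed by gene name."""
    genes = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        genes[name] = {"start": start, "end": end, "extra": fields[3:]}
    return genes


genes = read_annotations(example)
print(genes["GENE_B"])
```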
Required Python packages:
This pipeline has been tested with Python 3.7.4. The scripts require Findr and the following packages:
- numpy, pandas,
- roman: to convert roman numerals to integers,
- matplotlib and seaborn.
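As an illustration of the roman numeral conversion, the sketch below turns chromosome labels into integers. It uses `fromRoman` from the roman package when available and otherwise falls back to a small converter written for this example. The `chr` prefix and the use of the conversion for chromosome labels are assumptions; the function `chromosome_number` is hypothetical.

```python
try:
    from roman import fromRoman  # provided by the "roman" package
except ImportError:
    def fromRoman(numeral):
        """Fallback converter for valid upper-case roman numerals."""
        values = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
        total = 0
        for symbol, following in zip(numeral, numeral[1:] + " "):
            value = values[symbol]
            # A smaller value before a larger one is subtracted (e.g. IV = 4).
            total += -value if values.get(following, 0) > value else value
        return total


def chromosome_number(label):
    """Convert a label such as "chrXVI" (assumed format) to an integer."""
    return fromRoman(label[3:] if label.startswith("chr") else label)


numbers = [chromosome_number(c) for c in ("chrI", "chrIV", "chrIX", "chrXVI")]
print(numbers)
```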
Steps to run the analysis:
The scripts run the analysis with Findr and produce binary causal networks for the FDR thresholds given in . The scripts should be run in the order in which they are numbered. The shell script "run_all.sh" runs them all, but this may take a while, so we recommend running them one at a time, in numbered order.
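One common way to turn posterior probabilities into a binary network at a given FDR is to keep the highest-ranked edges for as long as the mean of (1 - posterior probability) over the kept edges stays at or below the target. The sketch below implements that rule; it is an assumption that the scripts use this criterion, and the edges and probabilities are hypothetical.

```python
def select_edges(posteriors, fdr):
    """Return the edges kept at the given expected FDR.

    posteriors: dict mapping (regulator, target) -> posterior probability.
    The expected FDR of a set of edges is estimated as the mean of
    (1 - posterior probability) over that set.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected, error_sum = [], 0.0
    for edge, prob in ranked:
        error_sum += 1.0 - prob
        # The running mean never decreases along the ranking,
        # so the first violation ends the selection.
        if error_sum / (len(selected) + 1) > fdr:
            break
        selected.append(edge)
    return selected


# Hypothetical posterior probabilities for five candidate edges.
posteriors = {
    ("A", "B"): 0.99,
    ("A", "C"): 0.95,
    ("B", "C"): 0.90,
    ("C", "D"): 0.60,
    ("D", "A"): 0.30,
}

for threshold in (0.05, 0.10):
    print(threshold, select_edges(posteriors, threshold))
```

A stricter FDR keeps a prefix of the edges kept at a looser one, so the resulting binary networks are nested.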
The script in "subsamples" can be used to run Findr on randomly selected subsamples of the yeast data.
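A minimal sketch of drawing reproducible random subsamples is shown below. It is not the repository's script; the sample identifiers, the subsample size and the use of one seed per subsample are assumptions for illustration.

```python
import random


def draw_subsample(sample_ids, size, seed):
    """Draw `size` distinct sample identifiers without replacement."""
    rng = random.Random(seed)  # a separate generator keeps each draw reproducible
    return sorted(rng.sample(sample_ids, size))


# Hypothetical identifiers for ten samples.
sample_ids = ["segregant_%03d" % i for i in range(1, 11)]

# Three subsamples of four samples each, one seed per subsample.
subsamples = [draw_subsample(sample_ids, 4, seed) for seed in range(3)]
for subsample in subsamples:
    print(subsample)
```

The expression and genotype data would then be restricted to the selected samples before running the inference on each subsample.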
Ludl, A-A and Michoel, T (2020) Comparison between instrumental variable and mediation-based methods for reconstructing causal gene networks in yeast (accepted)
Findr paper: Wang, L and Michoel, T (2017) PLoS Comput Biol 13(8): e1005703.
Albert, F. W., Bloom, J. S., Siegel, J., Day, L., & Kruglyak, L. (2018). Genetics of trans-regulatory variation in gene expression. Elife, 7, e35471. doi:10.7554/eLife.35471
Yeastract regulatory network: http://www.yeastract.com/formregmatrix.php
Monteiro, P.T., Oliveira, J., Pais, P., Antunes, M., Palma, M., Cavalheiro, M., Galocha, M., Godinho, C.P., Martins, L.C., Bourbon, N., Mota, M.N., Ribeiro, R.A., Viana, R., Sá-Correia, I., & Teixeira, M.C. (2020). YEASTRACT+: a portal for cross-species comparative genomics of transcription regulation in yeasts. Nucleic Acids Research, 48(D1), D642-D649. doi:10.1093/nar/gkz859
Ensembl library for yeast (S. cerevisiae):
- Ensembl release 83 (gff3 file):
- Saccharomyces cerevisiae: http://www.ensembl.org/Saccharomyces_cerevisiae/Info/Index?db=core
- Ensembl Archives: http://www.ensembl.org/info/website/archives/index.html