tRNet

Training and interpretation of tRNet, a convolutional neural network that predicts tRNA gene activity from ChIP-seq peaks.

Overview

tRNet is a convolutional neural network (CNN) architecturally based on BPNet (GitHub, paper). The model is designed to predict the transcriptional activity of tRNA genes, which are transcribed by RNA Polymerase III. It uses ChIP-seq data for BRF1, a subunit of TFIIIB, to determine the activity status of predicted tRNA genes in different cellular contexts. The architecture and an overview of the model and its training are shown below.

The final model is trained by transfer learning from a simpler binary classification model, which is trained to distinguish housekeeping from inactive tRNA genes using 200 bp upstream sequences (see below).
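As an illustration of the kind of input the binary classifier consumes, the sketch below reads FASTA records and one-hot encodes 200 bp sequences. It is a minimal example, not the code from the notebooks: the function names `read_fasta` and `one_hot`, the A/C/G/T channel order, and the all-zero encoding of ambiguous bases are assumptions made here for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed channel order; any fixed order works as long as it is used consistently.
BASES = "ACGT"


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def one_hot(seq: str, length: int = 200) -> List[List[int]]:
    """Encode a sequence as a length x 4 matrix.

    Bases outside A/C/G/T (for example N) become all-zero rows.
    """
    if len(seq) != length:
        raise ValueError(f"expected {length} bp, got {len(seq)} bp")
    return [[int(base == b) for b in BASES] for base in seq.upper()]
```

For example, `[one_hot(seq) for _, seq in read_fasta(open(path))]` would give one 200 x 4 matrix per record, where `path` is a hypothetical path to one of the FASTA files. In practice the result would usually be converted to a NumPy array before being passed to Keras.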

This repository contains the Jupyter notebooks to recreate the analysis in our manuscript, which includes:

tRNet_model_training.ipynb: Model building and training.
tRNet_SHAP_tfmodisco_interpretation.ipynb: Model interpretation using SHAP contribution scores to predict motifs with TF-modisco.

Additionally, data/ contains FASTA files with 200 bp tRNA upstream sequences for housekeeping, repressed and inactive tRNAs, which are required as input for tRNet model training and validation. These classes were defined using significant BRF1 peaks called by MACS2 that overlap tRNA genes in human induced pluripotent stem cells (hiPSC), neural progenitor cells (NPC), neurons and cardiomyocytes (CM). Housekeeping tRNAs have peaks in all four cell types, repressed tRNAs are inactivated during differentiation, and inactive tRNAs are never bound by BRF1 and do not produce mature tRNA in any cell type.
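The class definitions above can be sketched as a small rule over the set of cell types in which a tRNA gene overlaps a significant BRF1 peak. This is an illustrative reading, not the procedure used in the manuscript: the function name `classify_trna`, the cell type labels, and the interpretation of "repressed" as "bound in hiPSC but lost in at least one differentiated cell type" are assumptions. The additional requirement that inactive tRNAs produce no mature tRNA depends on expression data and is not modelled here.

```python
from typing import Set

# Assumed labels for the four cell types described above.
CELL_TYPES = ("hiPSC", "NPC", "neuron", "CM")


def classify_trna(bound_in: Set[str]) -> str:
    """Assign a tRNA class from the cell types with a significant BRF1 peak."""
    unknown = bound_in - set(CELL_TYPES)
    if unknown:
        raise ValueError(f"unknown cell types: {sorted(unknown)}")
    if not bound_in:
        # Never bound by BRF1 in any of the four cell types.
        return "inactive"
    if bound_in == set(CELL_TYPES):
        # Peak present in all four cell types.
        return "housekeeping"
    if "hiPSC" in bound_in:
        # Assumption: bound in stem cells, lost in at least one differentiated type.
        return "repressed"
    # Any other binding pattern falls outside the three classes described above.
    return "other"
```

With these assumptions, `classify_trna({"hiPSC", "NPC"})` returns `"repressed"` and `classify_trna(set())` returns `"inactive"`.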

In principle, the model could be applied to ChIP-seq data for any transcription factor (TF) across multiple conditions, treatments or cell contexts, in order to predict a pre-defined activity or binding state of that TF at different genomic loci and to analyse the motifs that influence this differential occupancy.

Usage

Please note that the following package dependencies and versions are required to run the Jupyter notebooks. They should be installed when the notebooks are run. If errors occur, especially in the interpretation notebook, they may be due to incompatible dependency versions.

tensorflow==1.15.5
keras==2.2.4
deeplift==0.6.12.0
shap==0.29.3
modisco==0.5.14.1
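Because version mismatches are a likely source of errors, it can help to compare the installed packages against the pins above before running the notebooks. The following is a minimal sketch of such a check and is not part of the repository. The helper names `installed_version` and `find_mismatches` are hypothetical, and the fallback import assumes the `importlib_metadata` backport is installed on Python versions older than 3.8.

```python
from typing import Callable, Dict, Optional, Tuple

# Pinned versions copied from the list above.
REQUIRED = {
    "tensorflow": "1.15.5",
    "keras": "2.2.4",
    "deeplift": "0.6.12.0",
    "shap": "0.29.3",
    "modisco": "0.5.14.1",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Assumption: the importlib_metadata backport is available on older Pythons.
        from importlib_metadata import PackageNotFoundError, version
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(
    required: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose version differs from its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in required.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(REQUIRED)` returns an empty dictionary when every pinned version is installed; otherwise each entry shows the required version and the one found (`None` if the package is missing).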

To recreate the analysis in the manuscript, please clone the GitHub repository (git clone git@github.com:drewjbeh/tRNet.git) and run the training and interpretation notebooks. The results from these notebooks are already available in this repository:

kFoldCV_saved_models: Model h5 files from the K-fold cross-validation model training.
model: Binary classification model and final multi-task model architecture and weights.
SHAP_scores: SHAP DeepExplainer contribution scores for the multi-task model.
tf_modisco: TF-modisco pattern and motif search results based on the SHAP contribution scores. Included are scores for each tRNA class in HDF5 (hd5) format and PDF outputs of the actual contribution scores for each task, metacluster, pattern and activity pattern.
