slrvv/CENTRE

CENTRE - Short description

CENTRE is a machine learning framework that predicts enhancer-target interactions in a cell-type-specific manner, using only gene expression data and ChIP-seq data for three histone modifications in the cell type of interest. CENTRE also draws on several publicly available datasets, from which it extracts cell-type-agnostic statistics that complement the cell-type-specific information.

This repository holds the development version of CENTRE. For the paper version, go to https://github.com/slrvv/CENTRE_paper_version

Contact

Citation

Rapakoulia, T., Lopez Ruiz De Vargas, S., Omgba, P. A., Laupert, V., Ulitsky, I., & Vingron, M. (2023). CENTRE: A gradient boosting algorithm for Cell-type-specific ENhancer-Target pREdiction. Bioinformatics, 39(11), btad687. https://doi.org/10.1093/bioinformatics/btad687

Requirements

  • R (tested 4.0.0)
  • crupR
  • GenomicRanges and IRanges
  • metapod
  • RSQLite
  • xgboost
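
Before installing, it can help to check which of the requirements listed above are already available. The following is a minimal sketch in base R; the variable names are illustrative and not part of CENTRE.

```r
# Packages named in the Requirements list above.
required_pkgs <- c("crupR", "GenomicRanges", "IRanges",
                   "metapod", "RSQLite", "xgboost")

# TRUE for every package whose namespace can be loaded, FALSE otherwise.
installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}

# CENTRE was tested with R 4.0.0; older versions may or may not work.
if (getRversion() < "4.0.0") {
  message("Running R ", getRversion(), ", which is older than the tested version.")
}
```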

User Provided Data

CENTRE computes its classification features from user-provided histone ChIP-seq data (H3K27ac, H3K4me3 and H3K4me1) and RNA-seq data for the cell type of interest, together with a dataframe listing either the genes of interest or the gene-enhancer pairs of interest.

User data:

  • Cell-type-specific histone ChIP-seq data in BAM format for H3K27ac, H3K4me3 and H3K4me1. A control ChIP-seq experiment matching the histone modification (HM) ChIP-seq experiments is strongly advised, but CENTRE can also run without it.
  • Cell-type-specific RNA-seq TPM values for all genes, given as a dataframe with three columns: the ENSEMBL IDs, the transcript IDs and the TPM values.
  • A dataframe with either the GENCODE IDs of the genes of interest or the enhancer (cCREs-ELS) and target (GENCODE ID) pairs of interest.
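
The shape of the two tabular inputs described above can be sketched as follows. This is only an illustration: the column names (gene_id, transcript_id, TPM, enhancer_id) are assumptions, and all IDs and TPM values are made-up placeholders, so check the package documentation for the names CENTRE actually expects.

```r
# Toy RNA-seq table with the three columns described above.
# All IDs and values are placeholders, not real measurements.
tpm <- data.frame(
  gene_id       = c("ENSG00000000001.1", "ENSG00000000002.1"),
  transcript_id = c("ENST00000000001.1", "ENST00000000002.1"),
  TPM           = c(12.4, 0.8),
  stringsAsFactors = FALSE
)

# Option 1: only the genes of interest.
genes <- data.frame(gene_id = "ENSG00000000001.1", stringsAsFactors = FALSE)

# Option 2: enhancer-target pairs of interest (placeholder cCRE-ELS IDs).
pairs <- data.frame(
  gene_id     = c("ENSG00000000001.1", "ENSG00000000002.1"),
  enhancer_id = c("EH38E0000001", "EH38E0000002"),
  stringsAsFactors = FALSE
)

# Basic sanity checks before handing the tables to the pipeline.
stopifnot(is.numeric(tpm$TPM), all(tpm$TPM >= 0))
stopifnot(all(genes$gene_id %in% tpm$gene_id))
stopifnot(all(pairs$gene_id %in% tpm$gene_id))
```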

CENTRE Generic Information

CENTRE uses precomputed datasets that the user needs to download, either by calling CENTRE::downloadPrecomputedData() or by downloading the file from http://owww.molgen.mpg.de/~CENTRE_data/PrecomputedData.db and placing it in the /inst/extdata folder.
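
For the manual route, a minimal sketch using only base R is shown below. The helper fetch_precomputed_db() is hypothetical and not part of CENTRE; it downloads the file only if it is not already present in the target folder.

```r
# URL taken from the paragraph above.
precomputed_url <- "http://owww.molgen.mpg.de/~CENTRE_data/PrecomputedData.db"

# Hypothetical helper: download PrecomputedData.db into dest_dir unless
# a file with that name already exists there. Returns the file path.
fetch_precomputed_db <- function(dest_dir, url = precomputed_url) {
  dest_file <- file.path(dest_dir, basename(url))
  if (!file.exists(dest_file)) {
    dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" keeps the binary database file intact on Windows.
    download.file(url, destfile = dest_file, mode = "wb")
  }
  dest_file
}

# Example call, assuming a local clone of the repository in ./CENTRE:
# db_path <- fetch_precomputed_db(file.path("CENTRE", "inst", "extdata"))
```

The database is a large binary file, so the existence check avoids downloading it again on every run.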

PrecomputedData.db is a database containing precomputed Wilcoxon rank sum tests on the following datasets:

  • CAGE-seq dataset (Andersson et al., 2014)
  • DNase hypersensitivity dataset (Thurman et al., 2012)
  • DNase-seq gene expression dataset (Sheffield et al., 2013)
  • CRUP-EP gene expression dataset
  • Pearson correlation between CRUP-EP (Enhancer Probability) and CRUP-PP (Promoter Probability) across 104 cell types

Clarification on select chromosome normalization

The function CENTRE::computeCellTypeFeatures() has a parameter called chr, with which the user can restrict quantile normalization to a subset of chromosomes. This makes the normalization step faster, but it can change the output of the function and therefore the subsequent predictions. For CENTRE to run as expected, the user should normalize over all chromosomes and provide genome-wide ChIP-seq data.
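
Why the chromosome subset matters can be seen in a toy example. The sketch below is not the normalization code used by CENTRE or crupR; it is a generic quantile normalization against a reference distribution, with made-up counts, showing that the same bins receive different values depending on which chromosomes enter the ranking.

```r
# Generic quantile normalization: map each value of x to the quantile of
# `reference` that corresponds to its rank within x.
quantile_normalize <- function(x, reference) {
  probs <- (rank(x, ties.method = "average") - 0.5) / length(x)
  as.numeric(quantile(reference, probs = probs, names = FALSE))
}

# Made-up bin counts on two chromosomes and a made-up reference distribution.
counts <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
  count = c(1, 2, 3, 10, 20, 30)
)
reference <- 0:100

# Normalize genome-wide, then keep the chr1 bins.
genome_wide <- quantile_normalize(counts$count, reference)[counts$chr == "chr1"]

# Normalize using the chr1 bins only.
chr1_only <- quantile_normalize(counts$count[counts$chr == "chr1"], reference)

# The same three bins end up with different normalized values.
print(rbind(genome_wide, chr1_only))
```

In this toy case the chr1 bins rank low when compared against the whole genome but span the full range when ranked only among themselves, which is the kind of shift the paragraph above warns about.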

Installing CENTRE

# Install devtools if it is not already available
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

# Install the development version of crupR
devtools::install_git("https://github.com/akbariomgba/crupR.git")
# Say yes to installing all required dependencies

# Install the development version of CENTRE
devtools::install_git("https://github.com/slrvv/CENTRE.git")
# Say yes to installing all required dependencies

Note: If the installation of any of CENTRE's dependencies fails, try running the script CENTRE/install/install_CENTRE.R

References

  • Andersson, R. et al. (2014) An atlas of active enhancers across human cell types and tissues. Nature, 507, 455–461.
  • Thurman, R.E. et al. (2012) The accessible chromatin landscape of the human genome. Nature, 489, 75–82.
  • Sheffield, N.C. et al. (2013) Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions. Genome Res., 23, 777–788.