Skip to content
Loic Mangnier edited this page Feb 17, 2022 · 15 revisions

Preliminary Steps

To build CRHs, you need to use the Activity-by-contact Score (Fulco et al. 2019). Please refer to: https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction to have deeper details on how to use the ABC-Score.

Biological Justification

Before introducing the code used to create CRHs, we are giving some formal definitions about what we consider as CRHs and their relevance in disease aetiology.
One of the main research area in genetics is to understand the biological mechanisms involved in human diseases. This remains a challenge since the majority of genetic variations (SNPs) are located within non-coding regions, making the interpretation of the underlying mechanisms difficult to interpret. Recent studies have proposed omni-genic models where auxiliary genes tend to interact with core genes at different levels possibly leading to diseases. To overcome this challenge we propose Cis-Regulatory Hubs (CRHs). Basically, CRHs are bipartite networks formed by the physical contacts between gene promoters and enhancers. Technically speaking, promoters and enhancers are nodes, while vertices represent the 3D contacts between each pair of nodes. CRHs were firstly proposed to capture direct and indirect connections between enhancers and promoters and their involvements in diseases.

Package dependencies

Before executing any code, please be sure that all necessary packages are installed either through CRAN repository or Bioconductor

library(igraph)
library(GenomicRanges)
library(rtracklayer)
library(org.Hs.eg.db)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(ensembldb)
library(EnsDb.Hsapiens.v86)

CRH construction and descriptive metrics

Since you obtained the output file from the ABC-Score, you should use process.ABC from code_CRHs script. Basically, the function adds new columns (typeOf, startProm, endProm) and removes X chromosome contacts, since we focus on autosomal chromosomes in our analysis. The data EnhancerPredictions_*.txt from the repo can be directly used here.
However for researcher who is interested in playing directly CRHs in iPSC-derived neurons or in the 3 post-mortem brain tissues, they are available directly in RDS format.

output_from_ABC = process.ABC("EnhancerPredictions_NEU_iPSC.txt")

Now we have file in the right format, we are able to create CRHs.

#If you need the pairs of promoters and distal elements in Pairs format 
#Useful when you want to compare CRHs directly with ABC-Score predictions or between different tissues
pairs = Pairs.from.ABC(output_from_ABC)

#CRHs can be built from the following code line
graph_from_ABC = create.CRHs.ABC(output_from_ABC)
#the return object is a igraph object with bipartite networks
#You can use all the functions provided by igraph to analyze graph_from_ABC

You need to add membership (the CRH), where the promoter or distal element belongs to analyze CRHs in the right way.

#Here we directly modify the data.frame, the column membership will be added
output_from_ABC = add.membership(graph_from_ABC, output_from_ABC)

CRH analysis

Components can be generated with igraph components function.

components = components(graph_from_ABC)
#returns:
#no : the number of clusters
#csize: the size of clusters
#membership: the cluster to which each element belongs

If you want to plot one specific CRH

decompose.ABC = decompose(graph_from_ABC)
plot(decompose.ABC[[1]], vertex.label=ifelse(names(V(decompose.ABC[[1]]))%in%unique(output_from_ABC$TargetGene), names(V(decompose.ABC[[1]])), NA))

example CRH

Now if you need the number of unique promoters or distal elements per CRHs.

#Number of unique promoters
N_prom_per_CRH = aggregate(TagetGene~membership, output_from_ABC , function(x) length(unique(x)))
#Number of unique distal elements
N_distal_per_CRH = aggregate(name~membership, output_from_ABC , function(x) length(unique(x)))

The mean number of contacts per promoter or distal element

#Mean number of contacts by promoter at CRH-level
N_connect_prom_per_CRH = aggregate(TargetGene~membership,output_from_ABC,function(x) mean(table(x)))
#Mean number of contacts by distal element at CRH-level
N_connect_dist_per_CRH = aggregate(name~membership,output_from_ABC, function(x) mean(table(x)))

In our paper we proposed to evaluate CRHs complexities through the prism of promoter and enhancer connections with 3 different metrics, monogamous relationships, 1-1-N connections and polygamous contacts, respectively.

Complexity.CRHs(graph_from_ABC, unique_enhancers, unique_genes, extract.central.genes=T)

CRHs and others 3D features

In order to compare CRHs with others well-known 3D features you need to have structures either in GRange format or directly in text-separated format.

CRHs and A/B Compartments

A/B compartments are high-order structures known to be associated with open-chromatin characteristics strongly defining active elements (A) and heterochromatin associated with inactive elements (B). In the master branch you will find taloggcc files representing A/B compartments in each of our cell-lines or tissues. The following code will give you the kind of compartment overlapped by CRHs, either AA, AB or BB.

load(taloggcc.RData)

AB_compartments = precompute_AB(taloggcc)
GRanges_CRHs = coverage.by.CRH(graph_ABC, output_from_ABC)$clusters

overlaps_CRHs_to_AB(GRanges_CRHs, AB_compartments)