# Infer a general regulator-regulator network on specified regions

The ``infer_regulator_network`` command uses the pre-trained ChromBERT model to infer regulator–regulator co-association relationships on user-specified genomic regions.

**Note**: The examples below show direct command usage only.

If you need to run the commands inside a Singularity container, see the [`singularity_use.ipynb`](./singularity_use.ipynb) tutorial for detailed instructions on using `singularity exec` with `chrombert-tools`.


In [2]:
import pandas as pd
import numpy as np
import os
workdir="/mnt/Storage2/home/chenqianqian/projects/chrombert_tools/2.test/pull/ChromBERT-tools/examples/cli" # your workdir
os.chdir(workdir)

In [3]:
!chrombert-tools -h

Usage: chrombert-tools [OPTIONS] COMMAND [ARGS]...

  Type -h or --help after any subcommand for more information.

Options:
  -v, --verbose  Verbose logging
  -d, --debug    Post mortem debugging
  -V, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  embed_cell_cistrome             Extract cell-specific cistrome...
  embed_cell_gene                 Extract cell-specific gene embeddings
  embed_cell_region               Extract cell-specific region embeddings
  embed_cell_regulator            Extract cell-specific regulator...
  embed_cistrome                  Extract general cistrome embeddings on...
  embed_gene                      Extract general gene embeddings
  embed_region                    Extract general region embeddings
  embed_regulator                 Extract general regulator embeddings on...
  find_context_specific_cofactor  Find context-specific cofactors in...
  find_driver_in_transition       Find driver factors in cell

In [4]:
!chrombert-tools infer_regulator_network -h

Usage: chrombert-tools infer_regulator_network [OPTIONS]

  Infer general regulator-regulator network

Options:
  --region FILE                   Region BED file (focus regions).  [required]
  --regulator TEXT                Optional. Regulators to plot subnetworks,
                                  e.g. EZH2;BRD4;CTCF. Use ';' to separate.
  --odir DIRECTORY                Output directory.  [default: ./output]
  --genome [hg38|mm10]            Genome.  [default: hg38]
  --resolution [1kb|200bp|2kb|4kb]
                                  Resolution.  [default: 1kb]
  --chrombert-cache-dir DIRECTORY
                                  ChromBERT cache dir (contains config/ and
                                  checkpoint/ etc).  [default:
                                  ~/.cache/chrombert/data]
  --batch-size INTEGER            Batch size for region dataloader.  [default:
                                  64]
  --num-workers INTEGER           Number of dataloader workers.  [default: 8]
 

In [5]:
!chrombert-tools infer_regulator_network \
    --region "../data/CTCF_ENCFF664UGR_sample100.bed" \
    --regulator "ctcf;nanog;ezh2" \
    --odir "./output_trn" \
    --genome "hg38" \
    --resolution "1kb"

Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, We keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region),non-overlapping: 0
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 3, matched in ChromBERT: 3, not found: 0, not found regulator: []
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
use organisim hg38; max sequence length is 6391
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00,  1.01s/it]
Total graph nodes: 951
Total graph edges (threshold=0.636): 11503
Regulator subnetwork saved to: ./output_trn/subnetwork_nanog_k1_q0.980_thr0.636.pdf
Regulator subnetwork saved to: ./output_trn/sub
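
When the command needs to be launched from a script rather than a notebook cell, the argument list can be assembled in Python. The sketch below only builds the command and prints it. It uses the options listed in the help text above; the helper name `build_infer_network_cmd` is hypothetical and not part of `chrombert-tools`.

```python
import shlex

def build_infer_network_cmd(region, regulators=None, odir="./output",
                            genome="hg38", resolution="1kb"):
    """Assemble the argument list for `chrombert-tools infer_regulator_network`.

    Hypothetical helper: it only mirrors the options shown in the help text.
    """
    cmd = [
        "chrombert-tools", "infer_regulator_network",
        "--region", region,
        "--odir", odir,
        "--genome", genome,
        "--resolution", resolution,
    ]
    if regulators:
        # The help text asks for ';' as the separator between regulators.
        cmd += ["--regulator", ";".join(regulators)]
    return cmd

cmd = build_infer_network_cmd(
    "../data/CTCF_ENCFF664UGR_sample100.bed",
    regulators=["ctcf", "nanog", "ezh2"],
    odir="./output_trn",
)
# shlex.join quotes the ';'-separated value so the line is safe to paste into a shell.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, provided `chrombert-tools` is installed and on the `PATH`.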

In [6]:
# regulator_cosine_similarity.tsv: regulator-by-regulator cosine similarity matrix computed on the input regions
# total_graph_edge_*.tsv: network edges, i.e. regulator pairs with cosine similarity >= threshold
# subnetwork_*.pdf: subnetwork plots for the regulators passed via --regulator

regulator_cosine_similarity = pd.read_csv("./output_trn/regulator_cosine_similarity.tsv", sep="\t",index_col=0)
total_graph_edge = pd.read_csv("./output_trn/total_graph_edge_threshold0.636_quantile0.980.tsv", sep="\t")
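
The edge file name embeds the threshold and the quantile, so a hard-coded path is likely to break when the input regions change. The sketch below locates and parses such file names instead. It assumes the naming pattern generalizes from the single file name shown in this tutorial; `parse_edge_filename` and `find_edge_files` are hypothetical helpers, not part of `chrombert-tools`.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from the one file name shown in this tutorial.
EDGE_FILE_RE = re.compile(
    r"total_graph_edge_threshold(?P<thr>[0-9.]+)_quantile(?P<q>[0-9.]+)\.tsv$"
)

def parse_edge_filename(path):
    """Return (threshold, quantile) as floats, parsed from an edge file name."""
    match = EDGE_FILE_RE.search(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected edge file name: {path}")
    return float(match.group("thr")), float(match.group("q"))

def find_edge_files(odir):
    """List edge files in the output directory, sorted by name."""
    return sorted(Path(odir).glob("total_graph_edge_threshold*_quantile*.tsv"))

thr, q = parse_edge_filename(
    "./output_trn/total_graph_edge_threshold0.636_quantile0.980.tsv"
)
print(thr, q)
```

Note that the threshold in the file name is rounded to three decimals, so it should be treated as a label rather than as the exact cutoff.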



In [7]:
regulator_cosine_similarity

Unnamed: 0,5hmc,adnp,aebp2,aff1,aff4,ago1,ago2,ahr,ahrr,alkbh3,...,zscan20,zscan22,zscan23,zscan29,zscan31,zscan5a,zta,zxdb,zxdc,zzz3
5hmc,1.000000,0.161553,0.285241,0.158628,0.117248,0.127353,0.164635,0.140008,0.140390,0.256362,...,0.343546,0.136590,0.344879,0.193269,0.168963,0.255532,0.340011,0.150076,0.061059,0.330447
adnp,0.161553,1.000000,0.587140,0.387827,0.471895,0.130505,0.207243,0.277108,0.308542,0.250292,...,0.399306,0.333286,0.455049,0.514076,0.365677,0.465939,0.225964,0.436089,0.300675,0.241342
aebp2,0.285241,0.587140,1.000000,0.308597,0.402976,0.124346,0.206790,0.248920,0.429926,0.295569,...,0.407240,0.224415,0.319738,0.286058,0.308937,0.247846,0.316289,0.215994,0.166821,0.273573
aff1,0.158628,0.387827,0.308597,1.000000,0.681266,0.235524,0.285841,0.336590,0.390974,0.265273,...,0.386461,0.306672,0.318689,0.370916,0.413583,0.343913,0.262005,0.297290,0.231193,0.262453
aff4,0.117248,0.471895,0.402976,0.681266,1.000000,0.253977,0.326415,0.329043,0.368464,0.319714,...,0.380179,0.447794,0.403113,0.396646,0.423483,0.385116,0.332274,0.390634,0.394011,0.287089
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
zscan5a,0.255532,0.465939,0.247846,0.343913,0.385116,0.259383,0.276472,0.326592,0.212140,0.211508,...,0.757539,0.434079,0.870642,0.482686,0.472391,1.000000,0.179671,0.647394,0.424395,0.084673
zta,0.340011,0.225964,0.316289,0.262005,0.332274,0.036716,0.130013,0.184514,0.272651,0.305709,...,0.270659,0.260485,0.195705,0.184984,0.326114,0.179671,1.000000,0.087392,0.076405,0.419953
zxdb,0.150076,0.436089,0.215994,0.297290,0.390634,0.309995,0.294769,0.320665,0.209873,0.151475,...,0.573041,0.541771,0.619564,0.468207,0.393382,0.647394,0.087392,1.000000,0.497639,0.033969
zxdc,0.061059,0.300675,0.166821,0.231193,0.394011,0.333499,0.287851,0.590078,0.343713,0.153941,...,0.304457,0.302000,0.406280,0.343347,0.191912,0.424395,0.076405,0.497639,1.000000,0.017482


In [8]:
# threshold: 98th percentile (quantile 0.98) of cosine similarity values from the upper triangle (excludes diagonal)
i_upper = np.triu_indices(len(regulator_cosine_similarity), k=1)
threshold = np.quantile(regulator_cosine_similarity.values[i_upper], 0.98)
threshold

0.6364899090145987
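
To make the thresholding step concrete, the sketch below applies the same upper-triangle quantile to a small toy matrix and builds the resulting edge list. The matrix values and regulator names are made up for illustration, and the `>=` comparison follows the description of `total_graph_edge_*.tsv` given earlier; the tool's internal implementation may differ in detail.

```python
import numpy as np
import pandas as pd

# Toy symmetric similarity matrix (made-up values, not ChromBERT output).
names = ["a", "b", "c", "d"]
sim = pd.DataFrame(
    [[1.0, 0.9, 0.2, 0.1],
     [0.9, 1.0, 0.3, 0.4],
     [0.2, 0.3, 1.0, 0.8],
     [0.1, 0.4, 0.8, 1.0]],
    index=names, columns=names,
)

def edges_above_quantile(sim, q):
    """Return (threshold, edge table) for pairs at or above the q-quantile."""
    # k=1 skips the diagonal, so self-similarities (1.0) do not inflate the quantile.
    rows, cols = np.triu_indices(len(sim), k=1)
    values = sim.values[rows, cols]
    threshold = np.quantile(values, q)
    keep = values >= threshold
    labels = sim.index.to_numpy()
    edges = pd.DataFrame({
        "node1": labels[rows[keep]],
        "node2": labels[cols[keep]],
        "cosine_similarity": values[keep],
    })
    return threshold, edges

# q=0.5 keeps the top half of the pairs; the run above used q=0.98.
threshold, edges = edges_above_quantile(sim, 0.5)
print(threshold)
print(edges)
```

Because only the upper triangle is scanned, each undirected pair appears once, with `node1` preceding `node2` in matrix order.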

In [9]:
# total_graph_edge_*.tsv: edges in the regulatory network where cosine similarity >= threshold
total_graph_edge = pd.read_csv("./output_trn/total_graph_edge_threshold0.636_quantile0.980.tsv", sep="\t")
total_graph_edge

Unnamed: 0,node1,node2,cosine_similarity
0,5hmc,brdu,0.701982
1,5hmc,rloop,0.756476
2,5hmc,sirt1,0.664322
3,5hmc,znf823,0.641759
4,adnp,atf5,0.710570
...,...,...,...
11498,zscan20,zscan23,0.739037
11499,zscan20,zscan5a,0.757539
11500,zscan22,zscan31,0.712420
11501,zscan23,zscan5a,0.870642


In [10]:
total_graph_edge.query("node1 == 'nanog' or node2 == 'nanog'")

Unnamed: 0,node1,node2,cosine_similarity
1696,ctnnb1,nanog,0.69069
2374,eomes,nanog,0.758861
2415,ep300,nanog,0.678968
5293,nanog,pou5f1,0.755285
5294,nanog,smad2,0.660624
5295,nanog,sox2,0.749701
5296,nanog,tal1,0.636584
5297,nanog,tbxt,0.686592
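
Each edge is stored once, so the regulator of interest can sit in either `node1` or `node2`. The sketch below collapses both cases into a single partner list sorted by similarity. The toy table copies the NANOG rows shown above; the helper name `partners_of` is hypothetical.

```python
import pandas as pd

# Rows copied from the NANOG query output above.
edges = pd.DataFrame({
    "node1": ["ctnnb1", "eomes", "ep300", "nanog", "nanog", "nanog", "nanog", "nanog"],
    "node2": ["nanog", "nanog", "nanog", "pou5f1", "smad2", "sox2", "tal1", "tbxt"],
    "cosine_similarity": [0.69069, 0.758861, 0.678968, 0.755285,
                          0.660624, 0.749701, 0.636584, 0.686592],
})

def partners_of(edges, regulator):
    """Return the partners of `regulator` as a Series sorted by similarity."""
    hits = edges[(edges["node1"] == regulator) | (edges["node2"] == regulator)]
    # Take node2 when the regulator is node1, otherwise take node1.
    partner = hits["node2"].where(hits["node1"] == regulator, hits["node1"])
    return pd.Series(
        hits["cosine_similarity"].to_numpy(),
        index=partner.to_numpy(),
        name=regulator,
    ).sort_values(ascending=False)

nanog_partners = partners_of(edges, "nanog")
print(nanog_partners)
```

The same function can be applied to the full `total_graph_edge` table loaded earlier.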


In [11]:
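# Use the unrounded threshold computed above; the result also contains nanog itself (similarity 1.0).
nanog_subnetwork = regulator_cosine_similarity[['nanog']].query("nanog >= @threshold")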
nanog_subnetwork


Unnamed: 0,nanog
ctnnb1,0.69069
eomes,0.758861
ep300,0.678968
nanog,1.0
pou5f1,0.755285
smad2,0.660624
sox2,0.749701
tal1,0.636584
tbxt,0.686592


In [12]:
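# Use the unrounded threshold computed above; the result also contains ctcf itself (similarity 1.0).
ctcf_subnetwork = regulator_cosine_similarity[['ctcf']].query("ctcf >= @threshold")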
ctcf_subnetwork


Unnamed: 0,ctcf
ctcf,1.0
dnase,0.704624
kdm5b,0.658326
rad21,0.851742
smc1a,0.84497
smc3,0.856302
srf,0.656536
stag1,0.8654
sumo1,0.66374
trim22,0.642802
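
A column of the similarity matrix always contains the regulator itself with similarity 1.0, which is not an edge in the network. The sketch below drops that self entry and applies the unrounded threshold printed earlier, so the result is directly comparable with the edge table. The values are copied from the CTCF rows displayed above; only the rows visible in that output are included.

```python
import pandas as pd

# Values copied from the CTCF column output above (only the rows shown there).
ctcf_column = pd.Series({
    "ctcf": 1.0, "dnase": 0.704624, "kdm5b": 0.658326, "rad21": 0.851742,
    "smc1a": 0.84497, "smc3": 0.856302, "srf": 0.656536, "stag1": 0.8654,
    "sumo1": 0.66374, "trim22": 0.642802,
}, name="ctcf")

# Unrounded 0.98 quantile computed earlier in this tutorial.
threshold = 0.6364899090145987

# Remove the self-similarity, then keep partners at or above the threshold.
ctcf_partners = ctcf_column.drop("ctcf")
ctcf_partners = ctcf_partners[ctcf_partners >= threshold].sort_values(ascending=False)
print(ctcf_partners)
```

On real output, the same two steps can be applied to `regulator_cosine_similarity["ctcf"]`.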


In [13]:
total_graph_edge.query("node1 == 'ctcf' or node2 == 'ctcf'")

Unnamed: 0,node1,node2,cosine_similarity
1682,ctcf,dnase,0.704624
1683,ctcf,kdm5b,0.658326
1684,ctcf,rad21,0.851742
1685,ctcf,smc1a,0.84497
1686,ctcf,smc3,0.856302
1687,ctcf,srf,0.656536
1688,ctcf,stag1,0.8654
1689,ctcf,sumo1,0.66374
1690,ctcf,trim22,0.642802
1691,ctcf,zbtb2,0.898986
