# Infer cell-type-specific regulator-regulator networks

The `infer_cell_regulator_network` command fine-tunes ChromBERT on cell-type-specific accessibility data (a BigWig signal file plus a peak BED file) and then infers a cell-type-specific regulator-regulator network (transcriptional regulatory network, TRN) and key regulators. If a fine-tuned checkpoint is provided, fine-tuning is skipped and the TRN is inferred directly from the checkpoint. Note that the cells below invoke the `infer_cell_key_regulator` subcommand.

**Note**: The remaining examples will only show the direct command usage. 

If you need to use a Singularity container, please refer to the [`singularity_use.ipynb`](singularity_use.ipynb) tutorial for detailed instructions on using `singularity exec` with `chrombert-tools`.

## Example: Infer key regulators and cell-type-specific regulator-regulator networks for myoblast

In [1]:
import pandas as pd
import numpy as np
import os
os.chdir("/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli")

In [2]:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0 to the commands below

In [3]:
!chrombert-tools -h

Usage: chrombert-tools [OPTIONS] COMMAND [ARGS]...

  Type -h or --help after any subcommand for more information.

Options:
  -v, --verbose  Verbose logging
  -d, --debug    Post mortem debugging
  -V, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  embed_cell_cistrome             Extract cell-specific cistrome...
  embed_cell_gene                 Extract cell-specific gene embeddings
  embed_cell_region               Extract cell-specific region embeddings
  embed_cell_regulator            Extract cell-specific regulator...
  embed_cistrome                  Extract general cistrome embeddings on...
  embed_gene                      Extract general gene embeddings
  embed_region                    Extract general region embeddings
  embed_regulator                 Extract general regulator embeddings on...
  find_context_specific_cofactor  Find context-specific cofactors in...
  find_driver_in_transition       Find driver factors in cell

In [4]:
!chrombert-tools infer_cell_key_regulator -h

Usage: chrombert-tools infer_cell_key_regulator [OPTIONS]

  Infer cell-specific key regulators

Options:
  --cell-type-bw FILE             Cell type accessibility BigWig file.
                                  [required]
  --cell-type-peak FILE           Cell type accessibility Peak BED file.
                                  [required]
  --ft-ckpt FILE                  Fine-tuned ChromBERT checkpoint. If
                                  provided, skip fine-tuning and use this
                                  ckpt.
  --genome [hg38|mm10]            Reference genome (hg38 or mm10).  [default:
                                  hg38]
  --resolution [200bp|1kb|2kb|4kb]
                                  ChromBERT resolution.  [default: 1kb]
  --odir DIRECTORY                Output directory.  [default: ./output]
  --mode [fast|full]              Fast: downsample regions to 20k for
                                  training; Full: use all regions.  [default:
                              

#### Download the myoblast BigWig and peak files from ENCODE

In [5]:
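# Download the myoblast peak file. ENCODE serves it gzip-compressed (.bed.gz),
# so decompress it on the fly before saving it under a .bed name.
# import subprocess
# if not os.path.exists('../data/myoblast_ENCFF647RNC_peak.bed'):
#     cmd = 'wget -O - https://www.encodeproject.org/files/ENCFF647RNC/@@download/ENCFF647RNC.bed.gz | gunzip -c > ../data/myoblast_ENCFF647RNC_peak.bed'
#     subprocess.run(cmd, shell=True, check=True)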

In [6]:
# Download the myoblast accessibility signal (BigWig).
# import subprocess
# if not os.path.exists('../data/myoblast_ENCFF149ERN_signal.bigwig'):
#     cmd = 'wget https://www.encodeproject.org/files/ENCFF149ERN/@@download/ENCFF149ERN.bigWig -O ../data/myoblast_ENCFF149ERN_signal.bigwig'
#     subprocess.run(cmd, shell=True, check=True)
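Before starting a run that can take up to an hour, it can help to confirm that both input files are present and non-empty. The following is a minimal sketch; `check_inputs` and `is_gzipped` are hypothetical helper names, not part of `chrombert-tools`, and the paths are the ones used in the download cells above.

```python
from pathlib import Path


def check_inputs(paths):
    """Return the subset of `paths` that are missing or empty."""
    problems = []
    for p in map(Path, paths):
        if not p.is_file() or p.stat().st_size == 0:
            problems.append(str(p))
    return problems


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes (0x1f 0x8b)."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"


inputs = [
    "../data/myoblast_ENCFF149ERN_signal.bigwig",
    "../data/myoblast_ENCFF647RNC_peak.bed",
]
for missing in check_inputs(inputs):
    print(f"missing or empty: {missing}")

# Report whether each existing file is still gzip-compressed.
for p in inputs:
    if Path(p).is_file():
        print(p, "gzip-compressed" if is_gzipped(p) else "not gzip-compressed")
```

The gzip check is included because the ENCODE peak download URL ends in `.bed.gz` while the local file name ends in `.bed`.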

## Run

In [7]:
!mkdir -p ./tmp

In [None]:
# takes approximately 20-60 minutes to run
!chrombert-tools infer_cell_key_regulator \
    --cell-type-bw "../data/myoblast_ENCFF149ERN_signal.bigwig" \
    --cell-type-peak "../data/myoblast_ENCFF647RNC_peak.bed" \
    --odir "./output_infer_cell_key_regulator" \
    --genome "hg38" \
    --resolution "1kb"  2> "./tmp/infer_cell_key_regulator.stderr.log" # redirect stderr to log file
    

Stage 1: Praparing the dataset
Total regions: 324464
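Outside a notebook, the same invocation can be launched from plain Python. The sketch below only uses the flags listed in the help output above; `build_command` and `run` are hypothetical wrapper names introduced here for illustration, and the actual launch is left commented out because it is long-running.

```python
import shlex
import subprocess
from pathlib import Path


def build_command(bw, peak, odir, genome="hg38", resolution="1kb", ft_ckpt=None):
    """Assemble the argument list for `chrombert-tools infer_cell_key_regulator`."""
    cmd = [
        "chrombert-tools", "infer_cell_key_regulator",
        "--cell-type-bw", bw,
        "--cell-type-peak", peak,
        "--odir", odir,
        "--genome", genome,
        "--resolution", resolution,
    ]
    if ft_ckpt is not None:
        # With a checkpoint, the tool skips fine-tuning (see the --ft-ckpt help text).
        cmd += ["--ft-ckpt", ft_ckpt]
    return cmd


def run(cmd, stderr_log):
    """Run `cmd`, writing stderr to `stderr_log`; raise if the exit code is non-zero."""
    Path(stderr_log).parent.mkdir(parents=True, exist_ok=True)
    with open(stderr_log, "w") as log:
        return subprocess.run(cmd, stderr=log, check=True)


cmd = build_command(
    bw="../data/myoblast_ENCFF149ERN_signal.bigwig",
    peak="../data/myoblast_ENCFF647RNC_peak.bed",
    odir="./output_infer_cell_key_regulator",
)
print(shlex.join(cmd))
# run(cmd, "./tmp/infer_cell_key_regulator.stderr.log")  # uncomment to launch
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` surfaces a failed run as an exception instead of a silent non-zero exit code.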


In [None]:
# factor_importance_rank.csv: ranked key regulators for myoblast, with three columns
# (plus the saved row index, which pandas reads as "Unnamed: 0"):
#   - factors: regulator names
#   - similarity: cosine similarity of regulator embeddings between up-regulated and unchanged regions
#   - rank: importance ranking (rank 1 has the lowest similarity)

# The path must match the --odir passed to the command above.
factor_importance_rank = pd.read_csv("./output_infer_cell_key_regulator/results/factor_importance_rank.csv")
factor_importance_rank.head(n=25)


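| Unnamed: 0 | factors | similarity | rank |
|---|---|---|---|
| 0 | myf5 | 0.184096 | 1 |
| 1 | yap1 | 0.208994 | 2 |
| 2 | cbx6 | 0.21147 | 3 |
| 3 | ring1 | 0.245052 | 4 |
| 4 | tead1 | 0.269947 | 5 |
| 5 | myod1 | 0.287573 | 6 |
| 6 | tcf21 | 0.30273 | 7 |
| 7 | myog | 0.313926 | 8 |
| 8 | cbx7 | 0.323908 | 9 |
| 9 | chd4 | 0.328688 | 10 |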
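To make the `similarity` and `rank` columns concrete, here is a minimal sketch of ranking regulators by the cosine similarity between two embeddings per regulator, lowest similarity first. It is an illustration on toy data, not the tool's implementation: `rank_by_similarity` is a hypothetical function, and it assumes each regulator has already been reduced to one embedding vector for up-regulated regions and one for unchanged regions.

```python
import numpy as np
import pandas as pd


def rank_by_similarity(factors, emb_up, emb_unchanged):
    """Rank factors by row-wise cosine similarity, lowest similarity first.

    emb_up, emb_unchanged: array-likes of shape (n_factors, dim).
    """
    emb_up = np.asarray(emb_up, dtype=float)
    emb_unchanged = np.asarray(emb_unchanged, dtype=float)
    dot = (emb_up * emb_unchanged).sum(axis=1)
    norms = np.linalg.norm(emb_up, axis=1) * np.linalg.norm(emb_unchanged, axis=1)
    df = pd.DataFrame({"factors": factors, "similarity": dot / norms})
    # A regulator whose embedding changes most between region groups ranks first.
    df = df.sort_values("similarity", ignore_index=True)
    df["rank"] = np.arange(1, len(df) + 1)
    return df


# Toy 2-D embeddings for three made-up factors.
toy = rank_by_similarity(
    ["factor_a", "factor_b", "factor_c"],
    emb_up=[[1, 0], [1, 1], [0, 1]],
    emb_unchanged=[[1, 0], [1, 0], [1, 0]],
)
print(toy)
```

In the toy data, `factor_c` has orthogonal embeddings (similarity 0) and therefore ranks first, which mirrors the ascending order of `similarity` in the table above.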


### Load the fine-tuned checkpoint to infer key regulators and TRN for myoblast (skip fine-tuning)

In [None]:
# takes approximately 3-5 minutes to run
!chrombert-tools infer_cell_key_regulator \
    --cell-type-bw "../data/myoblast_ENCFF149ERN_signal.bigwig" \
    --cell-type-peak "../data/myoblast_ENCFF647RNC_peak.bed" \
    --ft-ckpt "/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_infer_cell_key_regulator/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=3-step=226.ckpt" \
    --odir "./output_infer_cell_key_regulator_load_cpkt" \
    --genome "hg38" \
    --resolution "1kb"  2> "./tmp/infer_cell_trn.stderr2.log" # redirect stderr to log file

Stage 1: Praparing the dataset
Total regions: 324464
Fast mode: downsampling to 20k regions
Finished stage 1
Use fine-tuned ChromBERT checkpoint file: /mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_infer_cell_regulator_network/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=3-step=226.ckpt to infer cell-specific trn
use organisim hg38; max sequence length is 6391
Loading checkpoint from /mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli/output_infer_cell_regulator_network/train/try_00_seed_55/lightning_logs/lightning_logs/version_0/checkpoints/epoch=3-step=226.ckpt
Loading from pl module, remove prefix 'model.'
Loaded 110/110 parameters
Finished stage 2
Stage 3: generate regulator embedding on different activity regions
Finished stage 3
Stage 4: find key regulator
Finished stage 4: identify cell-specific key regulators (top 25)
        factors  similarity 
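The `--ft-ckpt` path in the command above is long and machine-specific. As a sketch of how to locate it without hardcoding, the helper below searches the `train` subdirectory of a previous run's output directory for `.ckpt` files. `find_checkpoints` is a hypothetical name, the `train` layout is inferred from the checkpoint path shown above, and picking the newest file is an assumption about which checkpoint is wanted.

```python
from pathlib import Path


def find_checkpoints(odir):
    """Return all .ckpt files under `odir`/train, newest first."""
    train_dir = Path(odir) / "train"
    if not train_dir.is_dir():
        return []
    ckpts = list(train_dir.rglob("*.ckpt"))
    return sorted(ckpts, key=lambda p: p.stat().st_mtime, reverse=True)


ckpts = find_checkpoints("./output_infer_cell_key_regulator")
if ckpts:
    # Absolute path suitable for passing to --ft-ckpt.
    print(ckpts[0].resolve())
else:
    print("no checkpoint found; run fine-tuning first")
```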

In [17]:
# The path must match the --odir passed to the command above.
factor_importance_rank = pd.read_csv("./output_infer_cell_key_regulator_load_cpkt/results/factor_importance_rank.csv")
factor_importance_rank.head(n=25)

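| Unnamed: 0 | factors | similarity | rank |
|---|---|---|---|
| 0 | myf5 | 0.184096 | 1 |
| 1 | yap1 | 0.208994 | 2 |
| 2 | cbx6 | 0.21147 | 3 |
| 3 | ring1 | 0.245052 | 4 |
| 4 | tead1 | 0.269947 | 5 |
| 5 | myod1 | 0.287573 | 6 |
| 6 | tcf21 | 0.30273 | 7 |
| 7 | myog | 0.313926 | 8 |
| 8 | cbx7 | 0.323908 | 9 |
| 9 | chd4 | 0.328688 | 10 |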
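The rows shown for the checkpoint run match those of the full run, which is what one would expect when the second run reuses the first run's checkpoint. A quick way to check this over the whole file is sketched below; `same_ranking` is a hypothetical helper, and the two paths assume the `--odir` values used in the commands above.

```python
from pathlib import Path

import pandas as pd


def same_ranking(path_a, path_b, tol=1e-6):
    """Return True if both CSVs list the same factors in the same order
    with similarities that agree within `tol`."""
    a = pd.read_csv(path_a, index_col=0)  # first column is the saved row index
    b = pd.read_csv(path_b, index_col=0)
    if list(a["factors"]) != list(b["factors"]):
        return False
    return bool(((a["similarity"] - b["similarity"]).abs() <= tol).all())


full_run = "./output_infer_cell_key_regulator/results/factor_importance_rank.csv"
ckpt_run = "./output_infer_cell_key_regulator_load_cpkt/results/factor_importance_rank.csv"
if Path(full_run).exists() and Path(ckpt_run).exists():
    print("identical ranking:", same_ranking(full_run, ckpt_run))
```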
