# Using ChromBERT-tools with a Singularity Container

This notebook demonstrates how to use ChromBERT-tools commands with a Singularity container.

## Key Singularity Parameters

- `--nv`: Enable NVIDIA GPU support (required for GPU acceleration)
- `--bind`: Mount local directories into the container (format: `--bind /local/path:/container/path`)
- `--pwd`: Set working directory inside the container
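
The three options above can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `build_singularity_cmd` (it is not part of ChromBERT-tools or Singularity). It only builds an argument list that could be passed to `subprocess.run`; it does not run anything itself.

```python
import shlex


def build_singularity_cmd(sif_file, tool_args, binds=None, pwd=None, gpu=True):
    """Assemble a `singularity exec` argument list (hypothetical helper).

    binds: iterable of (host_path, container_path) pairs, one --bind each.
    pwd:   working directory inside the container, passed to --pwd.
    gpu:   when True, add --nv to enable NVIDIA GPU support.
    """
    cmd = ["singularity", "exec"]
    if gpu:
        cmd.append("--nv")
    for host_path, container_path in binds or []:
        cmd += ["--bind", f"{host_path}:{container_path}"]
    if pwd is not None:
        cmd += ["--pwd", pwd]
    cmd.append(sif_file)       # the image comes after all Singularity options
    cmd += list(tool_args)     # everything after the image runs inside it
    return cmd


cmd = build_singularity_cmd(
    "../env_sif/chrombert.sif",
    ["chrombert-tools", "embed_regulator", "-h"],
    binds=[("/local/path", "/container/path")],
    pwd="/container/path",
)
# Print a shell-quoted version for inspection or copy-paste
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell-quoting problems with values such as `"EZH2;BRD4"`, where an unquoted `;` would otherwise end the shell command.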

## Notes

- All `chrombert-tools` commands work the same way inside the container.
- Running ChromBERT-tools in the container produces the same outputs (file formats and directory structure) as running it on the host after a normal installation.
- For detailed command usage and output analysis, refer to other tutorial notebooks (e.g., `embed.ipynb`, `infer_cell_trn.ipynb`).


In [1]:
import os

# Host directory containing the CLI examples; change this to your own checkout
workdir = "/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli"
os.chdir(workdir)
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # index of the GPU to use

In [2]:
sif_file = "../env_sif/chrombert.sif"  # path to your .sif image, relative to workdir
! singularity exec {sif_file} chrombert-tools -h

INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
Usage: chrombert-tools [OPTIONS] COMMAND [ARGS]...

  Type -h or --help after any subcommand for more information.

Options:
  -v, --verbose  Verbose logging
  -d, --debug    Post mortem debugging
  -V, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  embed_cell_cistrome         Extract cell-specific cistrome embeddings...
  embed_cell_gene             Extract cell-specific gene embeddings
  embed_cell_region           Extract cell-specific region embeddings
  embed_cell_regulator        Extract cell-specific regulator embeddings...
  embed_cistrome              Extract general cistrome embeddings on...
  embed_gene                  Extract general gene embeddings
  embed_region                Extract general region embeddings
  embed_regulator             Extract general regulator embeddings on...
  find_driver_in_dual_region  Find driver factors in dual funct

In [4]:
# Define example data file
region_file = '../data/CTCF_ENCFF664UGR_sample100.bed'


## Basic Usage: Check Available Commands


In [5]:
! singularity exec {sif_file} chrombert-tools embed_regulator -h

INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
Usage: chrombert-tools embed_regulator [OPTIONS]

  Extract general regulator embeddings on specified regions

Options:
  --region FILE                   Region file.  [required]
  --regulator TEXT                Regulators of interest, e.g. EZH2 or
                                  EZH2;BRD4. Use ';' to separate multiple
                                  regulators.  [required]
  --odir DIRECTORY                Output directory.  [default: ./output]
  --genome [hg38|mm10]            Genome.  [default: hg38]
  --resolution [1kb|200bp|2kb|4kb]
                                  Resolution.  [default: 1kb]
  --batch-size INTEGER            Batch size.  [default: 64]
  --num-workers INTEGER           Dataloader workers.  [default: 8]
  --chrombert-cache-dir DIRECTORY
                                  ChromBERT cache dir (contains config/
                                  checkpoint/ etc).  [default:
           

## Example 1: Extract Regulator Embeddings

This example demonstrates running `embed_regulator` with all necessary Singularity parameters.


In [6]:
# Run embed_regulator command inside Singularity container
# --nv: Enable NVIDIA GPU
# --bind: Mount local directory to container
# --pwd: Set working directory inside container
! singularity exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools embed_regulator \
    --region {region_file} \
    --regulator "EZH2;BRD4;CTCF;FOXA3;myod1;myF5" \
    --odir "./output_emb_regulator_singularity" \
    --genome "hg38" \
    --resolution "1kb"

INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions), non-overlapping: 0
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 6, matched in ChromBERT: 5, not found: 1, not found regulator: ['foxa3']
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
use organisim hg38; max sequence length is 6391
100%|█████████████████████████████████████████████| 2/2 [00:03<00:00,  1.51s/it]
Finished!
Saved me
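
The summary above reports that regulator names are lowercased before matching and that one requested name (`foxa3`) was not found. A pre-check along the following lines could catch such names before launching the container. This is only a sketch: `split_regulators`, `load_known_regulators`, and `find_missing` are hypothetical helpers, and the assumption that the regulator list file holds one name per line is not confirmed by the source. The `known` list is a toy example for illustration, not the real ChromBERT regulator list.

```python
def split_regulators(spec):
    """Split a ';'-separated regulator string and lowercase each name."""
    return [name.strip().lower() for name in spec.split(";") if name.strip()]


def load_known_regulators(path):
    """Read regulator names from a text file.

    ASSUMPTION: the file contains one regulator name per line.
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def find_missing(requested, known):
    """Return the requested names that do not appear in `known`."""
    known_lower = {name.lower() for name in known}
    return [name for name in requested if name.lower() not in known_lower]


requested = split_regulators("EZH2;BRD4;CTCF;FOXA3;myod1;myF5")
# Toy list for illustration only; in practice, load it with load_known_regulators()
known = ["ezh2", "brd4", "ctcf", "myod1", "myf5"]
missing = find_missing(requested, known)
print("Not found:", missing)
```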

## Example 2: Infer Transcriptional Regulatory Networks (TRNs)

This example demonstrates running `infer_trn` with all necessary Singularity parameters.


In [8]:
! singularity exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools infer_trn \
    --region "../data/CTCF_ENCFF664UGR_sample100.bed" \
    --regulator "ctcf;nanog;ezh2" \
    --odir "./output_trn_singularity" \
    --genome "hg38" \
    --resolution "1kb"

INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions), non-overlapping: 0
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 3, matched in ChromBERT: 3, not found: 0, not found regulator: []
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
use organisim hg38; max sequence length is 6391
100%|█████████████████████████████████████████████| 2/2 [00:03<00:00,  1.72s/it]
Total graph nodes: 765
Total graph edges (threshold=0.712): 5752
Regulator subnetwork saved to: ./output_trn_singularity/subnetwork_ctcf_k1.pdf
Regulator subnetwork saved to: ./output_trn_singularity/s

## Example 3: Infer Cell-Type-Specific Transcriptional Regulatory Networks (TRNs)

This example demonstrates running `infer_cell_trn` with all necessary Singularity parameters.
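
The command in this example redirects stderr to a file under `./tmp`. Shell redirection does not create missing directories, so the target directory has to exist before the command runs. A minimal sketch of that preparatory step, using a hypothetical helper named `ensure_parent_dir`:

```python
import os


def ensure_parent_dir(log_path):
    """Create the directory that will hold `log_path` if it does not exist.

    Returns the absolute path of that directory.
    """
    parent = os.path.dirname(os.path.abspath(log_path))
    os.makedirs(parent, exist_ok=True)  # no error if it is already there
    return parent
```

For instance, calling `ensure_parent_dir("./tmp/infer_cell_trn.sif.stderr.log")` from the notebook's working directory would create `./tmp` if it is missing.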

In [10]:
! singularity exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools infer_cell_trn \
    --cell-type-bw "../data/myoblast_ENCFF149ERN_signal.bigwig" \
    --cell-type-peak "../data/myoblast_ENCFF647RNC_peak.bed" \
    --odir "./output_infer_cell_trn_sif" \
    --genome "hg38" \
    --resolution "1kb"  2> "./tmp/infer_cell_trn.sif.stderr.log" # redirect stderr to log file

Stage 1: Praparing the dataset
Finished stage 1
Stage 2: Fine-tuning the model

[Attempt 0/2] seed=55
use organisim hg38; max sequence length is 6391
Epoch 0:  20%|████▍                 | 800/4000 [02:19<09:17,  5.74it/s, v_num=0]
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:24<00:00, 10.23it/s]
Epoch 0:  40%|▍| 1600/4000 [05:02<07:34,  5.28it/s, v_num=0, default_validation/
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation:   0%|                                       | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/250 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|████████████████| 250/250 [00:25<00:00,  9.83it/s]
Epoch 0:  60%

## Example 4: Identify Driver Factors Distinguishing Two Sets of Genomic Regions

This example demonstrates running `find_driver_in_dual_region` with all necessary Singularity parameters.


In [9]:
! singularity exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools find_driver_in_dual_region \
    --function1-bed "../data/hESC_GSM1003524_EZH2.bed;../data/hESC_GSM1498900_H3K27me3.bed" \
    --function2-bed "../data/hESC_GSM1003524_EZH2.bed" \
    --dual-regulator "EZH2" \
    --ignore-regulator "H3K27me3;H3K27me3/H3K4me3" \
    --odir "./output_find_driver_in_dual_region_sif" \
    --genome "hg38" \
    --resolution "1kb"  2> "./tmp/find_driver_in_dual_region.log" # redirect stderr to log file

Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 1, matched in ChromBERT: 1, not found: 0, not found regulator: []
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 2, matched in ChromBERT: 2, not found: 0, not found regulator: []
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Stage 1: Praparing the dataset
  Function1 regions (positive): 5736
  Function2 regions (negative): 5272
  Total dataset size: 11008
  Fast mode: downsampling to 20k regions (10k per class)
Finished stage 1
Stage 2: Fine-tuning the model

[Attempt 0/2] seed=55
use organisim hg38; max sequence length is 6391
Ignoring 206 cistromes and 2 regulators
Epoch 0:  20%|████▍                 | 440/2202 [01:15<05:02,  5.83it/s, v_num=0]
Validati

## Analyzing Output Files

The output files generated with Singularity are identical to those from direct command-line execution. For details on analyzing the outputs, refer to the other tutorial notebooks (e.g., `embed.ipynb`, `infer_cell_trn.ipynb`).
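
One way to check the directory structure of a container run against a host run is to compare the relative file listings of the two output directories. The following is a sketch under stated assumptions: `list_relative_files` and `compare_output_dirs` are hypothetical helpers, and the comparison covers file names and layout only, not file contents.

```python
import os


def list_relative_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.add(os.path.relpath(full_path, root))
    return found


def compare_output_dirs(dir_a, dir_b):
    """Return (only_in_a, only_in_b) as sorted lists of relative paths."""
    files_a = list_relative_files(dir_a)
    files_b = list_relative_files(dir_b)
    return sorted(files_a - files_b), sorted(files_b - files_a)
```

For example, `compare_output_dirs("./output_emb_regulator_singularity", host_dir)` returns two empty lists when both directories contain the same set of files, where `host_dir` stands for an output directory from a host run (a placeholder, not a path from this notebook).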