# Using ChromBERT-tools with a Singularity Container

This notebook demonstrates how to use ChromBERT-tools commands with a Singularity container.

## Key Singularity Parameters

- `--nv`: Enable NVIDIA GPU support (required for GPU acceleration)

- `--bind`: Mount local directories into the container (format: `--bind /local/path:/container/path`)

- `--pwd`: Set working directory inside the container
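
The three flags above can also be assembled programmatically. The following is a minimal sketch, not part of ChromBERT-tools: `build_singularity_cmd` is a hypothetical helper, and the paths in the usage lines are placeholders chosen for illustration.

```python
import shlex


def build_singularity_cmd(sif_file, tool_args, binds=None, pwd=None, nv=True):
    """Assemble a `singularity exec` argument list (hypothetical helper).

    binds: mapping of host path -> container path, rendered as --bind host:container
    pwd:   working directory inside the container
    nv:    whether to pass --nv for NVIDIA GPU support
    """
    cmd = ["singularity", "exec"]
    if nv:
        cmd.append("--nv")
    for host_path, container_path in (binds or {}).items():
        cmd += ["--bind", f"{host_path}:{container_path}"]
    if pwd is not None:
        cmd += ["--pwd", pwd]
    cmd.append(sif_file)
    cmd += list(tool_args)
    return cmd


# Placeholder paths for illustration only.
example_cmd = build_singularity_cmd(
    "/path/to/chrombert.sif",
    ["chrombert-tools", "-h"],
    binds={"/data": "/data"},
    pwd="/data/work",
)
print(shlex.join(example_cmd))
```

The returned list can be passed to `subprocess.run(example_cmd, check=True)` as an alternative to the `!` shell magic used in the cells below.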

## Notes

- All `chrombert-tools` commands work the same way inside the container.

- Running ChromBERT-tools in the container produces the same outputs (format and directory structure) as running it on the host after a normal installation.

- For detailed command usage and output analysis, refer to other tutorial notebooks (e.g., `embed.ipynb`, `infer_cell_key_regulator.ipynb`).

- You don’t need to launch Jupyter from the container image; the cells below call `singularity exec` from a notebook running on the host.

In [1]:
import os

workdir = "/mnt/Storage2/home/chenqianqian/projects/chrombert_tools/2.test/pull/ChromBERT-tools/examples/cli"  # your working directory
os.chdir(workdir)
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # GPU device to use

In [2]:
sif_file = "/mnt/Storage2/home/chenqianqian/projects/chrombert_tools/2.test/pull/chrombert.sif" # your image file


! singularity exec --nv {sif_file} chrombert-tools -h

INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
Usage: chrombert-tools [OPTIONS] COMMAND [ARGS]...

  Type -h or --help after any subcommand for more information.

Options:
  -v, --verbose  Verbose logging
  -d, --debug    Post mortem debugging
  -V, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  embed_cell_cistrome             Extract cell-specific cistrome...
  embed_cell_gene                 Extract cell-specific gene embeddings
  embed_cell_region               Extract cell-specific region embeddings
  embed_cell_regulator            Extract cell-specific regulator...
  embed_cistrome                  Extract general cistrome embeddings on...
  embed_gene                      Extract general gene embeddings
  embed_region                    Extract general region embeddings
  embed_regulator                 Extract general regulator embeddings on...
  find_context_specific_cofactor  Find context-specif

In [3]:
# Define example data file
region_file = '../data/CTCF_ENCFF664UGR_sample100.bed'


## Basic Usage: Check Available Commands


In [4]:
! singularity exec {sif_file} chrombert-tools embed_regulator -h

INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
Usage: chrombert-tools embed_regulator [OPTIONS]

  Extract general regulator embeddings on specified regions

Options:
  --region FILE                   Region file.  [required]
  --regulator TEXT                Regulators of interest, e.g. EZH2 or
                                  EZH2;BRD4. Use ';' to separate multiple
                                  regulators.  [required]
  --odir DIRECTORY                Output directory.  [default: ./output]
  --oname TEXT                    Output name of the regulator embeddings.
                                  [default: regulator_emb]
  --genome [hg38|mm10]            Genome.  [default: hg38]
  --resolution [1kb|200bp|2kb|4kb]
                                  Resolution.  [default: 1kb]
  --batch-size INTEGER            Batch size.  [default: 64]
  --num-workers INTEGER           Dataloader workers.  [default: 8]
  --chrombert-cache-dir DIRECTORY
            

## Example 1: Extract Regulator Embeddings

This example demonstrates running `embed_regulator` with the Singularity parameters described above (`--nv`, `--bind`, and `--pwd`).


In [5]:
# Run embed_regulator command inside Singularity container
# --nv: Enable NVIDIA GPU
# --bind: Mount local directory to container
# --pwd: Set working directory inside container
! singularity exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools embed_regulator \
    --region {region_file} \
    --regulator "EZH2;BRD4;CTCF;FOXA3;myod1;myF5" \
    --odir "./output_emb_regulator_singularity" \
    --genome "hg38" \
    --resolution "1kb"

INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, We keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region),non-overlapping: 0
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 6, matched in ChromBERT: 5, not found: 1, not found regulator: ['foxa3']
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
use organisim hg38; max sequence length is 6391
100%|██████████
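
The summary line in the output above reports which requested regulators were not matched (here `foxa3`). If the output is captured to a file or variable, that line can be parsed to detect unmatched names automatically. This is a minimal sketch that assumes the log format shown above; `parse_regulator_summary` is a hypothetical helper, and the format may differ between ChromBERT-tools versions.

```python
import ast
import re

# Summary line copied from the output above.
summary_line = (
    "Regulator count summary - requested: 6, matched in ChromBERT: 5, "
    "not found: 1, not found regulator: ['foxa3']"
)


def parse_regulator_summary(line):
    """Extract counts and unmatched names from a 'Regulator count summary' line."""
    pattern = (
        r"requested: (\d+), matched in ChromBERT: (\d+), "
        r"not found: (\d+), not found regulator: (\[.*\])"
    )
    match = re.search(pattern, line)
    if match is None:
        return None
    requested, matched, missing = (int(g) for g in match.groups()[:3])
    # The trailing list is printed as a Python literal, so literal_eval can read it.
    not_found = ast.literal_eval(match.group(4))
    return {
        "requested": requested,
        "matched": matched,
        "missing": missing,
        "not_found": not_found,
    }


summary = parse_regulator_summary(summary_line)
print(summary)
```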

## Example 2: Infer regulator-regulator networks

This example demonstrates running `infer_regulator_network` with all necessary Singularity parameters.


In [6]:
! singularity exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools infer_regulator_network \
    --region "../data/CTCF_ENCFF664UGR_sample100.bed" \
    --regulator "ctcf;nanog;ezh2" \
    --odir "./output_trn_singularity_1kb" \
    --genome "hg38" \
    --resolution "1kb"

INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, We keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region),non-overlapping: 0
Note: All regulator names were converted to lowercase for matching.
Regulator count summary - requested: 3, matched in ChromBERT: 3, not found: 0, not found regulator: []
ChromBERT regulators: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_regulators_list.txt
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
use organisim hg38; max sequence length is 6391
100%|█████████████████████████████████████████████| 2/2 [00:04<00:00,  2.32s/it]
Total graph nodes: 951
Total graph edges (threshold=0.636): 11503
Regulator subnetwork saved to: ./output_trn_singula

## Example 3: Impute cistromes

This example demonstrates running `impute_cistrome` with all necessary Singularity parameters.

In [7]:
! singularity exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools impute_cistrome \
    --cistrome "BCL11A:GM12878;BRD4:MCF7;CTCF:HepG2;MYC:H1;MYC:h9;SPI1:GSM2702714" \
    --region "../data/CTCF_ENCFF664UGR_sample100.bed" \
    --odir "./output_impute" \
    --genome "hg38" \
    --resolution "1kb"

INFO:    fuse2fs not found, will not be able to mount EXT3 filesystems
Region summary - total: 100, overlapping with ChromBERT: 100 (one region may overlap multiple ChromBERT regions, We keep overlaps with ≥50% coverage of either the ChromBERT bin or the input region),non-overlapping: 0
celltype: h1 has no corresponding wild type dnase data in ChromBERT.
Note: All cistromes names were converted to lowercase for matching.
Cistromes count summary - requested: 6, matched in ChromBERT: 5, not found: 1, not found cistromes: ['myc:h1']
ChromBERT cistromes metas: /mnt/Storage/home/chenqianqian/.cache/chrombert/data/config/hg38_6k_meta.tsv
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
Your supervised_file does not contain the 'label' column. Please verify whether ground truth column ('label') is required. If it is not needed, you may disregard this message.
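
The `--cistrome` argument in the cell above is a `;`-separated list of `factor:context` pairs, and the output notes that names are lowercased for matching. To inspect such a string before running the command, it can be split as sketched below. `split_cistrome_spec` is a hypothetical helper written for this notebook; it only mirrors the format visible in the example and is not the parser ChromBERT-tools uses internally.

```python
def split_cistrome_spec(spec):
    """Split 'factor:context;factor:context' into lowercase (factor, context) pairs."""
    pairs = []
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        factor, sep, context = item.partition(":")
        if not sep or not factor or not context:
            raise ValueError(f"Expected 'factor:context', got {item!r}")
        pairs.append((factor.lower(), context.lower()))
    return pairs


# Same string as passed to --cistrome in the cell above.
spec = "BCL11A:GM12878;BRD4:MCF7;CTCF:HepG2;MYC:H1;MYC:h9;SPI1:GSM2702714"
pairs = split_cistrome_spec(spec)
print(pairs)
```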

## Example 4: Infer cell-type-specific key regulators

This example demonstrates running `infer_cell_key_regulator` with all necessary Singularity parameters.

In [None]:
! singularity exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools infer_cell_key_regulator \
    --cell-type-bw "../data/myoblast_ENCFF149ERN_signal.bigwig" \
    --cell-type-peak "../data/myoblast_ENCFF647RNC_peak.bed" \
    --odir "./output_infer_cell_key_regulator" \
    --genome "hg38" \
    --resolution "1kb"  2> "./tmp/infer_cell_key_regulator.sif.stderr.log" # redirect stderr to log file
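
The command above redirects stderr to a file under `./tmp`. Shell redirection fails if the target directory does not exist, so it may help to create the directory before running the cell. A minimal sketch, assuming the same relative log path as above:

```python
import os

# Same relative path as the redirect target in the command above.
log_path = "./tmp/infer_cell_key_regulator.sif.stderr.log"
log_dir = os.path.dirname(log_path)

# Create the directory if needed; exist_ok avoids an error when it is already there.
os.makedirs(log_dir, exist_ok=True)
print(os.path.isdir(log_dir))
```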

## Example 5: Find context-specific cofactors of EZH2

This example demonstrates running `find_context_specific_cofactor` with all necessary Singularity parameters.


In [None]:
! singularity exec --nv \
    --bind /mnt/Storage2/home/chenqianqian/:/mnt/Storage2/home/chenqianqian/ \
    --pwd {workdir} \
    {sif_file} \
    chrombert-tools find_context_specific_cofactor \
    --function1-bed "../data/hESC_GSM1003524_EZH2.bed;../data/hESC_GSM1498900_H3K27me3.bed" \
    --function2-bed "../data/hESC_GSM1003524_EZH2.bed" \
    --dual-regulator "EZH2" \
    --ignore-regulator "H3K27me3;H3K27me3/H3K4me3" \
    --odir "./output_find_context_specific_cofactor_sif" \
    --genome "hg38" \
    --resolution "1kb"  2> "./tmp/find_context_specific_cofactor_region.log" # redirect stderr to log file

## Analyzing Output Files

The output files generated with Singularity are identical to those from direct command-line execution. For details on analyzing the outputs, refer to the other tutorial notebooks (e.g., `embed.ipynb`, `infer_cell_key_regulator.ipynb`).