# Identify driver factors in cell state transitions

The ``find_driver_in_transition`` command identifies key transcription factors that drive changes in gene expression and/or chromatin accessibility during cell state transitions (e.g., differentiation or reprogramming).

You can run this command with:
- expression only,
- accessibility only, or
- both expression and accessibility.

Provide the corresponding input files for the analyses you want to perform.
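
As a rough illustration of the three modes, the sketch below assembles the argument list for each combination of inputs. `build_command` is a hypothetical helper, not part of chrombert-tools; the flag names are taken from the command's help output and the example run in this notebook. It assumes that accessibility analysis needs both the peak and the signal files, which the help text suggests but does not state outright. The `--direction` flag used in the example run is left out because its semantics are not described here.

```python
import shlex


def build_command(exp=None, acc_peaks=None, acc_signals=None,
                  genome="hg38", resolution="1kb", odir="output"):
    """Assemble a find_driver_in_transition call from the inputs available.

    Hypothetical helper for illustration only. Each of exp, acc_peaks and
    acc_signals is a (state1_path, state2_path) pair, or None if absent.
    """
    # Assumption: peaks and signals must be supplied together.
    if (acc_peaks is None) != (acc_signals is None):
        raise ValueError("accessibility needs both peak and signal files")
    if exp is None and acc_peaks is None:
        raise ValueError("provide expression files, accessibility files, or both")

    args = ["chrombert-tools", "find_driver_in_transition"]
    if exp is not None:
        args += ["--exp-tpm1", exp[0], "--exp-tpm2", exp[1]]
    if acc_peaks is not None:
        args += ["--acc-peak1", acc_peaks[0], "--acc-peak2", acc_peaks[1],
                 "--acc-signal1", acc_signals[0], "--acc-signal2", acc_signals[1]]
    args += ["--genome", genome, "--resolution", resolution, "--odir", odir]
    return args


# Expression-only mode; shlex.join quotes each argument for the shell.
print(shlex.join(build_command(exp=("state1.csv", "state2.csv"))))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.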

# Example:

This example finds driver regulators in the fibroblast-to-myoblast transition using both expression and accessibility data.


In [1]:
import pandas as pd
import numpy as np
import os
os.chdir("/mnt/Storage2/home/chenqianqian/projects/chrombert/chrombert_tools/ChromBERT-tools/examples/cli")

In [2]:
os.environ["CUDA_VISIBLE_DEVICES"]='2'

In [3]:
!chrombert-tools -h

Usage: chrombert-tools [OPTIONS] COMMAND [ARGS]...

  Type -h or --help after any subcommand for more information.

Options:
  -v, --verbose  Verbose logging
  -d, --debug    Post mortem debugging
  -V, --version  Show the version and exit.
  -h, --help     Show this message and exit.

Commands:
  embed_cell_cistrome         Extract cell-specific cistrome embeddings...
  embed_cell_gene             Extract cell-specific gene embeddings
  embed_cell_region           Extract cell-specific region embeddings
  embed_cell_regulator        Extract cell-specific regulator embeddings...
  embed_cistrome              Extract general cistrome embeddings on...
  embed_gene                  Extract general gene embeddings
  embed_region                Extract general region embeddings
  embed_regulator             Extract general regulator embeddings on...
  find_driver_in_dual_region  Find driver factors in dual functional...
  find_driver_in_transition   Find driver factors in cell state transit

In [4]:
!chrombert-tools find_driver_in_transition -h

Usage: chrombert-tools find_driver_in_transition [OPTIONS]

  Find driver factors in cell state transitions.

  This tool identifies key transcription factors that drive cell state
  transitions by analyzing changes in gene expression and/or chromatin
  accessibility between two cell states.

  You must provide at least one of the following: - Expression data (--exp-
  tpm1 and --exp-tpm2) - Accessibility data (--acc-peak1, --acc-peak2, --acc-
  signal1, --acc-signal2)

  Providing both expression and accessibility data yields more confident
  results.

Options:
  --exp-tpm1 FILE                 Expression (TPM) file for cell state 1. CSV
                                  format with 'gene_id' and 'tpm' columns.
  --exp-tpm2 FILE                 Expression (TPM) file for cell state 2. CSV
                                  format with 'gene_id' and 'tpm' columns.
  --acc-peak1 FILE                Chromatin accessibility peak BED file for
                                  cell state 1.
 
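The help text above says each expression file is a CSV with `gene_id` and `tpm` columns. Since a full run can take hours, it may be worth checking the files first. The following is a minimal sketch using only the standard library; `check_expression_csv` is a hypothetical helper, and the requirement that every `tpm` value parses as a number is an assumption rather than a documented rule of the tool.

```python
import csv

REQUIRED_COLUMNS = {"gene_id", "tpm"}


def check_expression_csv(path):
    """Return the number of data rows; raise ValueError on a malformed file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        # fieldnames is None for an empty file, hence the fallback to [].
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"{path}: missing column(s) {sorted(missing)}")
        n_rows = 0
        # Data starts on line 2 because line 1 is the header.
        for line_no, row in enumerate(reader, start=2):
            try:
                float(row["tpm"])
            except (TypeError, ValueError):
                raise ValueError(
                    f"{path}: non-numeric tpm on line {line_no}"
                ) from None
            n_rows += 1
    return n_rows
```

Calling it on both files before launching, for example `check_expression_csv("../data/fibroblast_expression_sample5000.csv")`, surfaces a wrong header immediately instead of partway through dataset preparation.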

## Run
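
The run in this section uses expression files downsampled to 5,000 genes. How those files were produced is not stated, so the following is only one plausible way to build such a test set: it picks the same random genes from both cell states so the two files stay comparable. `downsample_pair` is a hypothetical helper, and it writes only the `gene_id` and `tpm` columns, dropping any others.

```python
import csv
import random


def downsample_pair(in1, in2, out1, out2, n=5000, seed=0):
    """Write the same n randomly chosen genes to out1 and out2.

    Returns the number of genes kept, which is smaller than n when the
    two inputs share fewer than n gene IDs.
    """
    def read(path):
        with open(path, newline="") as handle:
            return {row["gene_id"]: row["tpm"] for row in csv.DictReader(handle)}

    tpm1, tpm2 = read(in1), read(in2)
    # Sort before sampling so a fixed seed gives a reproducible selection.
    shared = sorted(set(tpm1) & set(tpm2))
    keep = random.Random(seed).sample(shared, min(n, len(shared)))

    for path, table in ((out1, tpm1), (out2, tpm2)):
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["gene_id", "tpm"])
            writer.writerows((gene, table[gene]) for gene in keep)
    return len(keep)
```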

In [None]:
# Runtime estimates:
#   - Fast mode: ~3-5 hours
#     (uses all ~19,620 genes for expression analysis, but downsamples
#      chromatin accessibility regions to 20k for faster training)
#
# Note: Both modes use the complete gene expression dataset. Fast mode
# downsamples only the chromatin accessibility regions, not the gene data.
#
# To keep this test run short (10-40 minutes), the expression files used here
# were downsampled to 5,000 genes.
#
# The stderr redirect at the end of the command requires ./tmp to exist.

!chrombert-tools find_driver_in_transition \
  --exp-tpm1 "../data/fibroblast_expression_sample5000.csv" \
  --exp-tpm2 "../data/myoblast_expression_sample5000.csv" \
  --acc-peak1 "../data/fibroblast_ENCFF184KAM_peak.bed" \
  --acc-peak2 "../data/myoblast_ENCFF647RNC_peak.bed" \
  --acc-signal1 "../data/fibroblast_ENCFF361BTT_signal.bigwig" \
  --acc-signal2 "../data/myoblast_ENCFF149ERN_signal.bigwig" \
  --genome 'hg38' \
  --resolution '1kb' \
  --odir output_find_driver_in_transition \
  --direction "2-1" 2> "./tmp/hg38_1kb.stderr.log"

Stage 1: prepare dataset
Expression dataset already exists in output_find_driver_in_transition/exp/dataset
Processing stage 1: prepare chromatin accessibility dataset
Finished Stage 1
Whether to train ChromBERT to predict expression changes in cell state transition: True
Whether to train ChromBERT to predict chromatin accessibility changes in cell state transition: True
Processing stage 2 (exp): train ChromBERT to predict expression changes in cell state transition
Stage 2 (exp): train ChromBERT to predict expression changes in cell state transition

[Attempt 0/2] seed=55
use organisim hg38; max sequence length is 6391
Epoch 0:  20%|████▍                 | 336/1688 [03:54<15:43,  1.43it/s, v_num=0]
Validation: |                                             | 0/? [00:00<?, ?it/s]
Validation DataLoader 0:   0%|                          | 0/105 [00:00<?, ?it/s]
Validation DataLoader 0: 100%|██████████