In [None]:
from sctoolbox.utils.jupyter import bgcolor, _compare_version

# change the background of input cells
bgcolor("PowderBlue", select=[2, 4])

nb_name = "0A_tobias.ipynb"

_compare_version(nb_name)

# 05 - TOBIAS
<hr style="border:2px solid black"> </hr>

## 1 - Description

This notebook provides a streamlined interface for ATAC-seq footprinting analysis using the [TOBIAS](https://github.com/loosolab/TOBIAS) toolkit, wrapped via custom functions that leverage [TOBIAS-Snakemake](https://github.com/loosolab/TOBIAS_snakemake).

 **Key features:**

 * **Tn5 bias correction** — Adjusts for insertion bias in ATAC-seq libraries.
 * **Footprint scoring** — Computes footprint strength across regulatory regions.
 * **Binding site inference** — Distinguishes between bound and unbound transcription factor (TF) sites.
 * **Visualization** — Generates footprint plots for individual conditions or comparative analyses.

 At the end of this notebook, a TOBIAS configuration YAML file is generated.
 Since the full pipeline is compute-intensive, please execute it from the command line as described in the **Executing the TOBIAS Pipeline** section below.

 For complete documentation, refer to:

 * [TOBIAS repository](https://github.com/loosolab/TOBIAS)
 * [TOBIAS Wiki](https://github.com/loosolab/TOBIAS/wiki/)
 * [TOBIAS-Snakemake repository](https://github.com/loosolab/TOBIAS_snakemake)


---

## 2 - Setup

In [None]:
import pandas as pd
pd.set_option('display.max_columns', None)  # no limit to the number of columns shown
import sctoolbox
from sctoolbox import settings
import sctoolbox.utils as utils
import sctoolbox.tools as tools
import os

sctoolbox.settings.settings_from_config("config.yaml", key="0A")

---

## 3 - Load anndata

<h1><center>⬐ Fill in input data here ⬎</center></h1>

In [None]:
# Input/Output
last_notebook_adata = "anndata_4.h5ad"

---

In [None]:
adata = utils.adata.load_h5ad(last_notebook_adata)

with pd.option_context("display.max_rows", 5, "display.max_columns", None):
    display(adata)
    display(adata.obs)

---

## 4 - General input

Set the parameters to build the TOBIAS config YAML file.

<h1><center>⬐ Fill in input data here ⬎</center></h1>

In [None]:
# Set the column to evaluate. Pseudobulk BAM files are created based on this column.
groupby = "sample"

# Barcode column of the adata.obs table. If None, the index is used.
barcode_column = None

# Output directory
output = "../tobias/"

# The path to the ATAC experiment BAM file
# Must contain all cells
path_bam = "test_data/10k_PBMCs_sampled_bam.bam"

# The path to the FASTA file for the organism
fasta = "homo_sapiens.104.mainChr.fa"

# The path to the (uncompressed) GTF file for the organism
gtf = "homo_sapiens.104.genes.gtf"

# The path to the blacklist file to use in the TOBIAS run (optional)
# If 'None', a mock blacklist file will be generated for the run
blacklist = 'hg38.blacklist.bed'

# The path to the motif file(s) for the organism (JASPAR or MEME)
motifs = "individual_motifs/*"

# If the ATAC modality already has a column with read tags matching the tags used in the BAM file,
# give the name of the column here
# Must match "ATAC:<name of column in anndata.obs table>"
bam_barcodes = 'CB'

# If bam_barcodes is None give the name of the column that contains the raw ATAC barcodes
# Must match "ATAC:<name of column in anndata.obs table>"
raw_barcodes_ATAC = None

# Name of the organism from which the data stems
# options = ["mouse", "human", "zebrafish"]
organism = "human"

# Give the name of the TOBIAS config YAML file in the format of "<name of file>.yml"
# It cannot be 'config.yml'
yaml = "TOBIAS_config.yml"
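
Before preparing the run, it can save time to confirm that the paths set above actually exist. The following is a minimal sketch, not part of sctoolbox: `find_missing_inputs` is a hypothetical helper that checks plain file paths with `os.path.isfile` and glob patterns (such as the motif path) with `glob.glob`. Inputs set to `None` are treated as optional and skipped.

```python
import glob
import os


def find_missing_inputs(files, patterns=None):
    """Return a list of problem descriptions; an empty list means every input was found.

    files:    dict mapping a label to a file path (or None for an unset optional input)
    patterns: dict mapping a label to a glob pattern that must match at least one file
    """
    problems = []
    for label, path in files.items():
        if path is None:
            continue  # optional input left unset
        if not os.path.isfile(path):
            problems.append(f"{label}: file not found: {path}")
    for label, pattern in (patterns or {}).items():
        if not glob.glob(pattern):
            problems.append(f"{label}: no files match: {pattern}")
    return problems


# Example call with the variables defined above (commented out because it depends on your paths):
# problems = find_missing_inputs(
#     {"path_bam": path_bam, "fasta": fasta, "gtf": gtf, "blacklist": blacklist},
#     patterns={"motifs": motifs},
# )
# print("\n".join(problems) or "All inputs found.")
```

Collecting all problems in a list, rather than raising on the first one, lets you fix every path in a single pass.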

---

## 5 - Prepare TOBIAS run

In [None]:
input_dir, _, yml = tools.tobias.prepare_tobias(adata,
                                                groupby,
                                                output=output,
                                                path_bam=path_bam,
                                                barcode_column=barcode_column,
                                                barcode_tag='CB',
                                                fasta=fasta,
                                                motifs=motifs,
                                                gtf=gtf,
                                                blacklist=blacklist,
                                                organism=organism,
                                                yml=yaml,
                                                plot_comparison=True,
                                                plot_correction=True,
                                                plot_venn=True,
                                                coverage=False,
                                                wilson=False,
                                                threads=4)

config_yaml = os.path.join(input_dir, yml)
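
It may be worth glancing at the generated configuration before launching the pipeline. The sketch below simply prints the beginning of a text file; `preview_text_file` is a hypothetical helper and makes no assumptions about which keys the TOBIAS config contains.

```python
from pathlib import Path


def preview_text_file(path, max_lines=40):
    """Return the first `max_lines` lines of a text file as a single string."""
    lines = Path(path).read_text().splitlines()
    return "\n".join(lines[:max_lines])


# Example (commented out; config_yaml is defined in the cell above):
# print(preview_text_file(config_yaml))
```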

---

## 6 - Executing the TOBIAS Pipeline

This section outlines how to launch the TOBIAS-Snakemake workflow using the configuration file generated above.

More information on TOBIAS-Snakemake can be found in the [TOBIAS-Snakemake Wiki](https://github.com/loosolab/TOBIAS_snakemake/wiki).

 **Prerequisites:**

 * Clone the [TOBIAS-Snakemake](https://github.com/loosolab/TOBIAS_snakemake) repository.
 * Ensure [TOBIAS-Snakemake](https://github.com/loosolab/TOBIAS_snakemake) is installed (see “Getting Started” in the repository’s [README](https://github.com/loosolab/TOBIAS_snakemake)).
 * Verify that Conda is available for creating rule-specific environments.

 **Notes:**

 * The `--use-conda` flag automates environment setup for each rule.
 * Adjust `--cores` to the number of CPU cores available to you.
 * For advanced options or troubleshooting, refer to the TOBIAS-Snakemake documentation.
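
One way to pick a starting value for `--cores` is to ask Python how many CPUs it can see. This is a minimal sketch: `suggest_cores` and its `reserve` parameter are hypothetical, not part of sctoolbox or TOBIAS-Snakemake. On a shared server or cluster the detected count can exceed what you are allowed to use, so treat the result as an upper bound.

```python
import os


def suggest_cores(reserve=1):
    """Suggest a --cores value: detected CPUs minus `reserve`, but never less than 1."""
    detected = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, detected - reserve)


print(f"--cores {suggest_cores()}")
```

Reserving one core keeps the machine responsive for other work while the pipeline runs.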

 **Command:**
 In your terminal, move to the [TOBIAS-Snakemake](https://github.com/loosolab/TOBIAS_snakemake) directory and execute the following command:

In [None]:
print('Bash Command:')
print(f"""snakemake \\
   --configfile {os.path.abspath(config_yaml)} \\
   --use-conda \\
   --cores 10""")
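
If you prefer to assemble or launch the command from Python, the same options can be built as an argument list. This is a sketch under assumptions: `build_snakemake_command` and `run_pipeline` are hypothetical helpers, and `snakemake_dir` stands for wherever you cloned TOBIAS-Snakemake. The config path is made absolute because the command runs from a different directory than the notebook.

```python
import os
import shlex
import subprocess


def build_snakemake_command(config_yaml, cores=10):
    """Build the snakemake call shown above as a list of arguments."""
    return [
        "snakemake",
        "--configfile", os.path.abspath(config_yaml),  # absolute, so it survives a directory change
        "--use-conda",
        "--cores", str(cores),
    ]


def run_pipeline(config_yaml, snakemake_dir, cores=10):
    """Run snakemake from inside the TOBIAS-Snakemake directory (hypothetical wrapper)."""
    cmd = build_snakemake_command(config_yaml, cores)
    return subprocess.run(cmd, cwd=snakemake_dir, check=True)


# Print a copy-pasteable, shell-quoted version of the command (requires Python 3.8+)
print(shlex.join(build_snakemake_command("TOBIAS_config.yml", cores=4)))
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces, and `check=True` raises an error if snakemake exits with a non-zero status.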