🌐 branch-assembler.com · 🔬 Companion viewer: VariantPaths (variantpaths.com)
BRANCH is a HiFi-read genome assembler built to be state of the art at detecting low-frequency copy-number variants (CNVs). It produces a lossless, CN-aware assembly graph in which branches are graph bifurcations, not tumor clones. Every variant call carries variant allele fraction (VAF) evidence from three sources: reads, in-silico PCR, and k-mer counts.
The Phase 0 end-to-end pipeline runs on both HiFi and ONT R10.4.1 samples (--read-tech ont; see docs/ont-support.md). Known gaps: unitig collapse is not yet final, RAM consumption is not fully deterministic, CPU utilisation is low because the overlap step is single-threaded, and per-contig chromosome projection is pending (see branch project below).
reader → graph_build → graph_compactor → graph_filter → assemble
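The graph_compactor stage name, together with the note on unitig collapse above, suggests a unitig-style compaction, although this README does not spell out its algorithm. As a generic illustration only (not BRANCH's implementation), here is the textbook maximal non-branching path procedure. The function name and the edge-list input are hypothetical. Bifurcations always end a unitig, so branch points survive compaction.

```python
from collections import defaultdict


def maximal_non_branching_paths(edges):
    """Return unitigs (maximal non-branching paths) of a directed graph.

    Hypothetical helper for illustration. `edges` is an iterable of (u, v)
    pairs. A node with exactly one predecessor and one successor can be
    absorbed into the middle of a unitig; every other node (tip,
    bifurcation, merge) ends one.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    nodes = set(succ) | set(indeg)

    def one_in_one_out(n):
        return indeg[n] == 1 and len(succ[n]) == 1

    paths, used = [], set()
    # Start one unitig per outgoing edge of every non-1-in-1-out node.
    for v in sorted(nodes):
        if one_in_one_out(v):
            continue
        for w in succ[v]:
            path = [v, w]
            while one_in_one_out(w):
                used.add(w)
                w = succ[w][0]
                path.append(w)
            paths.append(path)
    # Isolated cycles of 1-in-1-out nodes have no start node above.
    for v in sorted(nodes):
        if one_in_one_out(v) and v not in used:
            cycle, w = [v], succ[v][0]
            used.add(v)
            while w != v:
                used.add(w)
                cycle.append(w)
                w = succ[w][0]
            cycle.append(v)
            paths.append(cycle)
    return paths
```

For a bubble A→B→C, C→D→F, C→E→F, F→G this yields the unitigs A-B-C, C-D-F, C-E-F and F-G: the two alleles of the bubble stay as separate paths.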
- branch assemble: reads → minimizer overlap → lossless graph → GFA + FASTA + PAF + BED.
- branch analyze: mosdepth regions → copy-number inference with paralog awareness.
- branch project (v0.4.3): three-layer reference projection, with a linear layer (CHM13 + GRCh38 via minimap2), a pangenome layer (HPRC v1.1 via minigraph / GraphAligner), and a somatic-delta layer against the nearest pangenome path. Map in, comparison out: every branch is matched against the standard human genome and a public collection of known DNA variation, the residual edit distance to the closest known sequence is reported, and brand-new changes are flagged. See docs/branch-project-design.md.
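The somatic-delta idea in branch project (report the residual edit distance to the closest known sequence, flag what is new) can be illustrated with a toy sketch. It assumes plain unit-cost Levenshtein distance over nucleotide strings and a hypothetical novelty threshold. The real subcommand works on alignments produced by minimap2 / GraphAligner, so treat this as an explanation of the reported quantity, not of the implementation. Function names and path ids are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion from a
                cur[j - 1] + 1,            # insertion into a
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def residual_delta(branch_seq, known_paths, max_known_distance=0):
    """Return (closest_path_id, distance, is_novel) for one branch sequence.

    known_paths maps a hypothetical path id to its sequence. is_novel is
    True when even the closest known sequence is more than
    max_known_distance edits away (hypothetical threshold).
    """
    if not known_paths:
        raise ValueError("need at least one known path")
    best_id, best_d = None, None
    for path_id, seq in known_paths.items():
        d = levenshtein(branch_seq, seq)
        if best_d is None or d < best_d:
            best_id, best_d = path_id, d
    return best_id, best_d, best_d > max_known_distance
```

A branch identical to a pangenome path returns distance 0 and is not flagged; one base off the nearest path returns distance 1 and is flagged as new.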
- BED: branch intervals (chrom, start, end, branch_id, VAF, CN).
- Consensus FASTA per branch.
- VAF evidence channels: supporting reads, primer-bracketed in-silico PCR amplicons, k-mer counts on read sequence.
- Genome-wide repeat CN for the main path and every branch, normalised against single-copy reference amplicons.
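Three of the computations listed above can be sketched in a few lines: extracting a primer-bracketed amplicon from a read (in-silico PCR), computing VAF from allele counts, and normalising a repeat count against single-copy reference amplicons. This is a minimal sketch under stated assumptions: exact primer matching on the read's forward strand only, VAF = alt / (alt + ref), and CN = ploidy × target / median(single-copy counts) with ploidy 2. All function names are hypothetical; BRANCH's own formulas may differ.

```python
from statistics import median

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(read: str, fwd_primer: str, rev_primer: str):
    """Return the primer-bracketed amplicon on the read, or None.

    The reverse primer is written 5'->3' on the opposite strand, so its
    reverse complement is searched downstream of the forward primer.
    Exact matches on the forward strand only (assumption); a real tool
    would tolerate mismatches and also scan the reverse strand.
    """
    start = read.find(fwd_primer)
    if start < 0:
        return None
    rc_rev = revcomp(rev_primer)
    end = read.find(rc_rev, start + len(fwd_primer))
    if end < 0:
        return None
    return read[start:end + len(rc_rev)]


def vaf(alt_count: int, ref_count: int) -> float:
    """Variant allele fraction alt / (alt + ref); 0.0 without coverage."""
    total = alt_count + ref_count
    return alt_count / total if total else 0.0


def copy_number(target_count: float, single_copy_counts, ploidy: int = 2) -> float:
    """Scale a repeat's count by the median single-copy amplicon count.

    Assumes each single-copy reference amplicon occurs `ploidy` times per
    genome, so a target with twice their median count has CN 2 * ploidy.
    """
    return ploidy * target_count / median(single_copy_counts)
```

Using the median of several single-copy amplicons, rather than one, keeps a single poorly amplifying reference from skewing every CN estimate.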
VariantPaths is the companion standalone viewer. It reads .vpf (topology + VAF + dbVar match) and .vpz (alt-path nucleotide sequences) files, both built from BRANCH outputs:
# 1. BRANCH produces bubble BED + GFA per sample
branch assemble --fastq sample.fq --out sample.gfa --out-reads sample.gaf
branch analyze --graph sample.gfa --reads sample.gaf --out-bed sample.bubbles.bed
# 2. Aggregate + classify across samples (dbVar overlap, recurrence, VAF)
python3 phase_d/scripts/11_branch_atlas.py \
--inputs "sampleA.bubbles.bed:sampleA,sampleB.bubbles.bed:sampleB" \
--out per_bubble_master.tsv
# 3. Convert to portable .vpf (topology) + .vpz (sequences)
# schlein-lab/variantpaths/build_vpf.py
# schlein-lab/variantpaths/build_vpz.py
# 4. Open in VariantPaths
variantpaths sample.vpf sample.vpz reference.fa
See variantpaths.com for full feature docs.
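Step 2 aggregates the per-sample bubble BEDs. The internals of 11_branch_atlas.py are not documented here, so the following is a hypothetical sketch of the generic pieces it names. It parses the --inputs spec (format taken from the command above), reads BED rows assuming the six-column layout listed for the branch BED (chrom, start, end, branch_id, VAF, CN), counts recurrence across samples by exact coordinates, and tests reciprocal overlap, a common criterion for matching calls against dbVar. The 50% threshold is an assumption, not the script's documented setting.

```python
from collections import defaultdict


def parse_inputs(spec: str):
    """Split "a.bed:labelA,b.bed:labelB" into [(path, label), ...]."""
    pairs = []
    for item in spec.split(","):
        path, _, label = item.rpartition(":")
        if not path or not label:
            raise ValueError(f"expected path:label, got {item!r}")
        pairs.append((path, label))
    return pairs


def read_bed_rows(lines):
    """Yield (chrom, start, end, branch_id, vaf, cn) from BED text lines.

    Assumes the six-column layout above; the real bubbles BED may differ.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, branch_id, vaf, cn = line.rstrip("\n").split("\t")[:6]
        yield chrom, int(start), int(end), branch_id, float(vaf), float(cn)


def reciprocal_overlap(a, b, min_frac=0.5):
    """True if half-open intervals a, b = (start, end) overlap by at least
    min_frac of each interval's length."""
    ov = min(a[1], b[1]) - max(a[0], b[0])
    if ov <= 0:
        return False
    return ov >= min_frac * (a[1] - a[0]) and ov >= min_frac * (b[1] - b[0])


def recurrence(samples):
    """samples maps label -> BED rows; returns {(chrom, start, end): labels}.

    Exact-coordinate keys are a simplification; bubbles shifted by a few
    bases between samples would need overlap-based clustering instead.
    """
    seen = defaultdict(set)
    for label, rows in samples.items():
        for chrom, start, end, *_ in rows:
            seen[(chrom, start, end)].add(label)
    return {key: sorted(labels) for key, labels in seen.items()}
```

Reciprocal overlap is symmetric by construction, so a 1 kb call fully inside a 100 kb dbVar record does not count as a match, which is usually what a recurrence or novelty screen wants.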
cmake -S . -B build
cmake --build build -j
ctest --test-dir build --output-on-failure
Requires C++20, CMake ≥ 3.20, zlib, pthread. Vendored in third_party/: htslib, ksw2, abPOA. CUDA backend is opt-in via -DBRANCH_BUILD_CUDA=ON.
An sbatch-compatible driver lives in workflow/; adapt the partition, account, and paths to your own site. The pipeline is filesystem-agnostic: point --fastq / --bam at your reads and --out at a writable output directory.
- C++20 core.
- htslib (BAM/CRAM I/O), ksw2 (affine-gap alignment), abPOA (partial-order consensus).
- CMake build; ASan + TSan required in CI.
- src/: core C++ sources.
- docs/architecture.md: pipeline internals, graph data model, classification problem.
- docs/branch-project-design.md: reference-projection subcommand design.
- docs/cnv_roadmap.md: low-frequency CNV roadmap.
- docs/graph-format-spec.md: on-disk graph format.
- workflow/: SLURM / Snakemake driver scaffold.
- tests/: unit + integration tests (GoogleTest).
Lossless graph, CN-aware nodes, VAF-tagged edges, SV-first phasing, multi-allelic branches.
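To make the feature list concrete, here is a minimal in-memory sketch of how CN-aware nodes, VAF-tagged edges, and multi-allelic branches could fit together. All class names, fields, and the default CN of 2 are hypothetical; docs/graph-format-spec.md is the authority on the actual graph model and on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph segment carrying its estimated copy number (hypothetical)."""
    node_id: str
    seq: str
    copy_number: float = 2.0  # assumed diploid default


@dataclass
class Edge:
    """A directed link tagged with the allele fraction of reads crossing it."""
    src: str
    dst: str
    vaf: float


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    out_edges: dict[str, list[Edge]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.out_edges.setdefault(node.node_id, [])

    def add_edge(self, edge: Edge) -> None:
        self.out_edges.setdefault(edge.src, []).append(edge)

    def branches(self):
        """Yield (node_id, edges) for every bifurcation. More than two
        outgoing edges make the branch multi-allelic."""
        for node_id, edges in self.out_edges.items():
            if len(edges) >= 2:
                yield node_id, edges
```

Keeping VAF on edges rather than nodes lets a shared downstream node belong to several alleles while each allele's frequency stays attached to the path that reaches it.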