BABAPPAlign v1.2.0

Released by @sinhakrishnendu on 10 Feb 09:45 (commit 132a20d)

Major Update: Native Codon Alignment Mode

BABAPPAlign v1.2.0 introduces a fully integrated codon alignment mode.

The alignment core (affine-gap Gotoh DP with explicit traceback matrices) is unchanged; this release only adds functionality around it.


New Features

Codon Mode

Run:

babappalign cds.fasta --model babappascore --mode codon

The program will:

  1. Validate CDS sequences
    • Length divisible by 3
    • No internal stop codons
    • Valid nucleotide alphabet
  2. Translate CDS → protein
  3. Perform embedding-based progressive protein alignment
  4. Back-map to codon alignment (PAL2NAL-style logic)
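
Steps 1 and 2 above can be illustrated with a minimal Python sketch. This is not BABAPPAlign's own code: the function names are hypothetical, the alphabet is assumed to be strictly A/C/G/T, the standard genetic code is assumed, and a terminal stop codon is assumed to be allowed and dropped from the protein.

```python
# Illustrative sketch only; names and policies here are assumptions,
# not BABAPPAlign's actual implementation.

# Standard genetic code (NCBI translation table 1), in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def validate_cds(seq):
    """Return the list of codons, or raise ValueError if a check fails."""
    seq = seq.upper()
    # Check 1: valid nucleotide alphabet (assumed strict: no N or IUPAC codes).
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("invalid nucleotide alphabet")
    # Check 2: length divisible by 3.
    if len(seq) % 3 != 0:
        raise ValueError("length not divisible by 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Check 3: no stop codon anywhere except (optionally) the last position.
    if any(CODON_TABLE[c] == "*" for c in codons[:-1]):
        raise ValueError("internal stop codon")
    return codons


def translate(seq):
    """Validate a CDS and translate it, dropping a terminal stop if present."""
    protein = "".join(CODON_TABLE[c] for c in validate_cds(seq))
    return protein.rstrip("*")


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

How the real tool treats ambiguous bases or terminal stops is not stated in these notes, so those choices should be read as placeholders.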

Outputs:

cds.protein.aln.fasta   (intermediate protein alignment)
cds.codon.aln.fasta     (final codon alignment)

No external PAL2NAL dependency is required.
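
The back-mapping idea (step 4) is simple enough to sketch in a few lines of Python. This is a generic illustration of PAL2NAL-style threading, not the project's code: the function name is hypothetical, and the CDS is assumed to contain exactly one codon per ungapped residue (any terminal stop already removed).

```python
# Illustrative sketch only; not BABAPPAlign's actual implementation.

def backmap_codons(aligned_protein, cds, gap="-"):
    """Thread the codons of `cds` onto a gapped protein row.

    Each residue consumes the next codon; each protein gap becomes
    three nucleotide gaps, so the reading frame is preserved.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out = []
    k = 0  # index of the next unused codon
    for aa in aligned_protein:
        if aa == gap:
            out.append(gap * 3)
        else:
            if k >= len(codons):
                raise ValueError("more residues than codons")
            out.append(codons[k])
            k += 1
    if k != len(codons):
        raise ValueError("more codons than residues")
    return "".join(out)


if __name__ == "__main__":
    print(backmap_codons("M-A", "ATGGCC"))  # ATG---GCC
```

Because gaps are only ever emitted as whole triplets, the codon alignment is always exactly three times the length of the protein alignment.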

Gap penalties are automatically scaled in codon mode to maintain biological consistency.


CLI Improvements

The -o/--output option has been removed.

Output filenames are now generated automatically from the input filename (shown here for input.fasta):

Protein mode:
input.protein.aln.fasta

Codon mode:
input.protein.aln.fasta
input.codon.aln.fasta

This simplifies usage and prevents accidental overwriting.
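
As a rough illustration of the naming scheme, the sketch below derives the output names from an input path. The helper name is hypothetical, and writing the outputs next to the input file is an assumption; these notes only document the filenames themselves.

```python
# Illustrative sketch only; not BABAPPAlign's actual implementation.
from pathlib import Path


def output_paths(input_path, mode):
    """Return the output paths implied by the naming scheme.

    Assumption: outputs are placed in the same directory as the input.
    """
    p = Path(input_path)
    protein = p.with_name(p.stem + ".protein.aln.fasta")
    if mode == "protein":
        return [protein]
    if mode == "codon":
        return [protein, p.with_name(p.stem + ".codon.aln.fasta")]
    raise ValueError("unknown mode: %r" % mode)


if __name__ == "__main__":
    for path in output_paths("cds.fasta", "codon"):
        print(path.name)
```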


Alignment Core Integrity

The alignment engine is unchanged:

  • Three-state affine-gap DP (M, Ix, Iy)
  • Explicit traceback matrices
  • Exact dynamic programming
  • Neural scoring performed outside DP recursion

Scientific reproducibility from previous versions is preserved.
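
For readers unfamiliar with the three-state formulation, here is a compact Python sketch of global affine-gap alignment with explicit traceback. It is a textbook Gotoh-style recursion, not the project's engine: the function name is hypothetical, the score matrix S is assumed to be precomputed (which mirrors "scoring performed outside DP recursion"), and a gap of length k is assumed to cost gap_open + (k - 1) * gap_extend.

```python
# Illustrative sketch only; not BABAPPAlign's actual implementation.
NEG = float("-inf")
M, IX, IY = 0, 1, 2  # match state, gap in second sequence, gap in first


def gotoh(S, gap_open, gap_extend):
    """Global alignment over a precomputed score matrix S (n x m).

    Returns (best_score, path); path holds (i, j) pairs, with None
    marking the gapped side of a column.
    """
    n = len(S)
    m = len(S[0]) if n else 0
    score = [[[NEG] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    trace = [[[None] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    score[M][0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # align residue i-1 with residue j-1
                cand = [score[s][i - 1][j - 1] for s in (M, IX, IY)]
                best = max(range(3), key=cand.__getitem__)
                score[M][i][j] = cand[best] + S[i - 1][j - 1]
                trace[M][i][j] = best
            if i > 0:  # residue i-1 against a gap
                cand = [score[M][i - 1][j] - gap_open,
                        score[IX][i - 1][j] - gap_extend,
                        score[IY][i - 1][j] - gap_open]
                best = max(range(3), key=cand.__getitem__)
                score[IX][i][j] = cand[best]
                trace[IX][i][j] = best
            if j > 0:  # gap against residue j-1
                cand = [score[M][i][j - 1] - gap_open,
                        score[IX][i][j - 1] - gap_open,
                        score[IY][i][j - 1] - gap_extend]
                best = max(range(3), key=cand.__getitem__)
                score[IY][i][j] = cand[best]
                trace[IY][i][j] = best

    # Traceback: follow the stored predecessor state from the final cell.
    state = max(range(3), key=lambda s: score[s][n][m])
    best_score = score[state][n][m]
    i, j, path = n, m, []
    while i > 0 or j > 0:
        prev = trace[state][i][j]
        if state == M:
            path.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif state == IX:
            path.append((i - 1, None))
            i -= 1
        else:
            path.append((None, j - 1))
            j -= 1
        state = prev
    path.reverse()
    return best_score, path


if __name__ == "__main__":
    a, b = "AC", "AGC"
    # Toy scores (match 2, mismatch -1) standing in for model-derived scores.
    S = [[2 if x == y else -1 for y in b] for x in a]
    print(gotoh(S, gap_open=3, gap_extend=1))
```

Keeping a separate traceback entry per state is what makes the recovered path exact: the walk never has to re-derive which state produced a score.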


Model Loading Improvements

  • HuggingFace warnings suppressed
  • No pooler initialization warnings
  • Explicit model resolution maintained
  • SHA-256 checksum printed at runtime
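
To compare the checksum printed at runtime against a downloaded model file, a SHA-256 digest can be computed independently with Python's standard library. The helper name below is hypothetical, and the model path is whatever file you downloaded.

```python
# Illustrative sketch only; uses the standard-library hashlib module.
import hashlib
import sys


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) > 1:
    print(sha256_of_file(sys.argv[1]))
```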

Validation

Codon mode output was validated against PAL2NAL for:

  • Frame preservation
  • Triplet gap integrity
  • Alignment length consistency
  • Per-site identity comparison
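
The first three properties can be checked on any codon alignment with a short script. The sketch below is not the project's validation code: the function name and the dict-based input format are assumptions, and the per-site comparison against PAL2NAL output is not covered.

```python
# Illustrative sketch only; not the project's validation script.

def check_codon_alignment(aligned, originals, protein_aln_len=None, gap="-"):
    """Raise ValueError if the codon alignment breaks a structural property.

    aligned:   {name: gapped codon row}
    originals: {name: ungapped CDS used as input}
    """
    lengths = {len(row) for row in aligned.values()}
    # Alignment length consistency: all rows equal, and a multiple of 3.
    if len(lengths) != 1:
        raise ValueError("rows have different lengths")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length not divisible by 3")
    if protein_aln_len is not None and length != 3 * protein_aln_len:
        raise ValueError("codon alignment is not 3x the protein alignment")
    for name, row in aligned.items():
        # Triplet gap integrity: each codon column is all gaps or no gaps.
        for i in range(0, length, 3):
            if row[i:i + 3].count(gap) not in (0, 3):
                raise ValueError("partial gap in codon at %s:%d" % (name, i))
        # Frame preservation: removing gaps recovers the original CDS.
        if row.replace(gap, "") != originals[name]:
            raise ValueError("ungapped row differs from input CDS: " + name)
    return True


if __name__ == "__main__":
    aligned = {"s1": "ATG---GCC", "s2": "ATGAAAGCC"}
    originals = {"s1": "ATGGCC", "s2": "ATGAAAGCC"}
    print(check_codon_alignment(aligned, originals, protein_aln_len=3))
```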

Installation

pip install --upgrade babappalign

Requirements

  • External trained scoring model (BABAPPAScore)
  • Model distributed via Zenodo

Concept DOI:
https://doi.org/10.5281/zenodo.18053200


Summary

v1.2.0 adds native codon alignment while preserving the exact alignment core.
The tool now supports both protein and CDS workflows in a single, reproducible framework.