# Test Notebook: Alignment Module

This notebook tests the `alignment` module for speech-to-text alignment.

**Features tested:**
1. Module imports and structure
2. Data classes (AlignmentResult, AlignedWord, AlignedToken)
3. WFST factor transducer construction
4. Tokenizers (Character, BPE, Phoneme) - via unified `text_frontend`
5. Audio segmentation - via unified `audio_frontend`
6. LIS (Longest Increasing Subsequence) utilities
7. MFA backend availability
8. Gentle backend availability
9. Full WFST Aligner integration
10. **Segment-wise alignment API** (for use with stitching_utils)
11. **Ground truth data** loading
12. **Run WFST alignment** on sample audio
13. **Accuracy comparison** (frame error, IoU metrics)
14. **Listening test** (audio preview of aligned words)
15. **MFA aligner test** (if installed)
16. **Gentle aligner test** (if installed)

**Architecture:**
The alignment module produces **SEGMENT-WISE** results (each segment aligned independently).
For global alignment, use `stitching_utils` to combine segments.

**Alignment Backends:**
| Backend | Description | Fuzzy Support | Languages |
|---------|-------------|---------------|-----------|
| `WFSTAligner` | k2-based factor transducer | ✅ Yes | 1100+ (MMS) |
| `MFAAligner` | Montreal Forced Aligner | ❌ No | 50+ |
| `GentleAligner` | Kaldi-based (English) | ❌ No | English only |

**Key methods:**
- `aligner.align_segments(waveform, text)` → `List[SegmentAlignmentResult]` (no stitching)
- `aligner.align(waveform, text, stitch=True)` → `AlignmentResult` (with optional stitching)

**Accuracy Testing:**
- Ground truth from MMS-FA CTC alignment (50fps frame rate)
- Metrics: Frame error (start/end), IoU (boundary overlap)
- Audio preview: `preview_word(idx)`, `preview_word_by_name("CURIOSITY")`, `preview_all_words()`
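
The two accuracy metrics above can be sketched in plain Python. This is a minimal sketch, not the notebook's own scoring code (which is not shown in this part of the notebook): it assumes predicted and reference word spans are half-open `[start, end)` frame intervals paired by word position, and it converts frame errors to milliseconds at the 50 fps rate stated above. The helper names `boundary_errors`, `interval_iou`, and `summarize` are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start_frame, end_frame), as in the ground-truth tables


def boundary_errors(pred: Interval, ref: Interval) -> Tuple[int, int]:
    """Absolute start and end boundary errors, in frames."""
    return abs(pred[0] - ref[0]), abs(pred[1] - ref[1])


def interval_iou(pred: Interval, ref: Interval) -> float:
    """Intersection-over-union of two half-open frame intervals (0.0 if disjoint)."""
    inter = max(0, min(pred[1], ref[1]) - max(pred[0], ref[0]))
    union = (pred[1] - pred[0]) + (ref[1] - ref[0]) - inter
    return inter / union if union > 0 else 0.0


def summarize(preds: Sequence[Interval], refs: Sequence[Interval],
              frame_rate: int = 50) -> Dict[str, float]:
    """Mean start/end error in milliseconds and mean IoU over position-paired words."""
    if not refs or len(preds) != len(refs):
        raise ValueError("need exactly one prediction per reference word")
    ms_per_frame = 1000.0 / frame_rate
    start_err, end_err, ious = [], [], []
    for p, r in zip(preds, refs):
        ds, de = boundary_errors(p, r)
        start_err.append(ds * ms_per_frame)
        end_err.append(de * ms_per_frame)
        ious.append(interval_iou(p, r))
    n = len(refs)
    return {
        "mean_start_err_ms": sum(start_err) / n,
        "mean_end_err_ms": sum(end_err) / n,
        "mean_iou": sum(ious) / n,
    }


# A prediction for "HAD" that starts one frame late: 20 ms start error, IoU 6/7
print(summarize([(38, 44)], [(37, 44)]))
```

IoU is 1.0 for identical spans and 0.0 for disjoint ones, so it weighs a boundary shift relative to the word's length, while frame error reports the raw shift.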

**Installation (Colab):**
```bash
# GPU Version
pip install k2==1.24.4.dev20251030+cuda12.6.torch2.9.0 -f https://k2-fsa.github.io/k2/cuda.html

# CPU Version (use --no-deps to avoid env changes)
pip install k2==1.24.4.dev20251029+cpu.torch2.9.0 --no-deps -f https://k2-fsa.github.io/k2/cpu.html

# Common dependencies
pip install pytorch-lightning cmudict g2p_en pydub
pip install git+https://github.com/huangruizhe/lis.git

# Optional: MFA
pip install montreal-forced-aligner

# Optional: Gentle
pip install gentle  # or docker run -p 8765:8765 lowerquality/gentle
```

## Setup
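
The cells below detect the optional dependencies (`k2`, `lis`) by attempting an import. A lighter-weight alternative, sketched here with the standard-library `importlib.util.find_spec`, checks whether a top-level module can be located without importing it. The helper name `optional_deps_available` is hypothetical, and this check is weaker than a real import: a module can be found yet still fail to import (for example, a `k2` wheel built against a different `torch` version).

```python
import importlib.util
from typing import Dict, Iterable


def optional_deps_available(modules: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it.

    find_spec does not execute the module, so import-time failures
    (e.g. ABI mismatches in compiled wheels) are not detected.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


print(optional_deps_available(["k2", "lis", "torch"]))
```

This is why the notebook's try/except imports remain the authoritative check: they catch both "not installed" and "installed but broken".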

In [None]:
!rm -rf /content/torchaudio_aligner

In [None]:
# =============================================================================
# Install Dependencies (run once)
# =============================================================================
# The CPU section below is active by default; comment it out and uncomment another section to match your environment

# ===== GPU Version (Colab with GPU) =====
# !pip install k2==1.24.4.dev20251030+cuda12.6.torch2.9.0 -f https://k2-fsa.github.io/k2/cuda.html
# !pip install pytorch-lightning
# !pip install cmudict g2p_en
# !pip install pydub
# !pip install git+https://github.com/huangruizhe/lis.git

# ===== CPU Version (Colab CPU or local) =====
# Note: --no-deps to avoid changing the Python environment
!pip install k2==1.24.4.dev20251029+cpu.torch2.9.0 --no-deps -f https://k2-fsa.github.io/k2/cpu.html
!pip install pytorch-lightning
!pip install cmudict g2p_en
!pip install pydub
!pip install git+https://github.com/huangruizhe/lis.git
!pip install torchcodec

# ===== Optional: MFA (Montreal Forced Aligner) =====
# !pip install montreal-forced-aligner
# # Or via conda: conda install -c conda-forge montreal-forced-aligner

# ===== Optional: Gentle Aligner =====
# !pip install gentle
# # Or via Docker: docker run -p 8765:8765 lowerquality/gentle

In [None]:
# =============================================================================
# Setup: Configure Imports
# =============================================================================

import sys
import os
from pathlib import Path

# ===== CONFIGURATION =====
GITHUB_REPO = "https://github.com/huangruizhe/torchaudio_aligner.git"
BRANCH = "dev"  # Use 'dev' for testing, 'main' for stable
# =========================

# Test result tracking
test_results = {}

def setup_imports():
    """Setup Python path for imports based on environment."""
    
    IN_COLAB = 'google.colab' in sys.modules
    
    if IN_COLAB:
        repo_path = '/content/torchaudio_aligner'
        src_path = f'{repo_path}/src'
        
        if not os.path.exists(repo_path):
            print(f"Cloning repository (branch: {BRANCH})...")
            os.system(f'git clone -b {BRANCH} {GITHUB_REPO} {repo_path}')
            print("Repository cloned")
        else:
            print(f"Updating repository (branch: {BRANCH})...")
            os.system(f'cd {repo_path} && git fetch origin && git checkout {BRANCH} && git pull origin {BRANCH}')
            print("Repository updated")
    else:
        possible_paths = [
            Path(".").absolute().parent / "src",
            Path(".").absolute() / "src",
        ]
        
        src_path = None
        for p in possible_paths:
            if p.exists() and (p / "alignment").exists():
                src_path = str(p.absolute())
                break
        
        if src_path is None:
            raise FileNotFoundError("src directory not found")
        
        print(f"Running locally from: {src_path}")
    
    if src_path not in sys.path:
        sys.path.insert(0, src_path)
    
    return src_path

src_path = setup_imports()

import torch
import logging
logging.basicConfig(level=logging.INFO)

# Check dependencies
print()
print("=" * 60)
print("Checking dependencies...")
print("=" * 60)

# Check k2
try:
    import k2
    print("✅ k2 is installed; version details:")
    ! pip show k2
    K2_AVAILABLE = True
except ImportError:
    print("⚠️ k2 not available - WFST tests will be limited")
    print("   Install with: pip install k2 -f https://k2-fsa.github.io/k2/cpu.html")
    K2_AVAILABLE = False

# Check lis
try:
    import lis
    print("✅ lis library available")
    LIS_AVAILABLE = True
except ImportError:
    print("⚠️ lis not available - LIS tests will be skipped")
    print("   Install with: pip install git+https://github.com/huangruizhe/lis.git")
    LIS_AVAILABLE = False

print(f"   Device: {'cuda' if torch.cuda.is_available() else 'cpu'}")

## Test 1: Module Imports and Structure

In [None]:
print("=" * 60)
print("Test 1: Module Imports and Structure")
print("=" * 60)

try:
    from alignment import (
        # Data classes
        AlignmentResult,
        AlignedWord,
        AlignedToken,
        AlignmentConfig,
        # Base class
        AlignerBackend,
        # Backends
        WFSTAligner,
        MFAAligner,
        GentleAligner,
        # API functions
        align,
        align_long_audio,
        get_aligner,
        list_backends,
    )
    
    print("📦 Imports successful!")
    
    # List available backends
    backends = list_backends()
    print("\n🔧 Available backends:")
    for name, info in backends.items():
        status = "🚧" if info.get("status") == "placeholder" else "✅"
        print(f"   {status} {name}: {info['description']}")
        print(f"      Languages: {info['languages']}")
        print(f"      Fuzzy alignment: {info['fuzzy']}")
    
    test_results["Test 1"] = "✅ PASSED"
    print("\n✅ Test 1 PASSED - Module imports successful")
except Exception as e:
    test_results["Test 1"] = "❌ FAILED"
    print(f"\n❌ Test 1 FAILED: {e}")
    import traceback
    traceback.print_exc()

## Test 2: Data Classes
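
Test 2 builds `AlignedWord` objects with times given in frames and exports them with `to_audacity_labels()`. The notebook does not say which frame rate `AlignedWord` uses when converting to seconds, so the sketch below assumes 50 fps (20 ms per frame), the rate stated for the ground-truth data. It renders frame spans in Audacity's plain-text label format (start seconds, end seconds, label, tab-separated, one label per line). `frames_to_audacity_labels` is a hypothetical name, and its output is not guaranteed to match `to_audacity_labels()` byte for byte.

```python
from typing import List, Tuple

FRAME_RATE = 50  # assumption: 20 ms frames, as in this notebook's ground-truth section


def frames_to_audacity_labels(words: List[Tuple[str, int, int]],
                              frame_rate: int = FRAME_RATE) -> str:
    """Render (word, start_frame, end_frame) triples as Audacity label-track text.

    Each line is: start_seconds<TAB>end_seconds<TAB>label
    """
    lines = [
        f"{start / frame_rate:.6f}\t{end / frame_rate:.6f}\t{word}"
        for word, start, end in words
    ]
    return "\n".join(lines)


print(frames_to_audacity_labels([("hello", 100, 150), ("world", 160, 220)]))
```

Saved to a `.txt` file, text in this shape can be imported into Audacity as a label track alongside the audio.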

In [None]:
print("=" * 60)
print("Test 2: Data Classes (AlignmentConfig, AlignedWord, AlignmentResult)")
print("=" * 60)

try:
    # Test AlignmentConfig
    print("\n📋 AlignmentConfig:")
    config = AlignmentConfig(
        backend="wfst",
        language="eng",
        segment_size=15.0,
        overlap=2.0,
        skip_penalty=-0.5,
        return_penalty=-18.0,
    )
    print(f"   • Backend: {config.backend}")
    print(f"   • Device: {config.device}")
    print(f"   • Segment size: {config.segment_size}s")
    print(f"   • Skip penalty: {config.skip_penalty}")
    print(f"   • Return penalty: {config.return_penalty}")
    
    # Test AlignedWord
    print("\n📝 AlignedWord:")
    word = AlignedWord(
        word="hello",
        start_time=100,
        end_time=150,
        phones=[
            AlignedToken(token_id="h", timestamp=100, score=0.9),
            AlignedToken(token_id="e", timestamp=110, score=0.85),
            AlignedToken(token_id="l", timestamp=120, score=0.88),
            AlignedToken(token_id="l", timestamp=130, score=0.92),
            AlignedToken(token_id="o", timestamp=140, score=0.87),
        ],
    )
    print(f"   • Word: '{word.word}'")
    print(f"   • Start: {word.start_seconds:.2f}s")
    print(f"   • End: {word.end_seconds:.2f}s")
    print(f"   • Duration: {word.duration:.2f}s")
    print(f"   • Phones: {[p.token_id for p in word.phones]}")
    
    # Test AlignmentResult
    print("\n📊 AlignmentResult:")
    result = AlignmentResult(
        word_alignments={
            0: AlignedWord("hello", 100, 150),
            1: AlignedWord("world", 160, 220),
        },
        unaligned_indices=[(2, 3)],
    )
    print(f"   • Aligned words: {result.num_aligned_words}")
    print(f"   • Aligned text: '{result.aligned_text}'")
    print(f"   • Unaligned regions: {result.unaligned_indices}")
    
    # Test Audacity export
    labels = result.to_audacity_labels()
    print("\n🏷️ Audacity labels format:")
    for line in labels.split('\n'):
        print(f"   {line}")
    
    test_results["Test 2"] = "✅ PASSED"
    print("\n✅ Test 2 PASSED - Data classes work correctly")
except Exception as e:
    test_results["Test 2"] = "❌ FAILED"
    print(f"\n❌ Test 2 FAILED: {e}")
    import traceback
    traceback.print_exc()

## Test 3: WFST Factor Transducer Construction
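
A factor transducer over a transcript accepts any contiguous run (factor) of it, which is presumably what lets each audio segment be matched to just the part of the transcript it contains. The sketch below is a simplified, unweighted, pure-Python acceptor at word level: a linear chain of token arcs, epsilon entry arcs from the start state to the beginning of every word, and epsilon exit arcs from the end of every word to the final state. It illustrates the idea under those assumptions; it is not the k2 graph built by `make_factor_transducer_word_level_index_with_skip`, whose `skip_penalty`/`return_penalty` weights and skip/return symbols are not modeled here.

```python
from typing import List, Optional, Sequence, Set, Tuple

Arc = Tuple[int, int, Optional[int]]  # (src_state, dst_state, token_id or None for epsilon)


def build_word_factor_acceptor(words: List[List[int]]) -> Tuple[List[Arc], int, int]:
    """Unweighted acceptor for every contiguous run of whole words in `words`."""
    arcs: List[Arc] = []
    word_starts, word_ends = [], []
    state = 0
    for tokens in words:
        word_starts.append(state)
        for tok in tokens:
            arcs.append((state, state + 1, tok))  # linear chain through the reference
            state += 1
        word_ends.append(state)
    final = state
    # Epsilon "entry" arcs: begin at the start of any later word.
    arcs += [(0, s, None) for s in word_starts[1:]]
    # Epsilon "exit" arcs: stop after any earlier word.
    arcs += [(e, final, None) for e in word_ends[:-1]]
    return arcs, 0, final


def accepts(arcs: List[Arc], start: int, final: int, seq: Sequence[int]) -> bool:
    """Simulate the epsilon-NFA on a token sequence."""
    def closure(states: Set[int]) -> Set[int]:
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for src, dst, lab in arcs:
                if src == s and lab is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for tok in seq:
        current = closure({dst for src, dst, lab in arcs if src in current and lab == tok})
    return final in current


tokenized_text = [[7, 4, 11, 11, 14], [22, 14, 17, 11, 3]]  # hello, world (fake IDs as in Test 3)
arcs, start, final = build_word_factor_acceptor(tokenized_text)
print(accepts(arcs, start, final, [22, 14, 17, 11, 3]))  # True: "world" alone is a factor
print(accepts(arcs, start, final, [4, 11, 11, 14]))      # False: "ello" is not a whole-word run
```

Because entry and exit arcs exist only at word boundaries, partial words are rejected, and the chain order rejects "world hello".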

In [None]:
print("=" * 60)
print("Test 3: WFST Factor Transducer Construction")
print("=" * 60)

if not K2_AVAILABLE:
    test_results["Test 3"] = "⏭️ SKIPPED"
    print("⏭️ Test 3 SKIPPED - k2 not available")
    print("   Install with: pip install k2 -f https://k2-fsa.github.io/k2/cpu.html")
else:
    try:
        from alignment.wfst import (
            make_factor_transducer_word_level_index_with_skip,
            flatten_list,
        )
        
        # Simulated tokenized text: [[h,e,l,l,o], [w,o,r,l,d]]
        # Using fake token IDs
        tokenized_text = [
            [7, 4, 11, 11, 14],   # hello
            [22, 14, 17, 11, 3],  # world
        ]
        
        print(f"\nüìù Tokenized text: {tokenized_text}")
        print(f"   Flattened: {flatten_list(tokenized_text)}")
        
        # Build factor transducer
        graph, word_sym, token_sym = make_factor_transducer_word_level_index_with_skip(
            tokenized_text,
            skip_penalty=-0.5,
            return_penalty=-18.0,
        )
        
        print("\n🔧 Factor Transducer:")
        print(f"   • States: {graph.shape[0]}")
        print(f"   • Arcs: {graph.num_arcs}")
        print(f"   • Skip ID: {graph.skip_id}")
        print(f"   • Return ID: {graph.return_id}")
        
        print("\n📖 Symbol tables:")
        print(f"   • Word index table: {word_sym}")
        print(f"   • Token table (first 5): {dict(list(token_sym.items())[:5])}...")
        
        test_results["Test 3"] = "✅ PASSED"
        print("\n✅ Test 3 PASSED - Factor transducer construction works")
    except Exception as e:
        test_results["Test 3"] = "❌ FAILED"
        print(f"\n❌ Test 3 FAILED: {e}")
        import traceback
        traceback.print_exc()

## Test 4: Tokenizers
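
Test 4 builds a character tokenizer from an MMS-FA style label tuple. As a rough model of what such a tokenizer does, the sketch below maps each label to its position, lower-cases and splits the text into words, and encodes each character, sending unknown characters (and the CTC blank symbol, should it appear in text) to the UNK ID. The function names are hypothetical, and the library's `text_normalize` may treat out-of-vocabulary characters differently (for example by dropping them), so compare against Test 4's printed output rather than assuming identical results.

```python
from typing import Dict, List, Sequence

# Same label tuple as Test 4: index 0 is the CTC blank, '*' is UNK.
LABELS = ('-', 'a', 'i', 'e', 'n', 'o', 'u', 't', 's', 'r', 'm', 'k', 'l', 'd',
          'g', 'h', 'y', 'b', 'p', 'w', 'c', 'v', 'j', 'z', 'f', "'", 'q', 'x', '*')


def build_vocab(labels: Sequence[str]) -> Dict[str, int]:
    """Token -> ID, where the ID is the label's position (the model's output index)."""
    return {tok: idx for idx, tok in enumerate(labels)}


def encode_words(text: str, token2id: Dict[str, int],
                 blank_token: str = "-", unk_token: str = "*") -> List[List[int]]:
    """Lower-case, split on whitespace, and encode each word as a list of character IDs."""
    unk_id = token2id[unk_token]
    return [
        # The blank is reserved for CTC, so a literal '-' in text is treated as unknown.
        [unk_id if ch == blank_token else token2id.get(ch, unk_id) for ch in word]
        for word in text.lower().split()
    ]


token2id = build_vocab(LABELS)
print(len(token2id))                             # 29
print(encode_words("hello world", token2id))     # [[15, 3, 12, 12, 5], [19, 5, 9, 12, 13]]
print(encode_words("hello 你好 world", token2id))  # the middle word becomes [28, 28] (UNK)
```

Keeping the per-word nesting matters for alignment: the word-level factor transducer in Test 3 consumes exactly this list-of-lists shape.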

In [None]:
print("=" * 60)
print("Test 4: Tokenizers (via text_frontend)")
print("=" * 60)

try:
    # Import from unified text_frontend
    from text_frontend import (
        TokenizerInterface,
        CharTokenizer,
        create_tokenizer_from_labels,
    )
    
    # Create tokenizer with MMS-FA style labels
    labels = ('-', 'a', 'i', 'e', 'n', 'o', 'u', 't', 's', 'r', 'm', 'k', 'l', 'd', 
              'g', 'h', 'y', 'b', 'p', 'w', 'c', 'v', 'j', 'z', 'f', "'", 'q', 'x', '*')
    
    tokenizer = create_tokenizer_from_labels(labels, blank_token='-', unk_token='*')
    
    print("\n🔤 CharTokenizer (MMS-FA style):")
    print(f"   • Vocab size: {len(tokenizer.token2id)}")
    print(f"   • Blank ID: {tokenizer.blk_id}")
    print(f"   • UNK ID: {tokenizer.unk_id}")
    print(f"   • Implements TokenizerInterface: {isinstance(tokenizer, TokenizerInterface)}")
    
    # Test encoding
    text = "hello world"
    normalized = tokenizer.text_normalize(text)
    encoded = tokenizer.encode(normalized)
    decoded = tokenizer.decode(encoded)
    
    print("\n📝 Encoding test:")
    print(f"   • Original: '{text}'")
    print(f"   • Normalized: '{normalized}'")
    print(f"   • Encoded: {encoded}")
    print(f"   • Decoded: {decoded}")
    
    # Test flatten
    flattened = tokenizer.encode_flatten(normalized)
    print(f"   • Flattened: {flattened}")
    
    # Test with OOV characters ("你好" is not in the MMS-FA label set)
    text_oov = "hello 你好 world"
    normalized_oov = tokenizer.text_normalize(text_oov)
    encoded_oov = tokenizer.encode(normalized_oov)
    
    print("\n🌐 OOV handling:")
    print(f"   • Original: '{text_oov}'")
    print(f"   • Normalized: '{normalized_oov}'")
    print(f"   • Encoded: {encoded_oov}")
    
    test_results["Test 4"] = "✅ PASSED"
    print("\n✅ Test 4 PASSED - Tokenizers work correctly")
except Exception as e:
    test_results["Test 4"] = "❌ FAILED"
    print(f"\n❌ Test 4 FAILED: {e}")
    import traceback
    traceback.print_exc()

## Test 5: Audio Segmentation (via audio_frontend)
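
The overlap check in Test 5 expects segment offsets at multiples of `segment_size - overlap`. The arithmetic can be sketched as follows. This is one plausible covering policy (fixed step, last segment shortened so it ends exactly at the end of the audio), assumed for illustration; it ignores the `extra_samples` padding the test tolerates, and `plan_segments` is a hypothetical name, not part of `audio_frontend`.

```python
import math
from typing import List, Tuple


def plan_segments(num_samples: int, sample_rate: int,
                  segment_size_s: float, overlap_s: float) -> List[Tuple[int, int]]:
    """Return (offset_samples, length_samples) for overlapping fixed-size windows."""
    size = int(round(segment_size_s * sample_rate))
    overlap = int(round(overlap_s * sample_rate))
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the segment size")
    if num_samples <= size:
        return [(0, num_samples)]
    # Smallest n with (n - 1) * step + size >= num_samples, i.e. the last window reaches the end.
    n = math.ceil((num_samples - overlap) / step)
    return [(k * step, min(size, num_samples - k * step)) for k in range(n)]


# 30 s of 16 kHz audio, 15 s windows, 2 s overlap -> step of 13 s (208000 samples)
print(plan_segments(480000, 16000, 15.0, 2.0))
# [(0, 240000), (208000, 240000), (416000, 64000)]
```

With this policy every window except possibly the last is full length, and the last window always extends past the previous window's overlap region.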

In [None]:
print("=" * 60)
print("Test 5: Audio Segmentation (via audio_frontend)")
print("=" * 60)

try:
    # Import from unified audio_frontend
    from audio_frontend import (
        segment_waveform,
        AudioSegment,
        SegmentationResult,
    )
    
    # Create test waveform: 30 seconds at 16kHz
    waveform = torch.randn(480000)  # (T,) - 1D
    sample_rate = 16000
    
    print("\n🎵 Input waveform:")
    print(f"   • Shape: {waveform.shape}")
    print(f"   • Duration: {waveform.shape[0] / sample_rate:.2f}s")
    
    # Segment with overlap using audio_frontend
    result = segment_waveform(
        waveform,
        sample_rate=sample_rate,
        segment_size=15.0,    # 15 seconds
        overlap=2.0,          # 2 seconds overlap
    )
    
    print("\n✂️ Segmentation result (SegmentationResult):")
    print(f"   • Num segments: {result.num_segments}")
    print(f"   • Segment size: {result.segment_size_samples} samples ({result.segment_size_samples/sample_rate:.2f}s)")
    print(f"   • Overlap: {result.overlap_samples} samples ({result.overlap_samples/sample_rate:.2f}s)")
    print(f"   • Original duration: {result.original_duration_seconds:.2f}s")
    
    # Get batched tensors
    segments, lengths = result.get_waveforms_batched()
    offsets = torch.tensor([seg.offset_samples for seg in result.segments])
    
    print("\n📦 Batched tensors:")
    print(f"   • Segments shape: {segments.shape}")
    print(f"   • Lengths: {lengths.tolist()}")
    print(f"   • Offsets: {offsets.tolist()}")
    
    # Verify AudioSegment objects
    print("\n🔍 First segment (AudioSegment):")
    seg0 = result.segments[0]
    print(f"   • Waveform shape: {seg0.waveform.shape}")
    print(f"   • Offset: {seg0.offset_samples} samples ({seg0.offset_seconds:.2f}s)")
    print(f"   • Duration: {seg0.duration_seconds:.2f}s")
    print(f"   • Index: {seg0.segment_index}")
    
    # Verify overlap
    step = result.segment_size_samples - result.overlap_samples
    expected_offsets = [i * step for i in range(result.num_segments)]
    # Allow for small differences due to extra_samples
    offsets_match = all(abs(a - e) < 200 for a, e in zip(offsets.tolist(), expected_offsets))
    
    print("\n📐 Overlap verification:")
    print(f"   • Step size: {step} samples ({step/sample_rate:.2f}s)")
    print(f"   • Offsets approximately match expected: {'✅' if offsets_match else '❌'}")
    
    test_results["Test 5"] = "✅ PASSED"
    print("\n✅ Test 5 PASSED - Segmentation works correctly")
except Exception as e:
    test_results["Test 5"] = "❌ FAILED"
    print(f"\n❌ Test 5 FAILED: {e}")
    import traceback
    traceback.print_exc()

## Test 6: LIS Utilities
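
Test 6 relies on the `lis` package through `compute_lis`. For intuition, here is the standard O(n log n) patience-sorting algorithm for a longest strictly increasing subsequence in pure Python. It is a reference sketch, not the `lis` package's implementation, and when several longest subsequences exist it may return a different one than `compute_lis`. Applied to word indices collected from overlapping segments (as in Test 6's input), it keeps the largest set of indices that respects transcript order.

```python
from bisect import bisect_left
from typing import List, Sequence


def longest_increasing_subsequence(xs: Sequence[int]) -> List[int]:
    """One longest strictly increasing subsequence of xs, in O(n log n)."""
    tail_vals: List[int] = []  # tail_vals[k]: smallest tail of an increasing run of length k+1
    tail_idx: List[int] = []   # index in xs of that tail
    prev = [-1] * len(xs)      # back-pointers for reconstruction
    for i, x in enumerate(xs):
        k = bisect_left(tail_vals, x)  # bisect_left => equal values never extend a run
        if k > 0:
            prev[i] = tail_idx[k - 1]
        if k == len(tail_vals):
            tail_vals.append(x)
            tail_idx.append(i)
        else:
            tail_vals[k] = x
            tail_idx[k] = i
    # Walk the back-pointers from the tail of the longest run.
    out: List[int] = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        out.append(xs[i])
        i = prev[i]
    return out[::-1]


word_indices = [1, 5, 2, 6, 3, 7, 4, 8, 9, 10, 11, 15, 12, 16, 13]  # same input as Test 6
lis_sketch = longest_increasing_subsequence(word_indices)
print(lis_sketch, len(lis_sketch))  # [1, 2, 3, 4, 8, 9, 10, 11, 12, 13] 10
```

The variable is named `lis_sketch` so it does not shadow the `lis` package imported during setup.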

In [None]:
print("=" * 60)
print("Test 6: LIS (Longest Increasing Subsequence) Utilities")
print("=" * 60)

if not LIS_AVAILABLE:
    test_results["Test 6"] = "⏭️ SKIPPED"
    print("⏭️ Test 6 SKIPPED - lis library not available")
    print("   Install with: pip install git+https://github.com/huangruizhe/lis.git")
else:
    try:
        from alignment.wfst.lis_utils import (
            compute_lis,
            remove_outliers,
            remove_isolated_words,
            find_unaligned_regions,
        )
        
        # Test LIS computation
        # Simulating word indices from multiple overlapping segments
        word_indices = [1, 5, 2, 6, 3, 7, 4, 8, 9, 10, 11, 15, 12, 16, 13]
        
        print("\n📈 LIS computation:")
        print(f"   • Input: {word_indices}")
        
        lis_result = compute_lis(word_indices)
        print(f"   • LIS: {lis_result}")
        print(f"   • LIS length: {len(lis_result)}")
        
        # Verify LIS is increasing
        is_increasing = all(lis_result[i] < lis_result[i+1] for i in range(len(lis_result)-1))
        print(f"   • Is strictly increasing: {'✅' if is_increasing else '❌'}")
        
        # Test outlier removal
        print("\n🔍 Outlier removal:")
        with_outliers = [5, 100, 10, 15, 20, 25, 30, 35, 200]
        cleaned = remove_outliers(with_outliers, scan_range=3, outlier_threshold=50)
        print(f"   • Input: {with_outliers}")
        print(f"   • Cleaned: {cleaned}")
        
        # Test unaligned region detection
        print("\n🕳️ Unaligned region detection:")
        aligned = set(lis_result)
        rg_min, rg_max = min(lis_result), max(lis_result)
        unaligned = find_unaligned_regions(rg_min, rg_max, aligned)
        print(f"   • Aligned range: [{rg_min}, {rg_max}]")
        print(f"   • Aligned indices: {sorted(aligned)}")
        print(f"   • Unaligned regions: {unaligned}")
        
        test_results["Test 6"] = "✅ PASSED"
        print("\n✅ Test 6 PASSED - LIS utilities work correctly")
    except Exception as e:
        test_results["Test 6"] = "❌ FAILED"
        print(f"\n❌ Test 6 FAILED: {e}")
        import traceback
        traceback.print_exc()

## Test 7: MFA Backend Availability
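
Test 7 calls the backend's private `_check_mfa_available()`, whose implementation is not shown here. A common way to perform such a check, sketched below with the standard library, is to look for MFA's `mfa` command-line entry point on `PATH`. Treating "on PATH" as "available" is an assumption, and this check does not verify that pretrained models such as `english_us_arpa` have been downloaded. `find_cli` is a hypothetical helper name.

```python
import shutil
from typing import Optional


def find_cli(command: str = "mfa") -> Optional[str]:
    """Absolute path of `command` if it is an executable on PATH, else None."""
    return shutil.which(command)


path = find_cli("mfa")
print(f"MFA CLI: {path}" if path else "MFA CLI not found on PATH")
```

In a conda install, the `mfa` executable is only on `PATH` while the environment that contains it is active, which is a frequent reason for a false "not installed" result in notebooks.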

In [None]:
print("=" * 60)
print("Test 7: MFA Backend Availability")
print("=" * 60)

try:
    from alignment import MFAAligner, AlignmentConfig
    
    config = AlignmentConfig(backend="mfa", language="english_us_arpa")
    aligner = MFAAligner(config)
    
    print("\n🔧 MFA Aligner:")
    print(f"   • Backend name: {aligner.name}")
    print(f"   • Acoustic model: {aligner.acoustic_model}")
    print(f"   • Dictionary: {aligner.dictionary}")
    print(f"   • Supported languages (sample): {aligner.SUPPORTED_LANGUAGES[:5]}...")
    
    # Check if MFA is available
    mfa_available = aligner._check_mfa_available()
    
    if mfa_available:
        print("\n✅ MFA CLI is installed and available")
        test_results["Test 7"] = "✅ PASSED"
    else:
        print("\n⚠️ MFA CLI not installed (optional)")
        print("   Install with: conda install -c conda-forge montreal-forced-aligner")
        test_results["Test 7"] = "⚠️ MFA NOT INSTALLED"
    
    print("\n✅ Test 7 PASSED - MFA backend class works")
except Exception as e:
    test_results["Test 7"] = "❌ FAILED"
    print(f"\n❌ Test 7 FAILED: {e}")
    import traceback
    traceback.print_exc()

## Test 8: Gentle Backend Availability
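
Test 8 probes a Gentle server on `localhost:8765`, the port published by the `docker run -p 8765:8765 lowerquality/gentle` command above. A minimal reachability check can be sketched with the standard-library `socket` module, assuming an accepted TCP connection is a good-enough signal. It does not confirm that the listener is actually Gentle, and it is not necessarily how `_check_gentle_server()` works; `server_listening` is a hypothetical name.

```python
import socket


def server_listening(host: str = "localhost", port: int = 8765, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or host not resolvable
        return False


print("Gentle port open:", server_listening())
```

A short timeout keeps the test cell responsive when nothing is listening; a stricter check would also issue an HTTP request and inspect the response.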

In [None]:
print("=" * 60)
print("Test 8: Gentle Backend Availability")
print("=" * 60)

try:
    from alignment import GentleAligner, AlignmentConfig
    
    config = AlignmentConfig(backend="gentle")
    aligner = GentleAligner(config)
    
    print("\n🔧 Gentle Aligner:")
    print(f"   • Backend name: {aligner.name}")
    print(f"   • Server URL: {aligner.server_url}")
    print(f"   • Supported languages: {aligner.SUPPORTED_LANGUAGES}")
    
    # Check availability
    python_available = aligner._check_gentle_python()
    server_available = aligner._check_gentle_server()
    
    print("\n📡 Availability:")
    print(f"   • Python API: {'✅' if python_available else '❌ not installed'}")
    print(f"   • Server (localhost:8765): {'✅' if server_available else '❌ not running'}")
    
    if python_available or server_available:
        test_results["Test 8"] = "✅ PASSED"
        print("\n✅ Gentle is available")
    else:
        test_results["Test 8"] = "⚠️ GENTLE NOT INSTALLED"
        print("\n⚠️ Gentle not available (optional)")
        print("   Install: git clone https://github.com/lowerquality/gentle && cd gentle && ./install.sh")
        print("   Or start server: docker run -p 8765:8765 lowerquality/gentle")
    
    print("\n✅ Test 8 PASSED - Gentle backend class works")
except Exception as e:
    test_results["Test 8"] = "❌ FAILED"
    print(f"\n❌ Test 8 FAILED: {e}")
    import traceback
    traceback.print_exc()

## Test 10: Segment-wise Alignment (for stitching_utils)
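
Segment-wise results carry segment-local frame indices plus a `frame_offset`, so a consumer must shift tokens to global frames and then resolve words that appear in two overlapping segments. The sketch below shows those two steps with hypothetical `Token`/`SegmentResult` dataclasses (mirroring the fields used in Test 10, not the library classes) and a deliberately simple "earliest segment wins" rule for duplicates. That rule is a stand-in chosen for brevity; the `stitch_alignments(..., method='lis')` call in the usage example uses LIS instead.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    token_id: int
    frame: int        # frame index local to its segment
    word_index: int   # position of the word in the transcript (the "wid" attribute in Test 10)


@dataclass
class SegmentResult:
    tokens: List[Token]
    frame_offset: int     # global frame at which this segment starts
    rejected: bool = False


def word_start_frames(segments: List[SegmentResult]) -> Dict[int, int]:
    """Map word_index -> global start frame.

    Segments are visited in order and each word keeps the first estimate seen,
    so in an overlap the earlier segment wins. Assumes tokens within a segment
    are sorted by frame.
    """
    starts: Dict[int, int] = {}
    for seg in segments:
        if seg.rejected:  # ignore segments marked as rejected
            continue
        for tok in seg.tokens:
            starts.setdefault(tok.word_index, seg.frame_offset + tok.frame)
    return dict(sorted(starts.items()))


# 15 s segments with 2 s overlap at 50 fps -> the second segment starts at frame 650
seg0 = SegmentResult([Token(1, 10, 0), Token(2, 20, 1), Token(3, 700, 2)], frame_offset=0)
seg1 = SegmentResult([Token(3, 52, 2), Token(4, 100, 3)], frame_offset=650)
print(word_start_frames([seg0, seg1]))  # {0: 10, 1: 20, 2: 700, 3: 750}
```

Word 2 falls in the overlap and is seen by both segments (global frames 700 and 702); the first-seen rule keeps 700, whereas an LIS-based stitcher would decide using the ordering of all surrounding words.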

In [None]:
print("=" * 60)
print("Test 10: Segment-wise Alignment (for stitching_utils)")
print("=" * 60)

if not K2_AVAILABLE or not LIS_AVAILABLE:
    missing = []
    if not K2_AVAILABLE:
        missing.append("k2")
    if not LIS_AVAILABLE:
        missing.append("lis")
    test_results["Test 10"] = "⏭️ SKIPPED"
    print(f"⏭️ Test 10 SKIPPED - Missing dependencies: {', '.join(missing)}")
else:
    try:
        from alignment import WFSTAligner, AlignmentConfig, SegmentAlignmentResult
        
        print("\n📋 SegmentAlignmentResult data class:")
        print("   • Available: ✅")
        
        # Test SegmentAlignmentResult
        from alignment.base import AlignedToken
        test_tokens = [
            AlignedToken(1, 10, 0.9, {"wid": 0}),
            AlignedToken(2, 20, 0.85, {"wid": 1}),
        ]
        seg_result = SegmentAlignmentResult(
            tokens=test_tokens,
            segment_index=0,
            frame_offset=0,
            rejected=False,
            score=0.95,
        )
        
        print(f"   • Num tokens: {len(seg_result)}")
        print(f"   • Word indices: {seg_result.get_word_indices()}")
        print(f"   • Rejected: {seg_result.rejected}")
        print(f"   • Score: {seg_result.score:.2f}")
        
        # Test WFSTAligner.align_segments method exists
        print("\n🔧 WFSTAligner.align_segments method:")
        config = AlignmentConfig(
            backend="wfst",
            segment_size=15.0,
            overlap=2.0,
        )
        aligner = WFSTAligner(config)
        
        import inspect  # needed below even if align_segments is missing
        
        has_align_segments = hasattr(aligner, 'align_segments')
        print(f"   • Method exists: {'✅' if has_align_segments else '❌'}")
        
        if has_align_segments:
            sig = inspect.signature(aligner.align_segments)
            params = list(sig.parameters.keys())
            print(f"   • Parameters: {params}")
            print("   • Returns: List[SegmentAlignmentResult]")
        
        # Test align() with stitch=False option
        print("\n🔧 WFSTAligner.align(stitch=False) option:")
        sig = inspect.signature(aligner.align)
        params = dict(sig.parameters)
        has_stitch_param = 'stitch' in params
        print(f"   • 'stitch' parameter exists: {'✅' if has_stitch_param else '❌'}")
        if has_stitch_param:
            default = params['stitch'].default
            print(f"   • Default value: {default}")
        
        # Show usage example
        print("\n📝 Usage with stitching_utils:")
        print(f"   # Get segment-wise results")
        print(f"   segment_results = aligner.align_segments(waveform, text)")
        print(f"   ")
        print(f"   # Convert to stitching_utils format")
        print(f"   from stitching_utils import SegmentAlignment, stitch_alignments")
        print(f"   stitch_input = [")
        print(f"       SegmentAlignment(")
        print(f"           tokens=seg.tokens,")
        print(f"           segment_index=seg.segment_index,")
        print(f"           frame_offset=seg.frame_offset,")
        print(f"           rejected=seg.rejected,")
        print(f"       )")
        print(f"       for seg in segment_results")
        print(f"   ]")
        print(f"   final = stitch_alignments(stitch_input, method='lis')")
        
        test_results["Test 10"] = "✅ PASSED"
        print("\n✅ Test 10 PASSED - Segment-wise alignment API ready")
    except Exception as e:
        test_results["Test 10"] = "❌ FAILED"
        print(f"\n❌ Test 10 FAILED: {e}")
        import traceback
        traceback.print_exc()

## Test 9: WFST Aligner Integration

In [None]:
print("=" * 60)
print("Test 9: WFST Aligner Integration")
print("=" * 60)

if not K2_AVAILABLE or not LIS_AVAILABLE:
    missing = []
    if not K2_AVAILABLE:
        missing.append("k2")
    if not LIS_AVAILABLE:
        missing.append("lis")
    test_results["Test 9"] = "⏭️ SKIPPED"
    print(f"⏭️ Test 9 SKIPPED - Missing dependencies: {', '.join(missing)}")
else:
    try:
        from alignment import WFSTAligner, AlignmentConfig
        
        config = AlignmentConfig(
            backend="wfst",
            segment_size=15.0,
            overlap=2.0,
            skip_penalty=-0.5,
            return_penalty=-18.0,
        )
        
        aligner = WFSTAligner(config)
        
        print("\n🔧 WFST Aligner:")
        print(f"   • Backend name: {aligner.name}")
        print(f"   • Config segment_size: {config.segment_size}s")
        print(f"   • Config skip_penalty: {config.skip_penalty}")
        
        print("\n📝 To use WFST aligner:")
        print(f"   from labeling_utils import load_model")
        print(f"   from alignment import align")
        print(f"   ")
        print(f"   model = load_model('mms-fa')")
        print(f"   result = align(waveform, text, model_backend=model)")
        print(f"   ")
        print(f"   for idx, word in result.word_alignments.items():")
        print(f"       print(f'{{word.word}}: {{word.start_seconds:.2f}}s')")
        
        test_results["Test 9"] = "✅ PASSED"
        print("\n✅ Test 9 PASSED - WFST Aligner class works")
    except Exception as e:
        test_results["Test 9"] = "❌ FAILED"
        print(f"\n❌ Test 9 FAILED: {e}")
        import traceback
        traceback.print_exc()

## 📋 Test Summary

In [None]:
print("=" * 60)
print("📋 TEST RESULTS SUMMARY")
print("=" * 60)

# Display test results
print("\n" + "-" * 40)
for test_name, result in test_results.items():
    print(f"  {result}  {test_name}")
print("-" * 40)

# Count results by status keyword (robust to how the status emoji are rendered)
passed = sum(1 for r in test_results.values() if "PASSED" in r)
failed = sum(1 for r in test_results.values() if "FAILED" in r)
skipped = sum(1 for r in test_results.values() if "SKIPPED" in r)
warning = sum(1 for r in test_results.values() if "NOT INSTALLED" in r)
total = len(test_results)

print(f"\n  Total: {total} tests")
print(f"  ✅ Passed:  {passed}")
if warning > 0:
    print(f"  ⚠️ Warning: {warning}")
if skipped > 0:
    print(f"  ⏭️ Skipped: {skipped}")
if failed > 0:
    print(f"  ❌ Failed:  {failed}")

print("\n" + "=" * 60)
if failed == 0:
    print("🎉 All tests passed (or skipped due to optional dependencies)!")
else:
    print(f"⚠️ {failed} test(s) failed - please check above for details")
print("=" * 60)

print("\n📦 To enable all tests, install:")
print("   pip install k2 -f https://k2-fsa.github.io/k2/cpu.html")
print("   pip install git+https://github.com/huangruizhe/lis.git")

print("\n🏗️ Architecture note:")
print("   The alignment module uses unified frontends:")
print("   • text_frontend: TokenizerInterface, CharTokenizer, create_tokenizer_from_labels")
print("   • audio_frontend: segment_waveform, AudioSegment, SegmentationResult")
print("   This eliminates duplicate code and provides a consistent API.")

## Test 11: Alignment Accuracy Test (with Ground Truth)
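
The word-level ground truth in the next cell can be reproduced from the character-level table by grouping characters between `|` separators: a word starts at its first character's start frame and ends at its last character's end frame. The word scores in the table also appear consistent with a duration-weighted mean of the character scores (for HAD: (2 × 1.00 + 2 × 0.96 + 3 × 0.65) / 7 ≈ 0.84), which the sketch computes; that reading of the scores is an inference from the numbers, not something the notebook documents. `words_from_chars` is a hypothetical helper.

```python
from typing import Dict, List


def words_from_chars(chars: List[Dict], sep: str = "|") -> List[Dict]:
    """Group consecutive non-separator character spans into word spans (half-open frames)."""
    words: List[Dict] = []
    current: List[Dict] = []

    def flush() -> None:
        if not current:
            return
        frames = sum(c["end"] - c["start"] for c in current)
        weighted = sum(c["score"] * (c["end"] - c["start"]) for c in current)
        words.append({
            "word": "".join(c["char"] for c in current),
            "start": current[0]["start"],
            "end": current[-1]["end"],
            "score": round(weighted / frames, 2) if frames else 0.0,
        })
        current.clear()

    for c in chars:
        if c["char"] == sep:
            flush()          # a separator closes the word in progress
        else:
            current.append(c)
    flush()                  # close a trailing word that has no final separator
    return words


chars = [
    {"char": "|", "start": 0, "end": 31, "score": 1.00},
    {"char": "I", "start": 31, "end": 35, "score": 0.78},
    {"char": "|", "start": 35, "end": 37, "score": 0.80},
    {"char": "H", "start": 37, "end": 39, "score": 1.00},
    {"char": "A", "start": 39, "end": 41, "score": 0.96},
    {"char": "D", "start": 41, "end": 44, "score": 0.65},
    {"char": "|", "start": 44, "end": 45, "score": 1.00},
]
print(words_from_chars(chars))
```

Once the next cell has run, `words_from_chars(GROUND_TRUTH_CHARS)` should reproduce the start/end frames in `GROUND_TRUTH_WORDS`; scores may differ in the last digit (e.g. AT) because the character scores in the table are already rounded.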

In [None]:
print("=" * 60)
print("Test 11: Alignment Accuracy Test (with Ground Truth)")
print("=" * 60)

# Skip if dependencies not available
if not K2_AVAILABLE:
    test_results["Test 11"] = "⏭️ SKIPPED"
    print("⏭️ Test 11 SKIPPED - k2 not available")
else:
    try:
        import torchaudio
        from IPython.display import Audio, display
        
        # =================================================================
        # Ground Truth Data (from MMS-FA CTC alignment)
        # =================================================================
        # Transcript: "I HAD THAT CURIOSITY BESIDE ME AT THIS MOMENT"
        # Frame rate: 50fps (20ms per frame)
        
        GROUND_TRUTH_WORDS = [
            {"word": "I", "start": 31, "end": 35, "score": 0.78},
            {"word": "HAD", "start": 37, "end": 44, "score": 0.84},
            {"word": "THAT", "start": 45, "end": 53, "score": 0.52},
            {"word": "CURIOSITY", "start": 56, "end": 92, "score": 0.89},
            {"word": "BESIDE", "start": 95, "end": 116, "score": 0.94},
            {"word": "ME", "start": 118, "end": 124, "score": 0.67},
            {"word": "AT", "start": 126, "end": 129, "score": 0.66},
            {"word": "THIS", "start": 131, "end": 139, "score": 0.70},
            {"word": "MOMENT", "start": 143, "end": 157, "score": 0.88},
        ]
        
        GROUND_TRUTH_CHARS = [
            {"char": "|", "start": 0, "end": 31, "score": 1.00},
            {"char": "I", "start": 31, "end": 35, "score": 0.78},
            {"char": "|", "start": 35, "end": 37, "score": 0.80},
            {"char": "H", "start": 37, "end": 39, "score": 1.00},
            {"char": "A", "start": 39, "end": 41, "score": 0.96},
            {"char": "D", "start": 41, "end": 44, "score": 0.65},
            {"char": "|", "start": 44, "end": 45, "score": 1.00},
            {"char": "T", "start": 45, "end": 47, "score": 0.55},
            {"char": "H", "start": 47, "end": 49, "score": 1.00},
            {"char": "A", "start": 49, "end": 52, "score": 0.03},
            {"char": "T", "start": 52, "end": 53, "score": 1.00},
            {"char": "|", "start": 53, "end": 56, "score": 1.00},
            {"char": "C", "start": 56, "end": 61, "score": 0.97},
            {"char": "U", "start": 61, "end": 63, "score": 1.00},
            {"char": "R", "start": 63, "end": 67, "score": 0.75},
            {"char": "I", "start": 67, "end": 75, "score": 0.88},
            {"char": "O", "start": 75, "end": 79, "score": 0.99},
            {"char": "S", "start": 79, "end": 83, "score": 1.00},
            {"char": "I", "start": 83, "end": 86, "score": 0.89},
            {"char": "T", "start": 86, "end": 90, "score": 0.78},
            {"char": "Y", "start": 90, "end": 92, "score": 0.70},
            {"char": "|", "start": 92, "end": 95, "score": 0.66},
            {"char": "B", "start": 95, "end": 98, "score": 1.00},
            {"char": "E", "start": 98, "end": 102, "score": 1.00},
            {"char": "S", "start": 102, "end": 109, "score": 1.00},
            {"char": "I", "start": 109, "end": 111, "score": 1.00},
            {"char": "D", "start": 111, "end": 113, "score": 0.93},
            {"char": "E", "start": 113, "end": 116, "score": 0.66},
            {"char": "|", "start": 116, "end": 118, "score": 1.00},
            {"char": "M", "start": 118, "end": 121, "score": 0.67},
            {"char": "E", "start": 121, "end": 124, "score": 0.67},
            {"char": "|", "start": 124, "end": 126, "score": 0.49},
            {"char": "A", "start": 126, "end": 127, "score": 1.00},
            {"char": "T", "start": 127, "end": 129, "score": 0.50},
            {"char": "|", "start": 129, "end": 131, "score": 0.51},
            {"char": "T", "start": 131, "end": 132, "score": 1.00},
            {"char": "H", "start": 132, "end": 134, "score": 1.00},
            {"char": "I", "start": 134, "end": 136, "score": 0.75},
            {"char": "S", "start": 136, "end": 139, "score": 0.36},
            {"char": "|", "start": 139, "end": 143, "score": 0.50},
            {"char": "M", "start": 143, "end": 146, "score": 1.00},
            {"char": "O", "start": 146, "end": 149, "score": 1.00},
            {"char": "M", "start": 149, "end": 152, "score": 1.00},
            {"char": "E", "start": 152, "end": 153, "score": 1.00},
            {"char": "N", "start": 153, "end": 155, "score": 0.66},
            {"char": "T", "start": 155, "end": 157, "score": 0.51},
            {"char": "|", "start": 157, "end": 169, "score": 0.96},
        ]
        
        # Transcript text
        TRANSCRIPT = "I HAD THAT CURIOSITY BESIDE ME AT THIS MOMENT"
        
        # Frame parameters
        FRAME_RATE = 50  # frames per second (20ms per frame)
        SAMPLE_RATE = 16000
        
        print("\n📋 Ground Truth Data:")
        print(f"   • Transcript: '{TRANSCRIPT}'")
        print(f"   • Frame rate: {FRAME_RATE} fps (20ms/frame)")
        print(f"   • Words: {len(GROUND_TRUTH_WORDS)}")
        print(f"   • Characters: {len(GROUND_TRUTH_CHARS)}")
        
        print("\n📝 Word-level ground truth:")
        for w in GROUND_TRUTH_WORDS:
            start_sec = w['start'] / FRAME_RATE
            end_sec = w['end'] / FRAME_RATE
            print(f"   {w['word']:12s} ({w['score']:.2f}): [{w['start']:4d}, {w['end']:4d}) = [{start_sec:.2f}s, {end_sec:.2f}s)")
        
        test_results["Test 11"] = "✅ PASSED"
        print("\n✅ Test 11 PASSED - Ground truth data loaded")
        
    except Exception as e:
        test_results["Test 11"] = "❌ FAILED"
        print(f"\n❌ Test 11 FAILED: {e}")
        import traceback
        traceback.print_exc()
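
The word-level spans in `GROUND_TRUTH_WORDS` and the character-level spans above describe the same CTC alignment, with `|` marking word boundaries. As a sketch of how the two levels relate, word spans can be rebuilt from `GROUND_TRUTH_CHARS`, assuming a word runs from its first character's start frame to its last character's end frame and scoring each word with a frame-length-weighted average of its character scores (the convention in torchaudio's forced-alignment tutorial). `words_from_chars` is a hypothetical helper, not part of the `alignment` module, and its scores are not guaranteed to reproduce the stored word scores.

```python
from typing import Dict, List


def words_from_chars(chars: List[Dict], separator: str = "|") -> List[Dict]:
    """Group character-level CTC spans into word-level spans.

    Assumptions (hypothetical helper): `separator` marks word boundaries, a word
    spans [first char start, last char end), and the word score is the
    frame-length-weighted mean of its character scores.
    """
    words: List[Dict] = []
    current: List[Dict] = []

    def flush() -> None:
        # Emit the characters collected so far as one word, then reset
        if not current:
            return
        frames = sum(c["end"] - c["start"] for c in current)
        weighted = sum(c["score"] * (c["end"] - c["start"]) for c in current)
        words.append({
            "word": "".join(c["char"] for c in current),
            "start": current[0]["start"],
            "end": current[-1]["end"],
            "score": round(weighted / frames, 2) if frames > 0 else 0.0,
        })
        current.clear()

    for c in chars:
        if c["char"] == separator:
            flush()
        else:
            current.append(c)
    flush()  # handle a final word with no trailing separator
    return words
```

Applied to `GROUND_TRUTH_CHARS`, this yields `I` at frames `[31, 35)` and `HAD` at `[37, 44)`, matching the character boundaries listed above.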

## Test 12: Run WFST Alignment on Sample Audio

In [None]:
print("=" * 60)
print("Test 12: Run WFST Alignment on Sample Audio")
print("=" * 60)

# Skip if dependencies not available
if not K2_AVAILABLE or not LIS_AVAILABLE:
    missing = []
    if not K2_AVAILABLE:
        missing.append("k2")
    if not LIS_AVAILABLE:
        missing.append("lis")
    test_results["Test 12"] = "⏭️ SKIPPED"
    print(f"⏭️ Test 12 SKIPPED - Missing dependencies: {', '.join(missing)}")
else:
    try:
        import torch
        import torchaudio
        from alignment import WFSTAligner, AlignmentConfig, SegmentAlignmentResult
        
        # =================================================================
        # Load Sample Audio
        # =================================================================
        # Use the VOiCES sample from torchaudio's tutorial assets: load a local
        # copy from the examples folder if present, otherwise download it.
        
        print("\n🎵 Loading sample audio...")
        
        example_path = "../examples/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
        try:
            # Prefer the local copy in the examples folder
            waveform, sr = torchaudio.load(example_path)
            print(f"   • Loaded from: {example_path}")
        except Exception:
            # Fall back to downloading the same VOiCES clip to a temp file, then loading it
            import os
            import tempfile
            import urllib.request
            SPEECH_URL = "https://pytorch-tutorial-assets.s3.amazonaws.com/VOiCES_devkit/source-16k/train/sp0307/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav"
            local_path = os.path.join(tempfile.gettempdir(), os.path.basename(SPEECH_URL))
            print("   • Downloading sample audio...")
            urllib.request.urlretrieve(SPEECH_URL, local_path)
            waveform, sr = torchaudio.load(local_path)
            print("   • Downloaded from PyTorch assets")
        
        # Resample if needed
        if sr != 16000:
            waveform = torchaudio.functional.resample(waveform, sr, 16000)
            sr = 16000
        
        # Use only first channel if stereo
        if waveform.size(0) > 1:
            waveform = waveform[0:1]
        
        duration_sec = waveform.size(1) / sr
        print(f"   • Sample rate: {sr}")
        print(f"   • Shape: {waveform.shape}")
        print(f"   • Duration: {duration_sec:.2f}s")
        
        # TRANSCRIPT (defined in Test 11) is the reference transcript of this VOiCES clip:
        # "I HAD THAT CURIOSITY BESIDE ME AT THIS MOMENT"
        
        print(f"\n📝 Transcript: '{TRANSCRIPT}'")
        
        # =================================================================
        # Load MMS-FA Model
        # =================================================================
        print("\n🔧 Loading MMS-FA model...")
        
        try:
            from labeling_utils import load_model
            model = load_model("mms-fa")
            print("   • Model loaded: mms-fa")
        except ImportError:
            print("   ⚠️ labeling_utils not available")
            print("   Trying torchaudio bundle directly...")
            
            # Fallback: use torchaudio bundle directly
            bundle = torchaudio.pipelines.MMS_FA
            model = bundle.get_model()
            model = model.to("cpu")
            
            # Minimal adapter exposing the model-backend interface the aligner expects
            class MockModelBackend:
                def __init__(self, model, bundle):
                    self._model = model
                    self._bundle = bundle
                    
                def get_emissions(self, waveforms, lengths):
                    # Run the acoustic model without tracking gradients
                    with torch.inference_mode():
                        emissions, emission_lengths = self._model(waveforms.squeeze(-1))
                    return emissions, emission_lengths
                
                def get_vocab_info(self):
                    from types import SimpleNamespace
                    # Read labels from the stored bundle rather than a global variable
                    labels = tuple(self._bundle.get_labels())
                    blank_token, unk_token = '-', '*'
                    return SimpleNamespace(
                        labels=labels,
                        blank_token=blank_token,
                        unk_token=unk_token,
                        blank_id=labels.index(blank_token) if blank_token in labels else 0,
                        unk_id=labels.index(unk_token) if unk_token in labels else None,
                    )
            
            model = MockModelBackend(model, bundle)
            print("   • Model loaded: torchaudio MMS_FA bundle")
        
        # =================================================================
        # Run WFST Alignment
        # =================================================================
        print("\n🔧 Running WFST alignment...")
        
        config = AlignmentConfig(
            backend="wfst",
            segment_size=15.0,  # Short segment for this test
            overlap=2.0,
            skip_penalty=-0.5,
            return_penalty=-18.0,
        )
        
        aligner = WFSTAligner(config)
        aligner.set_model(model)
        
        # Run alignment
        result = aligner.align(waveform.squeeze(0), TRANSCRIPT)
        
        print("\n📊 Alignment Results:")
        print(f"   • Aligned words: {result.num_aligned_words}")
        print(f"   • Unaligned regions: {result.unaligned_indices}")
        
        # Store for comparison
        aligned_words = result.word_alignments
        
        print("\n📝 Word-level alignment results:")
        for idx, word in sorted(aligned_words.items()):
            start_frame = int(word.start_time)
            has_end = word.end_time is not None
            # If the end frame is unknown, use a 10-frame placeholder for display
            end_frame = int(word.end_time) if has_end else start_frame + 10
            start_sec = start_frame / FRAME_RATE
            end_str = f"{end_frame / FRAME_RATE:.2f}s" if has_end else "?"
            print(f"   [{idx:2d}] {str(word.word):12s}: frame [{start_frame:4d}, {end_frame:4d}) = [{start_sec:.2f}s, {end_str})")
        
        test_results["Test 12"] = "✅ PASSED"
        print("\n✅ Test 12 PASSED - WFST alignment completed")
        
    except Exception as e:
        test_results["Test 12"] = "❌ FAILED"
        print(f"\n❌ Test 12 FAILED: {e}")
        import traceback
        traceback.print_exc()
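
Test 12 sets `segment_size=15.0` and `overlap=2.0`. The sketch below shows one plausible way such settings tile a waveform into overlapping windows, assuming both values are in seconds and that windows advance by `segment_size - overlap`. `segment_bounds` is hypothetical; the module's own segmentation (via `audio_frontend`) may treat the final window differently.

```python
from typing import List, Tuple


def segment_bounds(num_samples: int, sample_rate: int = 16000,
                   segment_size: float = 15.0, overlap: float = 2.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping fixed-size windows.

    Assumed scheme (hypothetical): windows of `segment_size` seconds advance by
    `segment_size - overlap` seconds; the last window is truncated at the end.
    """
    if overlap >= segment_size:
        raise ValueError("overlap must be smaller than segment_size")
    if num_samples <= 0:
        return []
    seg = int(segment_size * sample_rate)
    step = int((segment_size - overlap) * sample_rate)
    bounds = []
    start = 0
    while True:
        end = min(start + seg, num_samples)
        bounds.append((start, end))
        if end >= num_samples:
            break  # this window already reaches the end of the audio
        start += step
    return bounds
```

For a 40 s recording at 16 kHz this gives windows covering 0-15 s, 13-28 s and 26-40 s; a clip shorter than 15 s, like the VOiCES sample, gets a single window.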

## Test 13: Alignment Accuracy Comparison

In [None]:
print("=" * 60)
print("Test 13: Alignment Accuracy Comparison")
print("=" * 60)

# Skip if Test 12 didn't run
if "Test 12" not in test_results or "PASSED" not in test_results.get("Test 12", ""):
    test_results["Test 13"] = "⏭️ SKIPPED"
    print("⏭️ Test 13 SKIPPED - Test 12 (alignment) did not pass")
else:
    try:
        print("\n📊 Comparing alignment results to ground truth...")
        
        # =================================================================
        # Compute Accuracy Metrics
        # =================================================================
        
        def compute_frame_error(pred_start, pred_end, gt_start, gt_end):
            """Compute absolute start/end frame errors between prediction and ground truth."""
            start_error = abs(pred_start - gt_start)
            # Only score the end boundary when both ends are known (frame 0 is valid)
            end_error = abs(pred_end - gt_end) if pred_end is not None and gt_end is not None else 0
            return start_error, end_error
        
        def compute_iou(pred_start, pred_end, gt_start, gt_end):
            """Compute Intersection over Union for alignment boundaries."""
            if pred_end is None:
                pred_end = pred_start + 10  # Estimate
            
            intersection_start = max(pred_start, gt_start)
            intersection_end = min(pred_end, gt_end)
            intersection = max(0, intersection_end - intersection_start)
            
            union_start = min(pred_start, gt_start)
            union_end = max(pred_end, gt_end)
            union = union_end - union_start
            
            return intersection / union if union > 0 else 0
        
        print("\n📝 Word-by-word comparison:")
        print("-" * 80)
        print(f"{'Word':<12} {'GT Start':<10} {'Pred Start':<12} {'Δ Start':<10} {'IoU':<8} Status")
        print("-" * 80)
        
        total_start_error = 0
        total_iou = 0
        matched_words = 0
        
        for gt_word in GROUND_TRUTH_WORDS:
            word = gt_word["word"]
            gt_start = gt_word["start"]
            gt_end = gt_word["end"]
            
            # Find matching word in predictions
            pred_word = None
            for idx, aligned in aligned_words.items():
                if aligned.word and aligned.word.upper() == word.upper():
                    pred_word = aligned
                    break
            
            if pred_word:
                pred_start = pred_word.start_time
                pred_end = pred_word.end_time if pred_word.end_time is not None else pred_start + (gt_end - gt_start)
                
                start_err, end_err = compute_frame_error(pred_start, pred_end, gt_start, gt_end)
                iou = compute_iou(pred_start, pred_end, gt_start, gt_end)
                
                total_start_error += start_err
                total_iou += iou
                matched_words += 1
                
                status = "✅" if start_err <= 5 else ("⚠️" if start_err <= 10 else "❌")
                print(f"{word:<12} {gt_start:<10} {pred_start:<12} {start_err:<10} {iou:.2f}     {status}")
            else:
                print(f"{word:<12} {gt_start:<10} {'N/A':<12} {'N/A':<10} {'N/A':<8} ❌")
        
        print("-" * 80)
        
        # =================================================================
        # Summary Statistics
        # =================================================================
        if matched_words > 0:
            avg_start_error = total_start_error / matched_words
            avg_iou = total_iou / matched_words
            
            print("\n📈 Accuracy Summary:")
            print(f"   • Matched words: {matched_words}/{len(GROUND_TRUTH_WORDS)}")
            print(f"   • Avg start frame error: {avg_start_error:.1f} frames ({avg_start_error * 1000 / FRAME_RATE:.0f}ms)")
            print(f"   • Avg IoU: {avg_iou:.2%}")
            
            # Rating thresholds (informational; they do not change the test status)
            if avg_start_error <= 5 and avg_iou >= 0.7:
                print("\n✅ Alignment accuracy: EXCELLENT")
            elif avg_start_error <= 10 and avg_iou >= 0.5:
                print("\n⚠️ Alignment accuracy: ACCEPTABLE")
            else:
                print("\n❌ Alignment accuracy: NEEDS IMPROVEMENT")
        
        test_results["Test 13"] = "✅ PASSED"
        print("\n✅ Test 13 PASSED - Accuracy comparison complete")
        
    except Exception as e:
        test_results["Test 13"] = "❌ FAILED"
        print(f"\n❌ Test 13 FAILED: {e}")
        import traceback
        traceback.print_exc()

## Test 14: Listening Test (Audio Preview)

In [None]:
print("=" * 60)
print("Test 14: Listening Test (Audio Preview)")
print("=" * 60)

# Skip if Test 12 didn't run
if "Test 12" not in test_results or "PASSED" not in test_results.get("Test 12", ""):
    test_results["Test 14"] = "⏭️ SKIPPED"
    print("⏭️ Test 14 SKIPPED - Test 12 (alignment) did not pass")
else:
    try:
        from IPython.display import Audio, display, HTML
        
        print("\n🎧 Audio Preview: Ground Truth vs Prediction")
        print("   Following torchaudio's forced_alignment_tutorial.py pattern")
        
        # Ratio to convert frames to samples
        # Frame rate: 50fps, Sample rate: 16000
        # Samples per frame = 16000 / 50 = 320
        SAMPLES_PER_FRAME = sr // FRAME_RATE
        
        def get_audio_segment(start_frame, end_frame, padding_frames=2):
            """Extract audio segment by frame indices."""
            start_frame = max(0, int(start_frame) - padding_frames)
            end_frame = int(end_frame) + padding_frames
            x0 = start_frame * SAMPLES_PER_FRAME
            x1 = min(end_frame * SAMPLES_PER_FRAME, waveform.size(1))
            return waveform[:, x0:x1]
        
        # Show comparison for each word
        print("\n" + "=" * 70)
        print("Comparing Ground Truth vs Prediction for each word:")
        print("=" * 70)
        
        for gt_word in GROUND_TRUTH_WORDS:
            word = gt_word["word"]
            gt_start = gt_word["start"]
            gt_end = gt_word["end"]
            
            # Find matching prediction
            pred_word = None
            for idx, aligned in aligned_words.items():
                if aligned.word and aligned.word.upper() == word.upper():
                    pred_word = aligned
                    break
            
            print(f"\n{'='*70}")
            display(HTML(f"<h3>{word}</h3>"))
            
            # Ground Truth audio
            gt_audio = get_audio_segment(gt_start, gt_end)
            gt_start_sec = gt_start / FRAME_RATE
            gt_end_sec = gt_end / FRAME_RATE
            print(f"🎯 Ground Truth: frames [{gt_start}, {gt_end}) = [{gt_start_sec:.3f}s - {gt_end_sec:.3f}s]")
            display(Audio(gt_audio.numpy(), rate=sr))
            
            # Prediction audio
            if pred_word:
                pred_start = int(pred_word.start_time)
                pred_end = int(pred_word.end_time) if pred_word.end_time is not None else pred_start + (gt_end - gt_start)
                pred_audio = get_audio_segment(pred_start, pred_end)
                pred_start_sec = pred_start / FRAME_RATE
                pred_end_sec = pred_end / FRAME_RATE
                
                delta = abs(pred_start - gt_start)
                status = "✅" if delta <= 5 else ("⚠️" if delta <= 10 else "❌")
                
                print(f"🔮 Prediction:   frames [{pred_start}, {pred_end}) = [{pred_start_sec:.3f}s - {pred_end_sec:.3f}s]  Δ={delta} frames {status}")
                display(Audio(pred_audio.numpy(), rate=sr))
            else:
                print("🔮 Prediction:   ❌ NOT FOUND")
        
        print(f"\n{'='*70}")
        test_results["Test 14"] = "✅ PASSED"
        print("\n✅ Test 14 PASSED - Audio preview complete")
        
    except Exception as e:
        test_results["Test 14"] = "❌ FAILED"
        print(f"\n❌ Test 14 FAILED: {e}")
        import traceback
        traceback.print_exc()
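
Test 14 converts frames to samples with a fixed `sr // FRAME_RATE` (320 samples per frame). torchaudio's forced-alignment tutorial instead scales by the ratio of waveform length to the number of emission frames, which keeps the mapping consistent when the emission count is not exactly `num_samples / 320`. A sketch of that variant follows; `frames_to_samples` is hypothetical and the numbers in the note are illustrative.

```python
from typing import Tuple


def frames_to_samples(start_frame: int, end_frame: int, num_samples: int,
                      num_frames: int, padding_frames: int = 0) -> Tuple[int, int]:
    """Map a half-open frame span [start_frame, end_frame) to sample indices.

    Uses the ratio num_samples / num_frames instead of a fixed samples-per-frame
    constant, and clamps the (optionally padded) span to the waveform bounds.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    ratio = num_samples / num_frames
    x0 = max(0, int(ratio * (start_frame - padding_frames)))
    x1 = min(num_samples, int(ratio * (end_frame + padding_frames)))
    return x0, x1
```

For example, with 54,400 samples and 169 emission frames, frames `[56, 92)` map to samples `[18026, 29614)` instead of `[17920, 29440)` with the fixed factor, a shift of roughly 7 to 11 ms.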

## Test 15: MFA Aligner Test

In [None]:
print("=" * 60)
print("Test 15: MFA Aligner Test")
print("=" * 60)

try:
    from alignment import MFAAligner, AlignmentConfig

    print("\n🔧 MFA Aligner Configuration:")
    config = AlignmentConfig(backend="mfa", language="english_us_arpa")
    aligner = MFAAligner(config)

    print(f"   • Backend name: {aligner.name}")
    print(f"   • Acoustic model: {aligner.acoustic_model}")
    print(f"   • Dictionary: {aligner.dictionary}")

    # Check if MFA is available
    mfa_available = aligner._check_mfa_available()
    print(f"\n📡 MFA CLI available: {'✅' if mfa_available else '❌'}")

    if not mfa_available:
        # Skip test if MFA not installed
        print("\n⏭️ MFA not installed - skipping alignment test")
        print("\n📦 To install MFA:")
        print("   conda install -c conda-forge montreal-forced-aligner")
        print("   # Or: pip install montreal-forced-aligner")
        test_results["Test 15"] = "⏭️ SKIPPED (MFA not installed)"
        print("\n⏭️ Test 15 SKIPPED - MFA not available")
    else:
        # Try to run alignment on sample audio
        print("\n🔄 Running MFA alignment on sample audio...")

        # Reuse the waveform from Test 12 and the transcript from Test 11
        if 'waveform' in globals() and 'TRANSCRIPT' in globals():
            result = aligner.align(waveform.squeeze(0), TRANSCRIPT)

            print("\n📊 MFA Alignment Results:")
            print(f"   • Aligned words: {result.num_aligned_words}")
            print(f"   • Backend: {result.metadata.get('backend', 'N/A')}")

            if result.word_alignments:
                print("\n📝 Word-level results:")
                for idx, word in sorted(result.word_alignments.items())[:5]:
                    print(f"   [{idx}] {word.word}: {word.start_time} - {word.end_time}")
                if len(result.word_alignments) > 5:
                    print(f"   ... and {len(result.word_alignments) - 5} more")

            test_results["Test 15"] = "✅ PASSED"
            print("\n✅ Test 15 PASSED - MFA alignment completed")
        else:
            print("   ⚠️ Sample audio not available (run Test 12 first)")
            test_results["Test 15"] = "⏭️ SKIPPED (no audio)"
            print("\n⏭️ Test 15 SKIPPED - No sample audio")

except Exception as e:
    test_results["Test 15"] = "❌ FAILED"
    print(f"\n❌ Test 15 FAILED: {e}")
    import traceback
    traceback.print_exc()
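
Test 15 prints MFA's `start_time`/`end_time` as returned. MFA writes its alignments as TextGrids with times in seconds, while the Test 11 ground truth is in 50 fps frames. If `MFAAligner` passes those seconds through unchanged (an assumption worth checking in the module), a conversion like the hypothetical `seconds_to_frames` below puts both on the same grid before computing the Test 13 metrics.

```python
import math
from typing import Tuple


def seconds_to_frames(start_s: float, end_s: float, frame_rate: int = 50) -> Tuple[int, int]:
    """Convert a [start_s, end_s) interval in seconds to a half-open frame span.

    The start is floored and the end is ceiled so the frame span covers the whole
    interval; a small epsilon keeps tiny float error in the product from pushing
    a boundary into the neighbouring frame. Spans are at least one frame long.
    """
    eps = 1e-9
    start = math.floor(start_s * frame_rate + eps)
    end = math.ceil(end_s * frame_rate - eps)
    return start, max(end, start + 1)
```

Applying it to each MFA word's `start_time`/`end_time` would yield spans directly comparable with `GROUND_TRUTH_WORDS`.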

## Test 16: Gentle Aligner Test

In [None]:
print("=" * 60)
print("Test 16: Gentle Aligner Test")
print("=" * 60)

try:
    from alignment import GentleAligner, AlignmentConfig

    print("\n🔧 Gentle Aligner Configuration:")
    config = AlignmentConfig(backend="gentle")
    aligner = GentleAligner(config)

    print(f"   • Backend name: {aligner.name}")
    print(f"   • Server URL: {aligner.server_url}")
    print(f"   • Supported languages: {aligner.SUPPORTED_LANGUAGES}")

    # Check availability
    python_available = aligner._check_gentle_python()
    server_available = aligner._check_gentle_server()

    print("\n📡 Availability:")
    print(f"   • Python API: {'✅' if python_available else '❌ not installed'}")
    print(f"   • Server ({aligner.server_url}): {'✅' if server_available else '❌ not running'}")

    if not python_available and not server_available:
        # Skip test if Gentle not available
        print("\n⏭️ Gentle not available - skipping alignment test")
        print("\n📦 To install Gentle:")
        print("   git clone https://github.com/lowerquality/gentle && cd gentle && ./install.sh")
        print("   Or start server: docker run -p 8765:8765 lowerquality/gentle")
        test_results["Test 16"] = "⏭️ SKIPPED (Gentle not installed)"
        print("\n⏭️ Test 16 SKIPPED - Gentle not available")
    else:
        # Try to run alignment on sample audio
        print("\n🔄 Running Gentle alignment on sample audio...")

        # Reuse the waveform from Test 12 and the transcript from Test 11
        if 'waveform' in globals() and 'TRANSCRIPT' in globals():
            # Use 4 threads for parallelization
            result = aligner.align(waveform.squeeze(0), TRANSCRIPT, nthreads=4)

            print("\n📊 Gentle Alignment Results:")
            print(f"   • Aligned words: {result.num_aligned_words}")
            print(f"   • Unaligned regions: {result.unaligned_indices}")
            print(f"   • Backend: {result.metadata.get('backend', 'N/A')}")

            if result.word_alignments:
                print("\n📝 Word-level results:")
                for idx, word in sorted(result.word_alignments.items())[:5]:
                    end_str = f"{word.end_time}" if word.end_time is not None else "?"
                    print(f"   [{idx}] {word.word}: {word.start_time} - {end_str}")
                if len(result.word_alignments) > 5:
                    print(f"   ... and {len(result.word_alignments) - 5} more")

            test_results["Test 16"] = "✅ PASSED"
            print("\n✅ Test 16 PASSED - Gentle alignment completed")
        else:
            print("   ⚠️ Sample audio not available (run Test 12 first)")
            test_results["Test 16"] = "⏭️ SKIPPED (no audio)"
            print("\n⏭️ Test 16 SKIPPED - No sample audio")

except Exception as e:
    test_results["Test 16"] = "❌ FAILED"
    print(f"\n❌ Test 16 FAILED: {e}")
    import traceback
    traceback.print_exc()
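
Test 16 already guards against a missing `end_time` and prints unaligned regions. The sketch below shows one way Gentle-style output could be split into frame spans and runs of consecutive unaligned word indices, assuming Gentle's JSON word schema: a `case` field on every word, with `start`/`end` in seconds present only for `success` entries (words such as `not-found-in-audio` carry no times). `split_gentle_words` is hypothetical; the representation `GentleAligner` actually uses for `unaligned_indices` may differ.

```python
from typing import Dict, List, Optional, Tuple


def split_gentle_words(words: List[Dict], frame_rate: int = 50
                       ) -> Tuple[Dict[int, Tuple[int, int]], List[Tuple[int, int]]]:
    """Split Gentle-style word entries into aligned frame spans and unaligned runs.

    Returns:
      aligned:   {word_index: (start_frame, end_frame)} for entries with case == "success"
      unaligned: [(first_index, last_index), ...] runs of consecutive non-success entries
    """
    aligned: Dict[int, Tuple[int, int]] = {}
    unaligned: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, w in enumerate(words):
        if w.get("case") == "success":
            # Times are assumed to be in seconds; convert to the 50 fps frame grid
            aligned[i] = (round(w["start"] * frame_rate), round(w["end"] * frame_rate))
            if run_start is not None:
                unaligned.append((run_start, i - 1))
                run_start = None
        elif run_start is None:
            # First word of a new run of unaligned words
            run_start = i
    if run_start is not None:
        unaligned.append((run_start, len(words) - 1))
    return aligned, unaligned
```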

## Interactive: Listen to All Aligned Words

In [None]:
# =================================================================
# Interactive: Listen to all aligned words
# =================================================================
# Run this cell to hear each aligned word with audio players

if "Test 12" in test_results and "PASSED" in test_results.get("Test 12", ""):
    from IPython.display import Audio, display
    
    print("🎧 Listen to each aligned word:")
    print("-" * 40)
    
    # Helper function to preview words (local implementation)
    def preview_aligned_words(waveform, word_alignments, sample_rate=16000, frame_rate=50, max_words=20):
        """Preview aligned words with audio players."""
        samples_per_frame = sample_rate // frame_rate
        
        for idx, word in sorted(word_alignments.items())[:max_words]:
            start_frame = int(word.start_time)
            end_frame = int(word.end_time) if word.end_time is not None else start_frame + 10
            
            # Add 2 frames of padding on each side so word edges are not clipped
            padded_start = max(0, start_frame - 2)
            padded_end = end_frame + 2
            
            # Extract audio segment
            x0 = padded_start * samples_per_frame
            x1 = min(padded_end * samples_per_frame, waveform.size(-1))
            
            if waveform.dim() == 2:
                segment = waveform[:, x0:x1]
            else:
                segment = waveform[x0:x1]
            
            start_sec = start_frame / frame_rate
            end_sec = end_frame / frame_rate
            
            print(f"\n[{idx}] '{word.word}' ({start_sec:.2f}s - {end_sec:.2f}s)")
            display(Audio(segment.numpy(), rate=sample_rate))
    
    # Preview the aligned words
    preview_aligned_words(waveform, aligned_words, sample_rate=sr, frame_rate=FRAME_RATE)
    
    if len(aligned_words) > 20:
        print(f"\n... showing first 20 of {len(aligned_words)} words")
else:
    print("⏭️ Alignment not available - run Test 12 first")