# üß™ Model-Lab Multi-Platform Compatibility Test

**Date:** January 8, 2026  
**Purpose:** Test model-lab across CPU, MPS, GPU (CUDA), and TPU platforms  
**VS Code Support:** Works with VS Code Jupyter + Google Colab extension

---

## üöÄ Two Ways to Test

### Option 1: Local Testing with UV Venv (Recommended for MPS)

**For testing on your Mac with Apple Silicon:**

1. **Select the UV Kernel in VS Code**:

   - Click "Select Kernel" (top-right of notebook)
   - Choose `Model Lab (UV Python 3.12)` from the dropdown
   - This uses your UV venv with all dependencies installed

2. **Run Cell 2** (Install Dependencies) - Skip this, already installed in UV venv

3. **Run Cell 4** (MPS Test) - Tests all 4 models on Apple Silicon GPU

4. **Results** - Get real MPS performance metrics

**MPS bypass notes (what Cell 4 does automatically):**

- LFM processor is loaded on CPU first, then moved to MPS to avoid the vendor CUDA default crash. (See `docs/LFM_MPS_FIX_SUMMARY.md` for details.)
- SeamlessM4T loads with fallbacks: direct MPS ‚Üí low_cpu_mem_usage on CPU then move ‚Üí CPU-only if needed. The device used is printed in the log.

---

### Option 2: Cloud Testing with Google Colab (For GPU/TPU)

**For testing on Google's cloud infrastructure:**

1. **Click "Select Kernel"** (top-right corner of notebook)
2. **Choose "Colab"** from dropdown
3. **Select Runtime Type**:
   - **CPU** - Free (always available)
   - **GPU (T4)** - Free with limits (~12hr sessions)
   - **TPU (v2-8/v5e)** - Free with limits
4. **Run Cell 2** (Install Dependencies) - One time only for cloud environment
5. **Run the matching test cell** (3, 5, or 6)

---

## üìã What Gets Tested

### Models (All 4):

- ‚úÖ **Whisper** (tiny) - OpenAI's ASR model
- ‚úÖ **Faster-Whisper** (tiny) - Optimized CTranslate2 version
- ‚úÖ **LFM-2.5-Audio** (1.5B) - Liquid AI's foundation model
- ‚úÖ **SeamlessM4T** (v2-large) - Meta's multilingual ASR + translation

**üìù Model Size Notes:**

- **Test Files** (this notebook, colab_test.py): Hardcoded to "tiny" for fast testing
- **Production Code** (harness/registry.py): Configurable, defaults to "large-v3" for best accuracy
- Change model sizes in production via config: `{"model_name": "base|small|medium|large-v3"}`

### Test Cells:

| Cell  | Name                 | Best For                     | Kernel            |
| ----- | -------------------- | ---------------------------- | ----------------- |
| **2** | Install Dependencies | First time cloud testing     | Colab             |
| **3** | CPU Test             | Baseline on any system       | Any (Colab or UV) |
| **4** | MPS Test             | Apple Silicon GPU (M1/M2/M3) | **UV Kernel** ‚≠ê  |
| **5** | GPU Test             | NVIDIA CUDA (T4 on Colab)    | Colab GPU         |
| **6** | TPU Test             | Google TPU (Colab TPU)       | Colab TPU         |

---

## üéØ Quick Start Steps

**For MPS Testing (Local Mac):**

1. **Select Kernel** ‚Üí `Model Lab (UV Python 3.12)`
2. **Run Cell 4** ‚Üí MPS test begins
3. **Wait ~3-5 min** ‚Üí Results show

**For GPU Testing (Colab Cloud):**

1. **Select Kernel** ‚Üí `Colab`
2. **Select Runtime** ‚Üí GPU (T4)
3. **Run Cell 2** ‚Üí Install dependencies (one time)
4. **Run Cell 5** ‚Üí GPU test begins
5. **Wait ~2-3 min** ‚Üí Results show

**For TPU Testing (Colab Cloud):**

1. **Select Kernel** ‚Üí `Colab`
2. **Select Runtime** ‚Üí TPU (v5e)
3. **Run Cell 2** ‚Üí Install dependencies
4. **Run Cell 6** ‚Üí TPU test begins
5. **Wait ~2-3 min** ‚Üí Results show

---

## üìä Performance Expectations

| Platform | Device | Whisper | Faster-Whisper | RTF   |
| -------- | ------ | ------- | -------------- | ----- |
| **MPS**  | M3 GPU | ~3.2s   | ~2.8s          | 0.64x |
| **GPU**  | T4     | ~1.8s   | ~1.2s          | 0.24x |
| **CPU**  | Any    | ~14s    | ~10s           | 2.8x  |

**RTF = Real-Time Factor (lower = faster than real-time)**

---

**üëâ Start with Cell 4 (MPS) using UV Kernel**  
**üëâ Then try Cell 5 (GPU) using Colab kernel if you want cloud results**


In [None]:
# üîß Cell 2: Install Dependencies (Run First in Colab!)
# This cell is now runtime-aware to avoid breaking TPU runtimes by installing CUDA wheels.

import os, sys
from pathlib import Path

print("=" * 70)
print("üì¶ INSTALLING DEPENDENCIES (runtime-aware)")
print("=" * 70)


def _is_colab_tpu_runtime() -> bool:
    # Heuristics that are cheap and don't require importing torch/tensorflow.
    return bool(
        os.environ.get("COLAB_TPU_ADDR")
        or os.environ.get("TPU_NAME")
        or os.environ.get("XRT_TPU_CONFIG")
        or os.environ.get("TPU_ACCELERATOR_TYPE")
    )


def _has_nvidia_smi() -> bool:
    return Path("/usr/bin/nvidia-smi").exists() or Path("/bin/nvidia-smi").exists()


is_tpu = _is_colab_tpu_runtime()
is_gpu = _has_nvidia_smi()
print(f"\nüß≠ Runtime detection:")
print(f"  TPU runtime detected: {is_tpu}")
print(f"  GPU runtime detected: {is_gpu}")
print(f"  Python: {sys.version.split()[0]}")

# Core ML packages
if is_tpu:
    print("\n1Ô∏è‚É£ TPU runtime: installing PyTorch/XLA stack (avoid CUDA wheels)")
    # IMPORTANT: torch_xla wheels are tightly coupled to torch version.
    # Installing torch_xla[tpu] from the libtpu index will pull a compatible torch build.
    !pip install -q 'torch_xla[tpu]==2.9.0' -f https://storage.googleapis.com/libtpu-releases/index.html
    print("   ‚úÖ Installed torch_xla[tpu]==2.9.0 (and compatible torch deps)")
    print(
        "   ‚ö†Ô∏è IMPORTANT: Restart the runtime now (Runtime ‚Üí Restart), then re-run Cell 2 and proceed."
    )
elif is_gpu:
    print("\n1Ô∏è‚É£ GPU runtime: installing PyTorch with CUDA support...")
    !pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
    print("   ‚úÖ Installed CUDA PyTorch")
else:
    print("\n1Ô∏è‚É£ CPU runtime: installing standard PyTorch (CPU)...")
    !pip install -q torch torchvision torchaudio
    print("   ‚úÖ Installed CPU PyTorch")

print("\n2Ô∏è‚É£ Installing ASR models...")
!pip install -q transformers openai-whisper faster-whisper

print("\n3Ô∏è‚É£ Installing LFM-2.5-Audio...")
!pip install -q liquid-audio

print("\n4Ô∏è‚É£ SeamlessM4T...")
print("   ‚ÑπÔ∏è SeamlessM4T is included in transformers (already installed above)")

# TensorFlow is not required for these PyTorch-based tests; avoid installing it automatically.
print("\n5Ô∏è‚É£ TensorFlow...")
print("   ‚ÑπÔ∏è Skipping TensorFlow installs by default (not required for this notebook).")

print("\n" + "=" * 70)
print("‚úÖ DEPENDENCIES INSTALLED")
print("=" * 70)
print("\nüìã Next: run ONE test cell matching your runtime:")
print("   ‚Ä¢ Cell 3: CPU")
print("   ‚Ä¢ Cell 4: MPS (local Mac)")
print("   ‚Ä¢ Cell 5: GPU")
print("   ‚Ä¢ Cell 6: TPU")

üì¶ INSTALLING DEPENDENCIES

1Ô∏è‚É£ Installing PyTorch with CUDA support...
2Ô∏è‚É£ Installing ASR models...
3Ô∏è‚É£ Installing LFM-2.5-Audio...
4Ô∏è‚É£ Installing SeamlessM4T...
   ‚ÑπÔ∏è SeamlessM4T is included in transformers (already installed above)
5Ô∏è‚É£ Handling TensorFlow conflict with torch-xla...
   ‚úÖ Replaced tensorflow with tensorflow-cpu
6Ô∏è‚É£ Installing TPU support (for TPU runtime)...
   ‚úÖ TPU support installed

‚úÖ ALL DEPENDENCIES INSTALLED

üìã Next: Run ONE test cell below matching your runtime:
   ‚Ä¢ Cell 3: CPU (any runtime)
   ‚Ä¢ Cell 4: MPS (local Mac)
   ‚Ä¢ Cell 5: GPU (Colab GPU)
   ‚Ä¢ Cell 6: TPU (Colab TPU)


In [None]:
# üîµ Cell 3: CPU Performance Test
# Run this cell when using CPU runtime (works on any platform)

import torch
import sys
import numpy as np
import time
import copy
import warnings
import os

# Suppress torch_xla warning on CPU (torch_xla only needed on TPU)
os.environ["PJRT_DEVICE"] = "cpu"
warnings.filterwarnings("ignore", message=".*tensorflow.*conflict.*torch-xla.*")
warnings.filterwarnings("ignore", message=".*Defaulting to PJRT_DEVICE.*")
warnings.filterwarnings("ignore", message=".*Some weights of.*were not initialized.*")
warnings.filterwarnings("ignore", message=".*Both `max_new_tokens`.*")

print("=" * 70)
print("üîµ CPU PERFORMANCE TEST")
print("=" * 70)

# Hardware detection
print(f"\nüìä System Info:")
print(f"  Python: {sys.version.split()[0]}")
print(f"  PyTorch: {torch.__version__}")
print(f"  CUDA Available: {torch.cuda.is_available()}")
print(f"  Device: CPU")

# Load models on CPU
print(f"\nü§ñ Loading Models on CPU...")
import whisper
from faster_whisper import WhisperModel

start_load = time.time()
whisper_model_cpu = whisper.load_model("tiny", device="cpu")
faster_model_cpu = WhisperModel("tiny", device="cpu")

# Try LFM with detailed error reporting
has_lfm = False
lfm_model_cpu = None
lfm_processor_cpu = None
lfm_error = None
try:
    from liquid_audio import LFM2AudioModel, LFM2AudioProcessor

    lfm_model_cpu = LFM2AudioModel.from_pretrained("LiquidAI/LFM2.5-Audio-1.5B", device="cpu")
    lfm_processor_cpu = LFM2AudioProcessor.from_pretrained(
        "LiquidAI/LFM2.5-Audio-1.5B", device="cpu"
    )
    has_lfm = True
except Exception as e:
    lfm_error = str(e)
    print(f"‚ö†Ô∏è  LFM failed: {lfm_error[:100]}...")

# Try SeamlessM4T with detailed error reporting
has_seamless = False
seamless_model = None
seamless_processor = None
seamless_error = None
try:
    from transformers import SeamlessM4TModel, SeamlessM4TProcessor

    seamless_processor = SeamlessM4TProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
    seamless_model = SeamlessM4TModel.from_pretrained("facebook/seamless-m4t-v2-large")
    has_seamless = True
except Exception as e:
    seamless_error = str(e)
    print(f"‚ö†Ô∏è  SeamlessM4T failed: {seamless_error[:100]}...")

load_time = time.time() - start_load
model_count = 2 + (1 if has_lfm else 0) + (1 if has_seamless else 0)
print(f"‚úÖ {model_count} models loaded in {load_time:.1f}s")

# Create test audio (5 seconds)
print(f"\nüéµ Creating test audio (5s @ 16kHz)...")
sample_rate = 16000
test_audio_5s = np.random.randn(int(sample_rate * 5)).astype(np.float32)

# Benchmark
print(f"\nüìä Benchmarking on CPU...")
benchmarks = {}

# Whisper
try:
    start = time.time()
    result = whisper_model_cpu.transcribe(test_audio_5s, fp16=False)
    benchmarks["whisper_cpu"] = time.time() - start
    print(f"  ‚úÖ Whisper: {benchmarks['whisper_cpu']:.2f}s")
except Exception as e:
    print(f"  ‚ùå Whisper failed: {str(e)[:60]}...")

# Faster-Whisper
try:
    start = time.time()
    segments, info = faster_model_cpu.transcribe(test_audio_5s)
    list(segments)  # Force evaluation
    benchmarks["faster_whisper_cpu"] = time.time() - start
    print(f"  ‚úÖ Faster-Whisper: {benchmarks['faster_whisper_cpu']:.2f}s")
except Exception as e:
    print(f"  ‚ùå Faster-Whisper failed: {str(e)[:60]}...")

# LFM
if has_lfm and lfm_model_cpu:
    try:
        start = time.time()
        with torch.no_grad():
            # Use proper LFM inference API: generate_sequential with ChatState
            from liquid_audio import ChatState
            import torch

            # Create chat state for ASR
            chat = ChatState(lfm_processor_cpu)
            chat.new_turn("system")
            chat.add_text("Perform ASR.")
            chat.end_turn()

            # Convert audio to tensor and ensure 2D shape (channels, samples)
            if isinstance(test_audio_5s, np.ndarray):
                audio_tensor = torch.from_numpy(test_audio_5s).float()
            else:
                audio_tensor = test_audio_5s.float()

            if len(audio_tensor.shape) == 1:
                audio_tensor = audio_tensor.unsqueeze(0)  # Add channel dimension

            chat.new_turn("user")
            chat.add_audio(audio_tensor, sample_rate)
            chat.end_turn()

            chat.new_turn("assistant")

            # Generate transcription using generate_sequential
            text_tokens = []
            for token in lfm_model_cpu.generate_sequential(**chat, max_new_tokens=64):
                if token.numel() == 1:  # Text token
                    text_tokens.append(token)

            # Decode text
            if text_tokens:
                text_tensor = torch.stack(text_tokens, 1)
                transcription = lfm_processor_cpu.text.decode(text_tensor[0])
            else:
                transcription = ""

        benchmarks["lfm2_5_cpu"] = time.time() - start
        print(f"  ‚úÖ LFM-2.5-Audio: {benchmarks['lfm2_5_cpu']:.2f}s")
    except Exception as e:
        print(f"  ‚ùå LFM inference failed: {str(e)[:60]}...")

# SeamlessM4T
if has_seamless and seamless_model and seamless_processor:
    try:
        start = time.time()
        with torch.no_grad():
            inputs = seamless_processor(
                audios=test_audio_5s, sampling_rate=sample_rate, return_tensors="pt"
            )
            _ = seamless_model.generate(**inputs, tgt_lang="eng", max_length=100)
        benchmarks["seamlessm4t_cpu"] = time.time() - start
        print(f"  ‚úÖ SeamlessM4T: {benchmarks['seamlessm4t_cpu']:.2f}s")
    except Exception as e:
        print(f"  ‚ùå SeamlessM4T inference failed: {str(e)[:60]}...")

# Results
print(f"\n" + "=" * 70)
print(f"üìà CPU RESULTS:")
print(f"=" * 70)
for model_name, inference_time in sorted(benchmarks.items()):
    rtf = inference_time / 5.0
    status = "üü¢ Real-time" if rtf < 1.0 else "üü° Slower"
    print(f"  {model_name:25s}: {inference_time:6.2f}s (RTF: {rtf:.3f}x) {status}")

if benchmarks:
    fastest = min(benchmarks, key=benchmarks.get)
    print(f"\nüèÜ Fastest: {fastest.upper()}")
print(f"‚úÖ CPU test complete!")

# Export benchmarks for summary cell
benchmarks_cpu = copy.deepcopy(benchmarks)

üîµ CPU PERFORMANCE TEST

üìä System Info:
  Python: 3.12.12
  PyTorch: 2.9.0+cpu
  CUDA Available: False
  Device: CPU

ü§ñ Loading Models on CPU...


You are using a model of type seamless_m4t_v2 to instantiate a model of type seamless_m4t. This is not supported for all configurations of models and can yield errors.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Some weights of SeamlessM4TModel were not initialized from the model checkpoint at facebook/seamless-m4t-v2-large and are newly initialized: ['speech_encoder.encoder.layers.0.conv_module.batch_norm.bias', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.num_batches_tracked', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.running_mean', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.running_var', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.weight', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.bias', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.num_batches_tracked', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.running_mean', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.running_var', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.weight', 'speech_encoder.encoder.layers.10.conv_module.batch_norm.bias', 'speech_encoder.encoder.layers.10.conv_module.batch_norm.num_batches_tracked', 'speech_encoder.encod

‚úÖ 4 models loaded in 55.8s

üéµ Creating test audio (5s @ 16kHz)...

üìä Benchmarking on CPU...
  ‚úÖ Whisper: 1.97s
  ‚úÖ Faster-Whisper: 3.42s
  ‚úÖ LFM-2.5-Audio: 18.55s


  inputs = seamless_processor(audios=test_audio_5s, sampling_rate=sample_rate, return_tensors="pt")
Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=1024) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


  ‚úÖ SeamlessM4T: 164.89s

üìà CPU RESULTS:
  faster_whisper_cpu       :   3.42s (RTF: 0.685x) üü¢ Real-time
  lfm2_5_cpu               :  18.55s (RTF: 3.709x) üü° Slower
  seamlessm4t_cpu          : 164.89s (RTF: 32.978x) üü° Slower
  whisper_cpu              :   1.97s (RTF: 0.395x) üü¢ Real-time

üèÜ Fastest: WHISPER_CPU
‚úÖ CPU test complete!


In [None]:
# üü¢ Cell 4: MPS (Apple Silicon) Performance Test
# Run this cell when using local Mac with M1/M2/M3 chip

import torch
import sys
import numpy as np
import time
import copy

print("=" * 70)
print("üü¢ MPS (APPLE SILICON) PERFORMANCE TEST")
print("=" * 70)

# Hardware detection
mps_available = torch.backends.mps.is_available()
print(f"\nüìä System Info:")
print(f"  Python: {sys.version.split()[0]}")
print(f"  PyTorch: {torch.__version__}")
print(f"  MPS Available: {mps_available}")
print(f"  Device: {'MPS (Apple GPU)' if mps_available else 'CPU (MPS not available)'}")

if not mps_available:
    print("\n" + "=" * 70)
    print("‚ö†Ô∏è  MPS NOT AVAILABLE")
    print("=" * 70)
    print("üí° Run this on Apple Silicon Mac (M1/M2/M3)")
    print("üí° Or use Cell 3 (CPU) or Cell 5 (GPU) instead")
else:
    # Load models with MPS
    print(f"\nü§ñ Loading Models...")
    import whisper
    from faster_whisper import WhisperModel

    start_load = time.time()
    # Whisper supports MPS
    whisper_model = whisper.load_model("tiny", device="mps")
    # Faster-Whisper doesn't support MPS, use CPU
    faster_model = WhisperModel("tiny", device="cpu")

    # LFM - Test with detailed error reporting and CPU-first processor bypass
    has_lfm = False
    lfm_model = None
    lfm_processor = None
    lfm_load_error = None
    try:
        from liquid_audio import LFM2AudioModel, LFM2AudioProcessor

        lfm_model = LFM2AudioModel.from_pretrained("LiquidAI/LFM2.5-Audio-1.5B", device="mps")
        try:
            lfm_processor = LFM2AudioProcessor.from_pretrained(
                "LiquidAI/LFM2.5-Audio-1.5B", device="cpu"
            )
            # Move processor to MPS if available; keep on CPU if it fails (processor is lightweight)
            try:
                lfm_processor = lfm_processor.to("mps")
            except Exception:
                print("   ‚ÑπÔ∏è LFM processor staying on CPU (move to MPS not required)")
        except Exception as e_proc:
            print(f"   ‚ö†Ô∏è LFM processor CPU-first load failed: {str(e_proc)[:200]}")
        has_lfm = True
    except Exception as e:
        lfm_load_error = str(e)
        print(f"‚ö†Ô∏è  LFM MPS loading failed: {lfm_load_error[:400]}")
        if "CUDA" in lfm_load_error:
            print("   ‚ÑπÔ∏è Vendor CUDA bug on MPS; skipping LFM inference here")

    # SeamlessM4T - Test with detailed error reporting
    has_seamless = False
    seamless_model = None
    seamless_processor = None
    seamless_error = None
    try:
        from transformers import SeamlessM4TModel, SeamlessM4TProcessor

        seamless_processor = SeamlessM4TProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
        seamless_model = SeamlessM4TModel.from_pretrained("facebook/seamless-m4t-v2-large").to(
            "mps"
        )
        has_seamless = True
    except Exception as e:
        seamless_error = str(e)
        print(f"‚ö†Ô∏è  SeamlessM4T initial load failed: {seamless_error[:400]}")
        # Fallback 1: load with low_cpu_mem_usage on CPU then move to MPS
        try:
            print("üîÅ Attempting fallback: load with low_cpu_mem_usage on CPU, then move to MPS...")
            from transformers import SeamlessM4TModel, SeamlessM4TProcessor

            seamless_processor = SeamlessM4TProcessor.from_pretrained(
                "facebook/seamless-m4t-v2-large"
            )
            seamless_model = SeamlessM4TModel.from_pretrained(
                "facebook/seamless-m4t-v2-large",
                low_cpu_mem_usage=True,
            )
            seamless_model.to("mps")
            has_seamless = True
            print("‚úÖ SeamlessM4T loaded via fallback and moved to MPS")
        except Exception as e_f:
            print(f"‚ùå Fallback load failed: {str(e_f)[:400]}")
            # Final fallback: load on CPU only (so it can at least be benchmarked on CPU)
            try:
                print("‚ö†Ô∏è Final fallback: loading SeamlessM4T on CPU (will run on CPU)")
                from transformers import SeamlessM4TModel, SeamlessM4TProcessor

                seamless_processor = SeamlessM4TProcessor.from_pretrained(
                    "facebook/seamless-m4t-v2-large"
                )
                seamless_model = SeamlessM4TModel.from_pretrained("facebook/seamless-m4t-v2-large")
                has_seamless = True
            except Exception as e_f2:
                print(f"‚ùå Final fallback failed‚ÄîSeamlessM4T unavailable: {str(e_f2)[:400]}")

    model_count = 2 + (1 if has_lfm else 0) + (1 if has_seamless else 0)
    print(f"‚úÖ {model_count} models loaded in {time.time() - start_load:.1f}s")
    print(f"   ‚Ä¢ Whisper: MPS ‚ö°")
    print(f"   ‚Ä¢ Faster-Whisper: CPU (no MPS support)")
    if has_lfm:
        proc_device = (
            "CPU"
            if lfm_processor is None
            else (lfm_processor.device.type.upper() if hasattr(lfm_processor, "device") else "CPU")
        )
        print(f"   ‚Ä¢ LFM-2.5-Audio: MPS ‚ö° (processor on {proc_device})")
    if has_seamless:
        device_note = "MPS" if next(seamless_model.parameters()).device.type == "mps" else "CPU"
        print(f"   ‚Ä¢ SeamlessM4T: {device_note} ‚ö°")

    # Create test audio
    print(f"\nüéµ Creating test audio (5s @ 16kHz)...")
    sample_rate = 16000
    test_audio_5s = np.random.randn(int(sample_rate * 5)).astype(np.float32)

    # Benchmark
    print(f"\nüìä Benchmarking...")
    benchmarks = {}

    # Whisper on MPS
    try:
        start = time.time()
        result = whisper_model.transcribe(test_audio_5s)
        benchmarks["whisper_mps"] = time.time() - start
        print(f"  ‚úÖ Whisper (MPS): {benchmarks['whisper_mps']:.2f}s")
    except Exception as e:
        print(f"  ‚ùå Whisper failed: {str(e)[:60]}...")

    # Faster-Whisper on CPU
    try:
        start = time.time()
        segments, info = faster_model.transcribe(test_audio_5s)
        list(segments)
        benchmarks["faster_whisper_cpu"] = time.time() - start
        print(f"  ‚úÖ Faster-Whisper (CPU): {benchmarks['faster_whisper_cpu']:.2f}s")
    except Exception as e:
        print(f"  ‚ùå Faster-Whisper failed: {str(e)[:60]}...")

    # LFM on MPS
    if has_lfm and lfm_model:
        try:
            start = time.time()
            with torch.no_grad():
                # Use proper LFM inference API: generate_sequential with ChatState
                from liquid_audio import ChatState
                import torch

                # Create chat state for ASR
                chat = ChatState(lfm_processor)
                chat.new_turn("system")
                chat.add_text("Perform ASR.")
                chat.end_turn()

                # Convert audio to tensor and ensure 2D shape (channels, samples)
                if isinstance(test_audio_5s, np.ndarray):
                    audio_tensor = torch.from_numpy(test_audio_5s).float()
                else:
                    audio_tensor = test_audio_5s.float()

                if len(audio_tensor.shape) == 1:
                    audio_tensor = audio_tensor.unsqueeze(0)  # Add channel dimension

                chat.new_turn("user")
                chat.add_audio(audio_tensor, sample_rate)
                chat.end_turn()

                chat.new_turn("assistant")

                # Generate transcription using generate_sequential
                text_tokens = []
                for token in lfm_model.generate_sequential(**chat, max_new_tokens=64):
                    if token.numel() == 1:  # Text token
                        text_tokens.append(token)

                # Decode text
                if text_tokens:
                    text_tensor = torch.stack(text_tokens, 1)
                    transcription = lfm_processor.text.decode(text_tensor[0])
                else:
                    transcription = ""

            benchmarks["lfm2_5_mps"] = time.time() - start
            print(f"  ‚úÖ LFM-2.5-Audio (MPS): {benchmarks['lfm2_5_mps']:.2f}s")
        except Exception as e:
            print(f"  ‚ùå LFM inference failed: {str(e)[:120]}...")

    # SeamlessM4T on MPS/CPU
    if has_seamless and seamless_model and seamless_processor:
        try:
            start = time.time()
            with torch.no_grad():
                inputs = seamless_processor(
                    audios=test_audio_5s, sampling_rate=sample_rate, return_tensors="pt"
                )
                inputs = {k: v.to(seamless_model.device) for k, v in inputs.items()}
                # SeamlessM4T uses tgt_lang parameter for language control
                _ = seamless_model.generate(**inputs, tgt_lang="eng", max_length=64)
            device_note = seamless_model.device.type.upper()
            benchmarks[f"seamlessm4t_{device_note.lower()}"] = time.time() - start
            print(
                f"  ‚úÖ SeamlessM4T ({device_note}): {benchmarks[f'seamlessm4t_{device_note.lower()}']:.2f}s"
            )
        except Exception as e:
            print(f"  ‚ùå SeamlessM4T inference failed: {str(e)[:120]}...")

    # Results
    print(f"\n" + "=" * 70)
    print(f"üìà MPS RESULTS:")
    print(f"=" * 70)
    for model_name, inference_time in sorted(benchmarks.items()):
        rtf = inference_time / 5.0
        device = "MPS" if "mps" in model_name else "CPU"
        status = "üü¢ Real-time" if rtf < 1.0 else "üü° Slower"
        print(f"  {model_name:25s} ({device}): {inference_time:6.2f}s (RTF: {rtf:.3f}x) {status}")

    if benchmarks:
        fastest = min(benchmarks, key=benchmarks.get)
        print(f"\nüèÜ Fastest: {fastest.upper()}")

    # Export benchmarks for summary cell
    benchmarks_mps = copy.deepcopy(benchmarks)

    print(f"‚úÖ MPS test complete!")

üü¢ MPS (APPLE SILICON) PERFORMANCE TEST

üìä System Info:
  Python: 3.12.10
  PyTorch: 2.9.1
  MPS Available: True
  Device: MPS (Apple GPU)

ü§ñ Loading Models...


You are using a model of type seamless_m4t_v2 to instantiate a model of type seamless_m4t. This is not supported for all configurations of models and can yield errors.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Some weights of SeamlessM4TModel were not initialized from the model checkpoint at facebook/seamless-m4t-v2-large and are newly initialized: ['speech_encoder.encoder.layers.0.conv_module.batch_norm.bias', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.num_batches_tracked', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.running_mean', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.running_var', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.weight', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.bias', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.num_batches_tracked', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.running_mean', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.running_var', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.weight', 'speech_encoder.encoder.layers.10.conv_module.batch_norm.bias', 'speech_encoder.encoder.layers.10.conv_module.batch_norm.num_batches_tracked', 'speech_encoder.encod

‚úÖ 4 models loaded in 47.6s
   ‚Ä¢ Whisper: MPS ‚ö°
   ‚Ä¢ Faster-Whisper: CPU (no MPS support)
   ‚Ä¢ LFM-2.5-Audio: MPS ‚ö° (processor on MPS)
   ‚Ä¢ SeamlessM4T: MPS ‚ö°

üéµ Creating test audio (5s @ 16kHz)...

üìä Benchmarking...
  ‚úÖ Whisper (MPS): 0.82s
  ‚úÖ Faster-Whisper (CPU): 10.69s
  ‚úÖ LFM-2.5-Audio (MPS): 2.57s


  inputs = seamless_processor(audios=test_audio_5s, sampling_rate=sample_rate, return_tensors="pt")
Both `max_new_tokens` (=256) and `max_length`(=64) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=1024) and `max_length`(=64) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


  ‚úÖ SeamlessM4T (MPS): 26.40s

üìà MPS RESULTS:
  faster_whisper_cpu        (CPU):  10.69s (RTF: 2.139x) üü° Slower
  lfm2_5_mps                (MPS):   2.57s (RTF: 0.515x) üü¢ Real-time
  seamlessm4t_mps           (MPS):  26.40s (RTF: 5.280x) üü° Slower
  whisper_mps               (MPS):   0.82s (RTF: 0.165x) üü¢ Real-time

üèÜ Fastest: WHISPER_MPS
‚úÖ MPS test complete!


In [4]:
# üü° Cell 5: GPU (NVIDIA CUDA) Performance Test
# Run this cell when using Colab GPU runtime (T4/A100/V100)

import torch
import sys
import numpy as np
import time
import copy
import warnings
import os

# Suppress torch_xla warning on GPU (torch_xla only needed on TPU)
os.environ["PJRT_DEVICE"] = "cpu"
warnings.filterwarnings("ignore", message=".*tensorflow.*conflict.*torch-xla.*")
warnings.filterwarnings("ignore", message=".*Defaulting to PJRT_DEVICE.*")
warnings.filterwarnings("ignore", message=".*Some weights of.*were not initialized.*")
warnings.filterwarnings("ignore", message=".*Both `max_new_tokens`.*")

print("=" * 70)
print("üü° GPU (NVIDIA CUDA) PERFORMANCE TEST")
print("=" * 70)

# Hardware detection
cuda_available = torch.cuda.is_available()
print(f"\nüìä System Info:")
print(f"  Python: {sys.version.split()[0]}")
print(f"  PyTorch: {torch.__version__}")
print(f"  CUDA Available: {cuda_available}")

if cuda_available:
    print(f"  GPU: {torch.cuda.get_device_name()}")
    print(f"  GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")
    print(f"  CUDA Version: {torch.version.cuda}")
    print(f"  Device: GPU (CUDA)")
else:
    print(f"  Device: CPU (GPU not available)")

if not cuda_available:
    print("\n" + "=" * 70)
    print("‚ö†Ô∏è  GPU NOT AVAILABLE")
    print("=" * 70)
    print("üí° In VS Code: Select Kernel ‚Üí Colab ‚Üí GPU runtime")
    print("üí° Or use Cell 3 (CPU) or Cell 4 (MPS) instead")
else:
    # Load models on GPU
    print(f"\nü§ñ Loading Models on GPU...")
    import whisper
    from faster_whisper import WhisperModel

    start_load = time.time()
    whisper_model = whisper.load_model("tiny", device="cuda")
    faster_model = WhisperModel("tiny", device="cuda", compute_type="float16")

    # LFM with error handling
    has_lfm = False
    lfm_model = None
    lfm_processor = None
    lfm_device = None
    try:
        from liquid_audio import LFM2AudioModel, LFM2AudioProcessor

        try:
            # Load on CPU first, then move to CUDA (similar to SeamlessM4T)
            lfm_model = LFM2AudioModel.from_pretrained("LiquidAI/LFM2.5-Audio-1.5B", device="cpu")
            lfm_model = lfm_model.to("cuda")
            lfm_processor = LFM2AudioProcessor.from_pretrained(
                "LiquidAI/LFM2.5-Audio-1.5B", device="cpu"
            )
            # Move processor to CUDA to avoid CPU index/device mismatch; fall back to CPU if it fails
            try:
                lfm_processor = lfm_processor.to("cuda")
            except Exception:
                print("   ‚ÑπÔ∏è LFM processor staying on CPU (decode will move tokens to CPU)")
            has_lfm = True
            lfm_device = "GPU"
        except Exception as e:
            print(f"‚ö†Ô∏è LFM CUDA loading failed: {str(e)[:200]}")
            # Try CPU fallback for LFM
            try:
                lfm_model = LFM2AudioModel.from_pretrained(
                    "LiquidAI/LFM2.5-Audio-1.5B", device="cpu"
                )
                lfm_processor = LFM2AudioProcessor.from_pretrained(
                    "LiquidAI/LFM2.5-Audio-1.5B", device="cpu"
                )
                has_lfm = True
                lfm_device = "CPU"
            except:
                pass
    except ImportError:
        pass

    # SeamlessM4T with error handling
    has_seamless = False
    seamless_model = None
    seamless_processor = None
    try:
        from transformers import SeamlessM4TModel, SeamlessM4TProcessor

        seamless_processor = SeamlessM4TProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
        seamless_model = SeamlessM4TModel.from_pretrained("facebook/seamless-m4t-v2-large").to(
            "cuda"
        )
        has_seamless = True
    except Exception as e:
        pass

    model_count = 2 + (1 if has_lfm else 0) + (1 if has_seamless else 0)
    load_time = time.time() - start_load
    print(f"‚úÖ {model_count} models loaded in {load_time:.1f}s")
    print(f"   ‚Ä¢ Whisper: GPU ‚ö°")
    print(f"   ‚Ä¢ Faster-Whisper: GPU ‚ö°")
    if has_lfm:
        print(f"   ‚Ä¢ LFM-2.5-Audio: {lfm_device} {'‚ö°' if lfm_device == 'GPU' else '(CUDA bug)'}")
    if has_seamless:
        print(f"   ‚Ä¢ SeamlessM4T: GPU ‚ö°")

    # Create test audio
    print(f"\nüéµ Creating test audio (5s @ 16kHz)...")
    sample_rate = 16000
    test_audio_5s = np.random.randn(int(sample_rate * 5)).astype(np.float32)

    # Benchmark
    print(f"\nüìä Benchmarking...")
    benchmarks = {}

    # Whisper on GPU
    try:
        start = time.time()
        result = whisper_model.transcribe(test_audio_5s, fp16=True)
        benchmarks["whisper_gpu"] = time.time() - start
        print(f"  ‚úÖ Whisper (GPU): {benchmarks['whisper_gpu']:.2f}s")
    except Exception as e:
        print(f"  ‚ùå Whisper failed: {str(e)[:60]}...")

    # Faster-Whisper on GPU
    try:
        start = time.time()
        segments, info = faster_model.transcribe(test_audio_5s)
        list(segments)
        benchmarks["faster_whisper_gpu"] = time.time() - start
        print(f"  ‚úÖ Faster-Whisper (GPU): {benchmarks['faster_whisper_gpu']:.2f}s")
    except Exception as e:
        print(f"  ‚ùå Faster-Whisper failed: {str(e)[:60]}...")

    # LFM (CPU or GPU)
    if has_lfm and lfm_model:
        try:
            start = time.time()
            with torch.no_grad():
                # Use proper LFM inference API: generate_sequential with ChatState
                from liquid_audio import ChatState
                import torch

                # Get model device
                model_device = next(lfm_model.parameters()).device

                # Create chat state for ASR
                chat = ChatState(lfm_processor)
                chat.new_turn("system")
                chat.add_text("Perform ASR.")
                chat.end_turn()

                # Convert audio to tensor and ensure 2D shape (channels, samples)
                if isinstance(test_audio_5s, np.ndarray):
                    audio_tensor = torch.from_numpy(test_audio_5s).float()
                else:
                    audio_tensor = test_audio_5s.float()

                if len(audio_tensor.shape) == 1:
                    audio_tensor = audio_tensor.unsqueeze(0)  # Add channel dimension

                # Ensure audio is on the same device as model and processor
                audio_tensor = audio_tensor.to(model_device)

                chat.new_turn("user")
                chat.add_audio(audio_tensor, sample_rate)
                chat.end_turn()

                chat.new_turn("assistant")

                # Generate transcription using generate_sequential
                text_tokens = []
                for token in lfm_model.generate_sequential(**chat, max_new_tokens=64):
                    if token.numel() == 1:  # Text token
                        text_tokens.append(token)

                # Decode text (stack along seq dimension, then move to CPU for decode)
                if text_tokens:
                    text_tensor = torch.stack(text_tokens, 1)  # shape: (batch=1, seq)
                    text_tensor_cpu = text_tensor.to("cpu")
                    transcription = lfm_processor.text.decode(text_tensor_cpu[0])
                else:
                    transcription = ""

            model_key = f"lfm2_5_{lfm_device.lower()}"
            benchmarks[model_key] = time.time() - start
            print(f"  ‚úÖ LFM-2.5-Audio ({lfm_device}): {benchmarks[model_key]:.2f}s")
        except Exception as e:
            print(f"  ‚ùå LFM inference failed: {str(e)[:80]}...")

    # SeamlessM4T on GPU
    if has_seamless and seamless_model and seamless_processor:
        try:
            start = time.time()
            with torch.no_grad():
                inputs = seamless_processor(
                    audio=test_audio_5s, sampling_rate=sample_rate, return_tensors="pt"
                )
                inputs = {k: v.to("cuda") for k, v in inputs.items()}
                _ = seamless_model.generate(**inputs, tgt_lang="eng", max_length=100)
            benchmarks["seamlessm4t_gpu"] = time.time() - start
            print(f"  ‚úÖ SeamlessM4T (GPU): {benchmarks['seamlessm4t_gpu']:.2f}s")
        except Exception as e:
            print(f"  ‚ùå SeamlessM4T inference failed: {str(e)[:60]}...")

    # Results
    print(f"\n" + "=" * 70)
    print(f"üìà GPU RESULTS:")
    print(f"=" * 70)
    for model_name, inference_time in sorted(benchmarks.items()):
        rtf = inference_time / 5.0
        device = "GPU" if "gpu" in model_name else "CPU"
        status = "üü¢ Real-time" if rtf < 1.0 else "üü° Slower"
        print(f"  {model_name:25s} ({device}): {inference_time:6.2f}s (RTF: {rtf:.3f}x) {status}")

    if benchmarks:
        fastest = min(benchmarks, key=benchmarks.get)
        print(f"\nüèÜ Fastest: {fastest.upper()}")

    print(f"‚úÖ GPU test complete!")

# Export benchmarks for summary cell
benchmarks_gpu = copy.deepcopy(benchmarks)

üü° GPU (NVIDIA CUDA) PERFORMANCE TEST

üìä System Info:
  Python: 3.12.12
  PyTorch: 2.9.0+cu126
  CUDA Available: True
  GPU: Tesla T4
  GPU Memory: 15.8GB
  CUDA Version: 12.6
  Device: GPU (CUDA)

ü§ñ Loading Models on GPU...


You are using a model of type seamless_m4t_v2 to instantiate a model of type seamless_m4t. This is not supported for all configurations of models and can yield errors.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Some weights of SeamlessM4TModel were not initialized from the model checkpoint at facebook/seamless-m4t-v2-large and are newly initialized: ['speech_encoder.encoder.layers.0.conv_module.batch_norm.bias', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.num_batches_tracked', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.running_mean', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.running_var', 'speech_encoder.encoder.layers.0.conv_module.batch_norm.weight', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.bias', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.num_batches_tracked', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.running_mean', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.running_var', 'speech_encoder.encoder.layers.1.conv_module.batch_norm.weight', 'speech_encoder.encoder.layers.10.conv_module.batch_norm.bias', 'speech_encoder.encoder.layers.10.conv_module.batch_norm.num_batches_tracked', 'speech_encoder.encod

‚úÖ 4 models loaded in 82.5s
   ‚Ä¢ Whisper: GPU ‚ö°
   ‚Ä¢ Faster-Whisper: GPU ‚ö°
   ‚Ä¢ LFM-2.5-Audio: GPU ‚ö°
   ‚Ä¢ SeamlessM4T: GPU ‚ö°

üéµ Creating test audio (5s @ 16kHz)...

üìä Benchmarking...
  ‚úÖ Whisper (GPU): 0.58s
  ‚úÖ Faster-Whisper (GPU): 1.32s
  ‚úÖ LFM-2.5-Audio (GPU): 0.17s


Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Both `max_new_tokens` (=1024) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


  ‚úÖ SeamlessM4T (GPU): 8.55s

üìà GPU RESULTS:
  faster_whisper_gpu        (GPU):   1.32s (RTF: 0.263x) üü¢ Real-time
  lfm2_5_gpu                (GPU):   0.17s (RTF: 0.034x) üü¢ Real-time
  seamlessm4t_gpu           (GPU):   8.55s (RTF: 1.709x) üü° Slower
  whisper_gpu               (GPU):   0.58s (RTF: 0.115x) üü¢ Real-time

üèÜ Fastest: LFM2_5_GPU
‚úÖ GPU test complete!


In [1]:
# # üß≠ Cell (TPU Runtime Check): are we actually on a TPU kernel?
# # This is a lightweight environment probe (no torch/tf imports).
# import os, sys
# from pathlib import Path

# def is_tpu_runtime() -> bool:
#     return bool(
#         os.environ.get("COLAB_TPU_ADDR")
#         or os.environ.get("TPU_NAME")
#         or os.environ.get("XRT_TPU_CONFIG")
#         or os.environ.get("TPU_ACCELERATOR_TYPE")
#     )

# def has_nvidia_smi() -> bool:
#     return Path('/usr/bin/nvidia-smi').exists() or Path('/bin/nvidia-smi').exists()

# print("=" * 70)
# print("üß≠ RUNTIME CHECK")
# print("=" * 70)
# print(f"Python: {sys.version.split()[0]}")
# print(f"Colab TPU env present: {is_tpu_runtime()}")
# print(f"GPU (nvidia-smi) present: {has_nvidia_smi()}")
# print("\nKey env vars (if set):")
# for k in ["COLAB_TPU_ADDR", "TPU_NAME", "TPU_ACCELERATOR_TYPE", "XRT_TPU_CONFIG", "PJRT_DEVICE"]:
#     v = os.environ.get(k)
#     if v:
#         print(f"  {k}={v}")
# print("=" * 70)
# if is_tpu_runtime():
#     print("‚úÖ You appear to be on a TPU runtime. Proceed to the TPU diagnostic, then the TPU test.")
# else:
#     print("‚ö†Ô∏è You do NOT appear to be on a TPU runtime.")
#     print("   In Colab: Runtime ‚Üí Change runtime type ‚Üí Hardware accelerator ‚Üí TPU, then reconnect.")

üß≠ RUNTIME CHECK
Python: 3.12.12
Colab TPU env present: False
GPU (nvidia-smi) present: True

Key env vars (if set):
‚ö†Ô∏è You do NOT appear to be on a TPU runtime.
   In Colab: Runtime ‚Üí Change runtime type ‚Üí Hardware accelerator ‚Üí TPU, then reconnect.


In [None]:
# # üîé Cell (TPU Diagnostic): Safe TPU & runtime checks (runs in subprocess)
# # Run this BEFORE Cell 6 to verify torch_xla, PyTorch, and TF visibility without risking kernel crashes.
# import sys, subprocess

# print("=" * 70)
# print("üîé TPU DIAGNOSTIC (safe checks)")
# print("=" * 70)

# def run(cmd):
#     p = subprocess.run(cmd, capture_output=True, text=True)
#     return p

# # 1) Check pip-installed packages
# for pkg in ('torch', 'torch_xla', 'tensorflow', 'jax'):
#     p = run([sys.executable, '-m', 'pip', 'show', pkg])
#     if p.returncode == 0:
#         ver = 'unknown'
#         for line in p.stdout.splitlines():
#             if line.startswith('Version:'):
#                 ver = line.split(':',1)[1].strip()
#         print(f"  ‚Ä¢ {pkg}: INSTALLED (version={ver})")
#     else:
#         print(f"  ‚Ä¢ {pkg}: NOT INSTALLED")

# # 2) Query PyTorch import/version in subprocess (safe)
# p = run([sys.executable, '-c', "import torch; print(torch.__version__)" ])
# if p.returncode == 0:
#     print(f"  ‚Ä¢ PyTorch import OK ({p.stdout.strip()})")
# else:
#     print(f"  ‚Ä¢ PyTorch import FAILED: {p.stderr.strip()[:200]}")

# # 3) If torch_xla is installed, attempt to query devices in a subprocess (isolated)
# p = run([sys.executable, '-c', "import torch_xla.core.xla_model as xm; devices = xm.get_xla_supported_devices(); print('DEV_COUNT', len(devices)); print('DEVICE', xm.xla_device())" ])
# if p.returncode == 0:
#     print("  ‚Ä¢ torch_xla import OK; device info:")
#     print(p.stdout.strip())
# else:
#     # If pip show said installed but import failed, show stderr to help diagnose
#     p_show = run([sys.executable, '-m', 'pip', 'show', 'torch_xla'])
#     if p_show.returncode == 0:
#         print("  ‚Ä¢ torch_xla appears installed but import in subprocess failed. Error:")
#         print(p.stderr.strip())
#         print("    ‚Üí This can indicate a mismatched wheel; consider reinstalling a wheel that matches your Python/PyTorch/TPU stack and RESTART the runtime.")
#     else:
#         print("  ‚Ä¢ torch_xla is not installed; will not attempt import (safe). To install, see the installer cell below.")

# # 4) TensorFlow TPU visibility (if TF installed)
# p = run([sys.executable, '-c', "import tensorflow as tf; print('TPU_LOGICAL_DEVICES:', tf.config.list_logical_devices('TPU'))" ])
# if p.returncode == 0:
#     # Print a short summary
#     out = p.stdout.strip()
#     print(f"  ‚Ä¢ TensorFlow check OK: {out}")
# else:
#     if run([sys.executable, '-m', 'pip', 'show', 'tensorflow']).returncode == 0:
#         print(f"  ‚Ä¢ TensorFlow present but TPU check failed: {p.stderr.strip()[:200]}")
#     else:
#         print("  ‚Ä¢ TensorFlow not installed")

# print('\nHints:')
# print("  ‚Ä¢ If torch_xla is missing: %pip install -q 'torch_xla[tpu]==2.9.0' -f https://storage.googleapis.com/libtpu-releases/index.html")
# print("  ‚Ä¢ After installing or upgrading torch_xla, RESTART the runtime (Runtime ‚Üí Restart) and re-run this diagnostic, then Cell 6.")
# print("  ‚Ä¢ To avoid kernel crashes, this diagnostic runs imports in subprocesses.")
# print("=" * 70)

In [4]:
# # ‚ö†Ô∏è Cell (Optional): Torch XLA one-click installer (OPT-IN)
# # This cell will NOT run automatically. Set INSTALL_TORCH_XLA=True to opt in.
# import sys, subprocess

# INSTALL_TORCH_XLA = False  # <-- set to True to run installer

# print("=" * 70)
# print("‚ö†Ô∏è TORCH_XLA INSTALLER (OPT-IN)")
# print("=" * 70)
# if not INSTALL_TORCH_XLA:
#     print("This cell is opt-in. To install torch_xla, either run the pip command below manually, or set INSTALL_TORCH_XLA=True and re-run this cell.")
#     print("Manual command:")
#     print("  %pip install -q 'torch_xla[tpu]==2.9.0' -f https://storage.googleapis.com/libtpu-releases/index.html")
#     print("After installing, RESTART the runtime (Runtime ‚Üí Restart) and re-run the diagnostic and TPU test (Cell 6).")
# else:
#     print("Installing torch_xla (this may take a few minutes)...")
#     cmd = [sys.executable, '-m', 'pip', 'install', "torch_xla[tpu]==2.9.0", '-f', 'https://storage.googleapis.com/libtpu-releases/index.html']
#     proc = subprocess.run(cmd)
#     if proc.returncode == 0:
#         print("Installation completed successfully.")
#         print("IMPORTANT: Restart the runtime (Runtime ‚Üí Restart) and re-run the diagnostic and TPU test (Cell 6).")
#     else:
#         print(f"Installation failed (exit code {proc.returncode}). Review the output above for errors.")
# print("=" * 70)

‚ö†Ô∏è TORCH_XLA INSTALLER (OPT-IN)
This cell is opt-in. To install torch_xla, either run the pip command below manually, or set INSTALL_TORCH_XLA=True and re-run this cell.
Manual command:
  %pip install -q 'torch_xla[tpu]==2.9.0' -f https://storage.googleapis.com/libtpu-releases/index.html
After installing, RESTART the runtime (Runtime ‚Üí Restart) and re-run the diagnostic and TPU test (Cell 6).


In [1]:
# # üü£ Cell 6: TPU (Google Cloud TPU) Performance Test
# import os
# import sys
# import time
# import copy
# import numpy as np
# import subprocess

# print("=" * 70)
# print("üü£ TPU (GOOGLE CLOUD TPU) PERFORMANCE TEST")
# print("=" * 70)

# # Guidance: don't auto-install large runtime packages without user consent.
# print("\n‚ö†Ô∏è NOTE: This cell will *not* automatically replace your PyTorch or force-install torch_xla.")
# print("If TPU isn‚Äôt the current runtime, this cell will exit early.")

# def is_tpu_runtime() -> bool:
#     return bool(
#         os.environ.get("COLAB_TPU_ADDR")
#         or os.environ.get("TPU_NAME")
#         or os.environ.get("XRT_TPU_CONFIG")
#         or os.environ.get("TPU_ACCELERATOR_TYPE")
#     )

# # 0) Bail out early if this is not a TPU runtime (prevents crashes on non-TPU kernels)
# print(f"\nüß≠ TPU runtime detected (env heuristic): {is_tpu_runtime()}")
# if not is_tpu_runtime():
#     print("\n" + "=" * 70)
#     print("‚ö†Ô∏è Not running on a TPU kernel.")
#     print("   In Colab: Runtime ‚Üí Change runtime type ‚Üí Hardware accelerator ‚Üí TPU, then reconnect.")
#     print("   Then re-run: Cell 2 (install, TPU-aware) ‚Üí TPU diagnostic ‚Üí this TPU test.")
#     print("=" * 70)
#     benchmarks_tpu = {}
# else:
#     # 1) Safe subprocess checks (avoid importing torch/torch_xla in-kernel until we know they're sane)
#     def run(cmd):
#         return subprocess.run(cmd, capture_output=True, text=True)

#     print("\nüîé Subprocess sanity checks:")
#     p_torch = run([sys.executable, '-c', 'import torch; print(torch.__version__)'])
#     if p_torch.returncode == 0:
#         print(f"  ‚Ä¢ torch import OK (version={p_torch.stdout.strip()})")
#     else:
#         print("  ‚Ä¢ torch import FAILED in subprocess.")
#         print("    ‚Üí This usually means your installed torch wheel is incompatible with the runtime (common if CUDA wheels were installed on TPU).")
#         print("    ‚Üí Fix: Restart runtime, re-run Cell 2 on TPU (it installs torch_xla[tpu]==2.9.0), restart again.")
#         benchmarks_tpu = {}
#         raise SystemExit
#     p_xla = run([sys.executable, '-c', "import torch_xla.core.xla_model as xm; devices = xm.get_xla_supported_devices(); print('DEV_COUNT', len(devices)); print('DEVICE', xm.xla_device())"])
#     if p_xla.returncode != 0:
#         print("  ‚Ä¢ torch_xla import FAILED in subprocess.")
#         print(f"    stderr: {p_xla.stderr.strip()[:400]}")
#         print("    ‚Üí Fix: In TPU runtime, run Cell 2 (TPU-aware install), restart, then re-run TPU diagnostic + this cell.")
#         benchmarks_tpu = {}
#         raise SystemExit
#     # Parse XLA output
#     lines = [ln.strip() for ln in p_xla.stdout.splitlines() if ln.strip()]
#     try:
#         tpu_cores = int(lines[0].split()[1])
#         tpu_device_str = lines[1].split()[1]
#     except Exception:
#         print("  ‚Ä¢ torch_xla subprocess check returned unexpected output:")
#         print(p_xla.stdout)
#         benchmarks_tpu = {}
#         raise SystemExit
#     print(f"  ‚Ä¢ torch_xla import OK (cores={tpu_cores}, device={tpu_device_str})")

#     # 2) Now (and only now) import torch/torch_xla in-kernel for actual TPU work.
#     import torch
#     import torch_xla.core.xla_model as xm
#     tpu_device = xm.xla_device()

#     print(f"\n‚úÖ XLA TPU ready: {len(xm.get_xla_supported_devices())} cores ({tpu_device})")

#     import warnings
#     warnings.filterwarnings('ignore', message='.*Some weights of.*were not initialized.*')
#     warnings.filterwarnings('ignore', message='.*Both `max_new_tokens`.*')

#     # Check and install missing packages (Colab TPU may not have run Cell 2)
#     print(f"\nüì¶ Checking model dependencies...")
#     missing_packages = []
#     try:
#         import whisper
#     except ImportError:
#         missing_packages.append('openai-whisper')
#     try:
#         import liquid_audio
#     except ImportError:
#         missing_packages.append('liquid-audio')
#     if missing_packages:
#         print(f"  Installing missing packages: {', '.join(missing_packages)}")
#         for pkg in missing_packages:
#             import subprocess
#             subprocess.run([sys.executable, '-m', 'pip', 'install', '-q', pkg], check=True)
#         print("  ‚úÖ Dependencies installed")
#     else:
#         print("  ‚úÖ All dependencies present")

#     # Create test audio (5 seconds, 16kHz)
#     print(f"\nüéµ Creating test audio (5s @ 16kHz)...")
#     sample_rate = 16000
#     test_audio_5s = np.random.randn(int(sample_rate * 5)).astype(np.float32)

#     benchmarks = {}

#     # ========== Whisper on TPU (SKIPPED - sparse tensors not supported) ==========
#     print(f"\n‚è≠Ô∏è Skipping Whisper on TPU (uses sparse tensors not supported on XLA/TPU backend)")

#     # ========== Faster-Whisper (SKIPPED on TPU) ==========
#     print(f"\n‚è≠Ô∏è Skipping Faster-Whisper on TPU (CTranslate2 has no TPU backend)")

#     # ========== LFM-2.5-Audio on TPU (SKIPPED - not TPU-compatible) ==========
#     print(f"\n‚è≠Ô∏è Skipping LFM-2.5-Audio on TPU (model not designed for XLA/TPU backend)")

#     # ========== SeamlessM4T on TPU (best-effort) ==========
#     print(f"\nü§ñ Loading SeamlessM4T on TPU (best-effort)...")
#     try:
#         from transformers import SeamlessM4TModel, SeamlessM4TProcessor
#         start_load = time.time()
#         seamless_processor = SeamlessM4TProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
#         seamless_model = SeamlessM4TModel.from_pretrained("facebook/seamless-m4t-v2-large").to(tpu_device)
#         print(f"  ‚úÖ SeamlessM4T moved to TPU in {time.time() - start_load:.1f}s")

#         print(f"  ‚è±Ô∏è Running SeamlessM4T inference on TPU...")
#         start = time.time()
#         with torch.no_grad():
#             inputs = seamless_processor(audios=test_audio_5s, sampling_rate=sample_rate, return_tensors="pt")
#             inputs = {k: v.to(tpu_device) for k, v in inputs.items()}
#             _ = seamless_model.generate(**inputs, tgt_lang="eng", max_length=64)
#         xm.mark_step()
#         benchmarks['seamlessm4t_tpu'] = time.time() - start
#         print(f"  ‚úÖ SeamlessM4T (TPU): {benchmarks['seamlessm4t_tpu']:.2f}s")
#     except Exception as e:
#         print(f"  ‚ùå SeamlessM4T TPU run failed: {str(e)[:200]}")

#     # Results
#     print(f"\n" + "=" * 70)
#     print(f"üìà TPU RESULTS:")
#     print(f"=" * 70)
#     for model_name, inference_time in sorted(benchmarks.items()):
#         rtf = inference_time / 5.0
#         print(f"  {model_name:25s}: {inference_time:6.2f}s (RTF: {rtf:.3f}x)")
#     if benchmarks:
#         fastest = min(benchmarks, key=benchmarks.get)
#         print(f"\nüèÜ Fastest: {fastest.upper()}")
#     print("‚úÖ TPU test complete!")

#     # Export benchmarks for summary cell
#     benchmarks_tpu = copy.deepcopy(benchmarks)

üü£ TPU (GOOGLE CLOUD TPU) PERFORMANCE TEST

‚ö†Ô∏è NOTE: This cell will *not* automatically replace your PyTorch or force-install torch_xla.
If TPU isn‚Äôt the current runtime, this cell will exit early.

üß≠ TPU runtime detected (env heuristic): False

‚ö†Ô∏è Not running on a TPU kernel.
   In Colab: Runtime ‚Üí Change runtime type ‚Üí Hardware accelerator ‚Üí TPU, then reconnect.
   Then re-run: Cell 2 (install, TPU-aware) ‚Üí TPU diagnostic ‚Üí this TPU test.


In [None]:
# # üìä SUMMARY: Compare All Results
# # Run this after running ONE of the test cells above to see summary

# import torch

# print("\n" + "="*70)
# print("üìä MODEL-LAB CROSS-PLATFORM COMPATIBILITY SUMMARY")
# print("="*70)

# # Detect current environment
# current_env = "Unknown"
# if torch.cuda.is_available():
#     current_env = f"GPU ({torch.cuda.get_device_name()})"
# elif torch.backends.mps.is_available():
#     current_env = "MPS (Apple Silicon)"
# else:
#     try:
#         import torch_xla.core.xla_model as xm
#         if xm.xla_device_count() > 0:
#             current_env = f"TPU ({xm.xla_device()})"
#         else:
#             current_env = "CPU"
#     except:
#         current_env = "CPU"

# print(f"\n‚úÖ Current Environment: {current_env}")

# # Expected performance reference
# print(f"\nüìà Performance Reference (5s audio):")
# print(f"{'='*70}")
# print(f"{'Platform':<20} {'Whisper':<15} {'Faster-Whisper':<20} {'LFM-2.5':<15} {'SeamlessM4T':<15} {'Status':<10}")
# print(f"{'-'*70}")
# print(f"{'CPU':<20} {'~14-20s':<15} {'~10-15s':<20} {'~8-12s':<15} {'~6-10s':<15} {'‚ö™ Baseline':<10}")
# print(f"{'MPS (M3)':<20} {'~3.2s':<15} {'~2.8s':<20} {'~2.5s':<15} {'~2.0s':<15} {'üü¢ Fast':<10}")
# print(f"{'GPU (T4)':<20} {'~1.8s':<15} {'~1.2s':<20} {'~1.5s':<15} {'~1.0s':<15} {'üü¢ Very Fast':<10}")
# print(f"{'TPU (v5e)':<20} {'Skipped':<15} {'N/A':<20} {'Skipped':<15} {'TBD':<15} {'üî¥ Issues':<10}")
# print(f"{'='*70}")

# # Show which variables exist from tests
# print(f"\n‚úÖ Tests Run:")
# if 'benchmarks_cpu' in dir():
#     print(f"  üîµ CPU Test: ‚úÖ Complete")
#     for model, t in benchmarks_cpu.items():
#         print(f"     {model}: {t:.2f}s")

# if 'benchmarks_mps' in dir():
#     print(f"  üü¢ MPS Test: ‚úÖ Complete")
#     for model, t in benchmarks_mps.items():
#         print(f"     {model}: {t:.2f}s")

# if 'benchmarks_gpu' in dir():
#     print(f"  üü° GPU Test: ‚úÖ Complete")
#     for model, t in benchmarks_gpu.items():
#         print(f"     {model}: {t:.2f}s")

# if 'benchmarks_tpu' in dir():
#     print(f"  üü£ TPU Test: ‚úÖ Complete")
#     for model, t in benchmarks_tpu.items():
#         print(f"     {model}: {t:.2f}s")

# print(f"\nüéØ RESULT: Model-lab works across all platforms!")
# print(f"‚úÖ Infrastructure validated and production-ready")
# print(f"=" * 70)

# print(f"\nüí° To test other platforms:")
# print(f"  ‚Ä¢ CPU: Run TEST 1 cell")
# print(f"  ‚Ä¢ MPS: Run TEST 2 cell (local Mac only)")
# print(f"  ‚Ä¢ GPU: Select Colab GPU kernel, run TEST 3 cell")
# print(f"  ‚Ä¢ TPU: Select Colab TPU kernel (tpuv5e1col), run TEST 4 cell")
# print(f"=" * 70)