<a href="https://colab.research.google.com/github/meizhong986/WhisperJAV/blob/main/notebook/WhisperJAV_colab_parallel_expert.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# WhisperJAV (parallel) Two-Pass Edition v1.7.4 (Expert)

**Adaptive Two-Pass Processing** - Automatically optimizes for your platform.

| Platform | GPUs | How it works |
|----------|------|---------------|
| **Kaggle** | 2x T4 (16GB each) | **Parallel** - Pass 1 on GPU 0, Pass 2 on GPU 1 simultaneously |
| **Colab L4/A100** | 1x GPU (16-24GB) | **Sequential** - Pass 1 first, then Pass 2 (avoids memory issues) |
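
The mode choice in the table comes down to a GPU count. Below is a minimal sketch of that decision, assuming one line of `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` output per GPU, which is what Step 2 parses. `select_mode` is a hypothetical helper name used for illustration only, and the sample strings are made up rather than captured output.

```python
from typing import Dict, Tuple


def select_mode(nvidia_smi_output: str) -> Tuple[bool, Dict[int, str]]:
    """Return (parallel, gpu_assignment) from nvidia-smi CSV output.

    gpu_assignment maps pass number -> CUDA_VISIBLE_DEVICES value.
    """
    gpus = [line.strip() for line in nvidia_smi_output.splitlines() if line.strip()]
    if not gpus:
        raise RuntimeError("No GPU detected")
    if len(gpus) >= 2:
        # Two or more GPUs: each pass gets its own device and runs concurrently.
        return True, {1: "0", 2: "1"}
    # Single GPU: both passes share device 0 and run one after the other.
    return False, {1: "0", 2: "0"}


# Illustrative inputs (assumed format, not captured output)
two_gpus = "Tesla T4, 15360 MiB\nTesla T4, 15360 MiB\n"
one_gpu = "NVIDIA L4, 23034 MiB\n"

print(select_mode(two_gpus))
print(select_mode(one_gpu))
```

Returning the assignment as a pass-to-device mapping keeps the per-pass launch code identical in both modes; only the device string differs.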

| Option | What it controls |
|--------|------------------|
| **Scene Detection** | How to split audio into chunks (auditok, silero, semantic) |
| **Speech Segmenter** | How to detect speech in audio (silero, ten) |
| **Speech Enhancer** | Audio cleanup for noisy sources (ffmpeg-dsp, clearvoice, etc.) |
| **Model** | Which AI model to use (large-v2, large-v3, turbo, kotoba) |
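
Not every model works with every quality preset: Step 1 of this notebook switches any kotoba model paired with a non-`transformers` preset over to the `transformers` pipeline and records a warning. The sketch below restates that rule in isolation. The two sets mirror the ones defined in Step 1, while `resolve_pipeline` is a hypothetical helper name and not part of WhisperJAV.

```python
from typing import List, Tuple

# Form labels as they appear in the Step 1 dropdowns
KOTOBA_MODELS = {"kotoba-bilingual", "kotoba-v2.0", "kotoba-v2.1", "kotoba-v2.2"}
LEGACY_PIPELINES = {"faster", "fast", "balanced", "fidelity"}


def resolve_pipeline(model: str, pipeline: str) -> Tuple[str, List[str]]:
    """Return the pipeline to use for `model`, plus any warnings raised."""
    if model in KOTOBA_MODELS and pipeline in LEGACY_PIPELINES:
        warning = f"{model} requires 'transformers'; corrected from '{pipeline}'."
        return "transformers", [warning]
    # Compatible combination: keep the user's choice untouched.
    return pipeline, []


print(resolve_pipeline("kotoba-v2.2", "balanced"))
print(resolve_pipeline("large-v3", "balanced"))
```

Returning the warnings alongside the result, instead of printing them, lets the caller decide how to display them.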

---
<div style="font-size: 8px; line-height: 1.0;">
1. Upload your videos to <code>Google Drive/WhisperJAV/</code><br>
2. Run <b>Step 1: Expert Configuration</b> (required)<br>
3. Run <b>Step 2: Two-Pass Transcribe</b> and wait for completion<br>
4. Run <b>Step 3: AI Translation</b> (if selected)
</div>

<small>The notebook will automatically disconnect when finished to save your GPU credits.</small>

In [None]:
#@title Step 1: Expert Configuration { display-mode: "form" }

#@markdown ## 📁 Files & Output
folder_name = "WhisperJAV" #@param {type:"string"}
subtitle_language = "Japanese" #@param ["Japanese", "English (auto-translate)", "English (AI translate)"]

#@markdown ---
#@markdown ## 1️⃣ Pass 1 Configuration (GPU 0)
pass1_quality = "balanced" #@param ["faster", "fast", "balanced", "fidelity", "transformers"]
pass1_sensitivity = "aggressive" #@param ["conservative", "balanced", "aggressive"]
pass1_model = "automatic" #@param ["automatic", "large-v2", "large-v3", "turbo", "kotoba-bilingual", "kotoba-v2.0", "kotoba-v2.1", "kotoba-v2.2"]

#@markdown **Expert Audio Setup (Pass 1)**
pass1_scene_detector = "automatic" #@param ["automatic", "auditok", "silero", "semantic"]
pass1_speech_segmenter = "automatic" #@param ["automatic", "silero", "ten", "none"]
pass1_speech_enhancer = "none" #@param ["none", "ffmpeg-dsp", "clearvoice", "zipenhancer", "bs-roformer"]
#@markdown <font size="1">auditok=energy (fast), silero=VAD, semantic=texture (complex audio) | enhancer: ffmpeg-dsp(no GPU), clearvoice(48k), bs-roformer(vocal)</font>

#@markdown **FFmpeg Filters (Pass 1)** *(only if enhancer is ffmpeg-dsp)*
pass1_ffmpeg_amplify = True #@param {type:"boolean"}
pass1_ffmpeg_loudnorm = False #@param {type:"boolean"}
pass1_ffmpeg_compress = False #@param {type:"boolean"}
pass1_ffmpeg_highpass = False #@param {type:"boolean"}

#@markdown ---
#@markdown ## 2️⃣ Pass 2 Configuration (GPU 1)
pass2_quality = "transformers" #@param ["faster", "fast", "balanced", "fidelity", "transformers"]
pass2_sensitivity = "aggressive" #@param ["conservative", "balanced", "aggressive"]
pass2_model = "kotoba-bilingual" #@param ["automatic", "large-v2", "large-v3", "turbo", "kotoba-bilingual", "kotoba-v2.0", "kotoba-v2.1", "kotoba-v2.2"]

#@markdown **Expert Audio Setup (Pass 2)**
pass2_scene_detector = "automatic" #@param ["automatic", "auditok", "silero", "semantic"]
pass2_speech_segmenter = "ten" #@param ["automatic", "silero", "ten", "none"]
pass2_speech_enhancer = "none" #@param ["none", "ffmpeg-dsp", "clearvoice", "zipenhancer", "bs-roformer"]

#@markdown **FFmpeg Filters (Pass 2)** *(only if enhancer is ffmpeg-dsp)*
pass2_ffmpeg_amplify = True #@param {type:"boolean"}
pass2_ffmpeg_loudnorm = False #@param {type:"boolean"}
pass2_ffmpeg_compress = False #@param {type:"boolean"}
pass2_ffmpeg_highpass = False #@param {type:"boolean"}

#@markdown ---
#@markdown ## 🔗 Merge Strategy
merge_method = "prefer first pass" #@param ["automatic", "keep all", "prefer first pass", "prefer second pass"]

#@markdown ---
#@markdown ## 🤖 AI Translation *(if selected)*
translation_service = "deepseek" #@param ["deepseek", "openrouter", "gemini", "claude", "gpt"]
api_key = "" #@param {type:"string"}
translation_style = "standard" #@param ["standard", "explicit"]

#@markdown ---
#@markdown ## ⚙️ Session
opening_credit = "" #@param {type:"string"}
closing_credit = "Subs by WhisperJAV" #@param {type:"string"}
auto_disconnect = True #@param {type:"boolean"}

# ═══════════════════════════════════════════
# CONFIGURATION LOGIC
# ═══════════════════════════════════════════

# Mapping dictionaries
combine_map = {"automatic": "smart_merge", "keep all": "full_merge",
               "prefer first pass": "pass1_primary", "prefer second pass": "pass2_primary"}
language_map = {"Japanese": "native", "English (auto-translate)": "direct-to-english",
                "English (AI translate)": "llm"}
tone_map = {"standard": "standard", "explicit": "pornify"}

# Model mapping (None = use pipeline default)
model_map = {
    "automatic": None,
    "large-v2": "large-v2",
    "large-v3": "large-v3",
    "turbo": "large-v3-turbo",
    "kotoba-bilingual": "kotoba-tech/kotoba-whisper-bilingual-v1.0",
    "kotoba-v2.0": "kotoba-tech/kotoba-whisper-v2.0",
    "kotoba-v2.1": "kotoba-tech/kotoba-whisper-v2.1",
    "kotoba-v2.2": "kotoba-tech/kotoba-whisper-v2.2"
}

# Define model compatibility:
KOTOBA_MODELS = {"kotoba-bilingual", "kotoba-v2.0", "kotoba-v2.1", "kotoba-v2.2"}
LEGACY_PIPELINES = {"faster", "fast", "balanced", "fidelity"}

# Auto-correct incompatible model-pipeline combinations
warnings_list = []

# Check Pass 1 compatibility
if pass1_model in KOTOBA_MODELS and pass1_quality in LEGACY_PIPELINES:
    warnings_list.append(f"Pass 1: {pass1_model} requires 'transformers' pipeline. Auto-correcting from '{pass1_quality}' to 'transformers'.")
    pass1_quality = "transformers"

# Check Pass 2 compatibility
if pass2_model in KOTOBA_MODELS and pass2_quality in LEGACY_PIPELINES:
    warnings_list.append(f"Pass 2: {pass2_model} requires 'transformers' pipeline. Auto-correcting from '{pass2_quality}' to 'transformers'.")
    pass2_quality = "transformers"

# Memory warning
heavy_enhancers = {'clearvoice', 'bs-roformer', 'zipenhancer'}
if pass1_speech_enhancer in heavy_enhancers and pass2_speech_enhancer in heavy_enhancers:
    warnings_list.append("Using GPU-based enhancement on both passes may cause OOM on T4 GPU (Sequential Mode). Suggest using ffmpeg-dsp for one pass.")

# Helpers
def build_ffmpeg_filters(amplify, loudnorm, compress, highpass):
    """Combine selected FFmpeg filters into comma-separated string."""
    filters = []
    if amplify: filters.append("amplify")
    if loudnorm: filters.append("loudnorm")
    if compress: filters.append("compress")
    if highpass: filters.append("highpass")
    return ",".join(filters) if filters else None

def map_value(val):
    return None if val == "automatic" else val

def map_segmenter(val):
    return "none" if val == "none" else map_value(val)

# Unified Config Construction
WHISPERJAV_CONFIG = {
    'pass1_pipeline': pass1_quality,
    'pass1_sensitivity': pass1_sensitivity,
    'pass1_speech_segmenter': map_segmenter(pass1_speech_segmenter),
    'pass1_model': model_map[pass1_model],
    'pass2_pipeline': pass2_quality,
    'pass2_sensitivity': pass2_sensitivity,
    'pass2_speech_segmenter': map_segmenter(pass2_speech_segmenter),
    'pass2_model': model_map[pass2_model],
    'merge_strategy': combine_map[merge_method],
    'folder_name': folder_name,
    'subtitle_language': language_map[subtitle_language],
    'translation_service': translation_service,
    'api_key': api_key,
    'translation_style': tone_map[translation_style],
    'opening_credit': opening_credit,
    'closing_credit': closing_credit,
    'auto_disconnect': auto_disconnect,
    # Compatibility checks for Step 2
    '_pass1_quality': pass1_quality,
    '_pass1_sensitivity': pass1_sensitivity,
    '_pass1_speech_segmenter': pass1_speech_segmenter,
    '_pass1_model': pass1_model,
    '_pass2_quality': pass2_quality,
    '_pass2_sensitivity': pass2_sensitivity,
    '_pass2_speech_segmenter': pass2_speech_segmenter,
    '_pass2_model': pass2_model,
    '_merge_method': merge_method,
    '_subtitle_language': subtitle_language,
    '_translation_style': translation_style,
}

WHISPERJAV_EXPERT_CONFIG = {
    # Pass 1 Expert
    'pass1_scene_detector': map_value(pass1_scene_detector),
    'pass1_speech_segmenter': map_segmenter(pass1_speech_segmenter), # Now unified
    'pass1_speech_enhancer': None if pass1_speech_enhancer == "none" else pass1_speech_enhancer,
    'pass1_ffmpeg_filters': build_ffmpeg_filters(pass1_ffmpeg_amplify, pass1_ffmpeg_loudnorm, pass1_ffmpeg_compress, pass1_ffmpeg_highpass) if pass1_speech_enhancer == "ffmpeg-dsp" else None,
    # Pass 2 Expert
    'pass2_scene_detector': map_value(pass2_scene_detector),
    'pass2_speech_segmenter': map_segmenter(pass2_speech_segmenter), # Now unified
    'pass2_speech_enhancer': None if pass2_speech_enhancer == "none" else pass2_speech_enhancer,
    'pass2_ffmpeg_filters': build_ffmpeg_filters(pass2_ffmpeg_amplify, pass2_ffmpeg_loudnorm, pass2_ffmpeg_compress, pass2_ffmpeg_highpass) if pass2_speech_enhancer == "ffmpeg-dsp" else None,
    # Display helpers
    '_pass1_scene_detector': pass1_scene_detector,
    '_pass1_speech_enhancer': pass1_speech_enhancer,
    '_pass2_scene_detector': pass2_scene_detector,
    '_pass2_speech_enhancer': pass2_speech_enhancer,
}

from IPython.display import display, HTML

# Display warnings
for warning in warnings_list:
    display(HTML(f'<div style="padding:6px 10px;background:#fef9c3;border-radius:4px;font-size:10px;margin-bottom:4px"><b>⚠️ Auto-corrected:</b> {warning}</div>'))

# Build status display
p1_info = f"{pass1_quality}"
if pass1_speech_segmenter != "automatic":
    p1_info += f"/{pass1_speech_segmenter}"
if pass1_model != "automatic":
    p1_info += f"/{pass1_model}"

p2_info = f"{pass2_quality}"
if pass2_speech_segmenter != "automatic":
    p2_info += f"/{pass2_speech_segmenter}"
if pass2_model != "automatic":
    p2_info += f"/{pass2_model}"

display(HTML(f'<div style="padding:10px;background:#e0f2fe;border-radius:4px;font-size:11px">'
             f'<b>Parallel Configuration Loaded</b><br>'
             f'Pass 1: {p1_info} | Pass 2: {p2_info}<br>'
             f'Merge: {merge_method} | Folder: {folder_name}'
             f'</div>'))

In [None]:
#@title Step 2: Two-Pass Transcribe { display-mode: "form" }
#@markdown Connect Drive → Install → Run passes (parallel on Kaggle, sequential on Colab) → Merge results

import os, sys, subprocess, shlex, time, re
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed
from IPython.display import display, HTML, clear_output
from dataclasses import dataclass
from typing import List, Dict, Any, Tuple, Optional

def status(msg, ok=True):
    icon = "✓" if ok else "✗"
    print(f"{icon} {msg}")

def section(title):
    print(f"\n{'─'*50}\n{title}\n{'─'*50}")

# Check config
if 'WHISPERJAV_CONFIG' not in dir():
    display(HTML('<div style="padding:8px;background:#fef2f2;border-radius:4px;color:#991b1b;font-size:10px"><b>Error:</b> Run Step 1 first</div>'))
    raise SystemExit()
cfg = WHISPERJAV_CONFIG

# Check for expert config (always present now)
expert = WHISPERJAV_EXPERT_CONFIG if 'WHISPERJAV_EXPERT_CONFIG' in dir() else None

# ═══════════════════════════════════════════
# CONNECT GOOGLE DRIVE
# ═══════════════════════════════════════════
section("CONNECTING GOOGLE DRIVE")
try:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=False)
    folder_path = Path(f"/content/drive/MyDrive/{cfg['folder_name']}")
    folder_path.mkdir(parents=True, exist_ok=True)
    status(f"Connected: {folder_path}")
except Exception as e:
    status(f"Failed to connect: {e}", False)
    raise SystemExit()

# ═══════════════════════════════════════════
# CHECK GPUs AND DETERMINE MODE
# ═══════════════════════════════════════════
section("DETECTING PLATFORM")
gpu_check = subprocess.run("nvidia-smi --query-gpu=name,memory.total --format=csv,noheader", shell=True, capture_output=True, text=True)
if gpu_check.returncode != 0 or not gpu_check.stdout.strip():
    status("No GPU detected. Go to Runtime → Change runtime type → T4 GPU", False)
    raise SystemExit()

gpu_lines = [line.strip() for line in gpu_check.stdout.strip().split('\n') if line.strip()]
num_gpus = len(gpu_lines)

for i, gpu_info in enumerate(gpu_lines):
    status(f"GPU {i}: {gpu_info}")

# Adaptive mode selection
if num_gpus >= 2:
    PARALLEL_MODE = True
    gpu_assignment = {1: "0", 2: "1"}
    print(f"\n  ⚡ Kaggle Mode: PARALLEL (Pass 1 → GPU 0, Pass 2 → GPU 1)")
else:
    PARALLEL_MODE = False
    gpu_assignment = {1: "0", 2: "0"}
    print(f"\n  📝 Colab Mode: SEQUENTIAL (avoids memory contention)")

# ═══════════════════════════════════════════
# INSTALL WHISPERJAV
# ═══════════════════════════════════════════
section("INSTALLING (2-3 min)")
install_start = time.time()

steps = [
    ("apt-get update -qq && apt-get install -y -qq ffmpeg portaudio19-dev libc++1 libc++abi1 > /dev/null 2>&1", "System tools"),
    ("pip install -q tqdm numba tiktoken ffmpeg-python soundfile auditok numpy scipy pysrt srt aiofiles jsonschema Pillow colorama librosa matplotlib pyloudnorm requests faster-whisper transformers optimum accelerate huggingface-hub pydantic ten-vad silero-vad pydub regex modelscope addict", "Python packages"),
    ("pip install -q --no-deps git+https://github.com/openai/whisper.git@main", "Whisper"),
    ("pip install -q --no-deps git+https://github.com/meizhong986/stable-ts-fix-setup.git@main", "Stable-TS"),
    ("pip install -q git+https://github.com/meizhong986/WhisperJAV.git@main", "WhisperJAV")
]

for cmd, name in steps:
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        status(f"{name} failed", False)
        raise SystemExit()
    status(name)

# Conditional installation of speech enhancer dependencies
if expert:
    extra_packages = set()
    for enhancer in [expert.get('pass1_speech_enhancer'), expert.get('pass2_speech_enhancer')]:
        if enhancer == 'clearvoice':
            extra_packages.add('clearvoice')
        elif enhancer == 'zipenhancer':
            # zipenhancer uses modelscope which is already installed above
            pass
        elif enhancer == 'bs-roformer':
            extra_packages.add('bs-roformer-infer')
    
    if extra_packages:
        pkg_list = ' '.join(extra_packages)
        result = subprocess.run(f"pip install -q {pkg_list}", shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            status(f"Speech enhancer packages failed (continuing anyway)", False)
        else:
            status(f"Speech enhancer packages ({', '.join(extra_packages)})")

status(f"Installation complete ({time.time()-install_start:.0f}s)")

# ═══════════════════════════════════════════
# FIND MEDIA FILES
# ═══════════════════════════════════════════
section("SCANNING FILES")
video_types = {'.mp4', '.mkv', '.avi', '.mov', '.wmv', '.flv', '.webm', '.m4v', '.mp3', '.wav', '.flac', '.m4a'}
videos = [f for f in folder_path.iterdir() if f.suffix.lower() in video_types]

if not videos:
    status(f"No media files in {cfg['folder_name']}/", False)
    raise SystemExit()

status(f"Found {len(videos)} file(s)")
for v in videos[:5]:
    print(f"  • {v.name}")
if len(videos) > 5:
    print(f"  ... and {len(videos)-5} more")

# ═══════════════════════════════════════════
# MERGE FUNCTIONS (from whisperjav/ensemble/merge.py)
# ═══════════════════════════════════════════

@dataclass
class Subtitle:
    index: int
    start_time: float
    end_time: float
    text: str

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time

def parse_srt(path: Path) -> List[Subtitle]:
    if not path.exists():
        return []
    subtitles = []
    content = path.read_text(encoding='utf-8')
    blocks = re.split(r'\n\s*\n', content.strip())
    for block in blocks:
        if not block.strip():
            continue
        lines = block.strip().split('\n')
        if len(lines) < 3:
            continue
        try:
            index = int(lines[0].strip())
            ts_match = re.match(r'(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})', lines[1].strip())
            if not ts_match:
                continue
            g = ts_match.groups()
            start = int(g[0])*3600 + int(g[1])*60 + int(g[2]) + int(g[3])/1000
            end = int(g[4])*3600 + int(g[5])*60 + int(g[6]) + int(g[7])/1000
            text = '\n'.join(lines[2:]).strip()
            subtitles.append(Subtitle(index, start, end, text))
        except (ValueError, IndexError):
            continue  # Skip malformed blocks
    return subtitles

def write_srt(subtitles: List[Subtitle], path: Path):
    def ts(seconds):
        # Work in integer milliseconds so float error cannot drop a millisecond
        total_ms = int(round(seconds * 1000))
        h, rem = divmod(total_ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"
    lines = []
    for i, sub in enumerate(subtitles, 1):
        lines.extend([str(i), f"{ts(sub.start_time)} --> {ts(sub.end_time)}", sub.text, ''])
    path.write_text('\n'.join(lines), encoding='utf-8')

def merge_srt(srt1: Path, srt2: Path, output: Path, strategy: str) -> Dict[str, Any]:
    subs1, subs2 = parse_srt(srt1), parse_srt(srt2)
    
    if strategy == 'full_merge':
        merged = [Subtitle(0, s.start_time, s.end_time, s.text) for s in subs1 + subs2]
    elif strategy == 'pass1_primary':
        merged = [Subtitle(0, s.start_time, s.end_time, s.text) for s in subs1]
        for s2 in subs2:
            if not any(max(s1.start_time, s2.start_time) < min(s1.end_time, s2.end_time) for s1 in subs1):
                merged.append(Subtitle(0, s2.start_time, s2.end_time, s2.text))
    elif strategy == 'pass2_primary':
        merged = [Subtitle(0, s.start_time, s.end_time, s.text) for s in subs2]
        for s1 in subs1:
            if not any(max(s1.start_time, s2.start_time) < min(s1.end_time, s2.end_time) for s2 in subs2):
                merged.append(Subtitle(0, s1.start_time, s1.end_time, s1.text))
    else:  # smart_merge
        merged, used = [], set()
        for s1 in subs1:
            best_i, best_overlap = None, 0
            for i, s2 in enumerate(subs2):
                if i in used: continue
                overlap = max(0, min(s1.end_time, s2.end_time) - max(s1.start_time, s2.start_time))
                if overlap > best_overlap:
                    best_overlap, best_i = overlap, i
            if best_i is not None and best_overlap > 0.3 * min(s1.duration, subs2[best_i].duration):
                used.add(best_i)
                chosen = s1 if s1.duration <= subs2[best_i].duration else subs2[best_i]
                merged.append(Subtitle(0, chosen.start_time, chosen.end_time, chosen.text))
            else:
                merged.append(Subtitle(0, s1.start_time, s1.end_time, s1.text))
        for i, s2 in enumerate(subs2):
            if i not in used:
                merged.append(Subtitle(0, s2.start_time, s2.end_time, s2.text))
    
    merged.sort(key=lambda s: s.start_time)
    write_srt(merged, output)
    return {'pass1_count': len(subs1), 'pass2_count': len(subs2), 'merged_count': len(merged)}

# ═══════════════════════════════════════════
# TWO-PASS TRANSCRIPTION (ADAPTIVE)
# ═══════════════════════════════════════════
section("TWO-PASS TRANSCRIPTION" + (" (PARALLEL)" if PARALLEL_MODE else " (SEQUENTIAL)"))

def build_pass_command(pass_num: int, video_path: Path, output_dir: Path, cfg: dict, expert: Optional[dict] = None) -> Tuple[List[str], Path]:
    """Build whisperjav command for a single pass.

    Note: WhisperJAV doesn't have --output-name, so we use separate directories
    for each pass to avoid conflicts when running in parallel.
    Output naming is automatic: {basename}.{lang_code}.whisperjav.srt
    """
    # Use separate directory for each pass to avoid conflicts
    pass_output_dir = output_dir / f"pass{pass_num}"
    pass_output_dir.mkdir(parents=True, exist_ok=True)

    pipeline = cfg[f'pass{pass_num}_pipeline']
    sensitivity = cfg[f'pass{pass_num}_sensitivity']
    segmenter = cfg[f'pass{pass_num}_speech_segmenter']
    model = cfg[f'pass{pass_num}_model']

    cmd = [
        'whisperjav', str(video_path),
        '--output-dir', str(pass_output_dir),
        '--ensemble',
        '--pass1-pipeline', pipeline,
        '--pass1-sensitivity', sensitivity,
    ]

    # Add speech segmenter from basic config (now unified)
    if segmenter:
        cmd.extend(['--pass1-speech-segmenter', segmenter])

    # Add model if specified
    if model:
        cmd.extend(['--pass1-model', model])

    # Add expert options if provided
    if expert:
        # Scene detector
        scene_detector = expert.get(f'pass{pass_num}_scene_detector')
        if scene_detector:
            cmd.extend(['--pass1-scene-detector', scene_detector])
        
        # Speech segmenter already handled from unified config
        
        # Speech enhancer (ensemble mode only)
        speech_enhancer = expert.get(f'pass{pass_num}_speech_enhancer')
        if speech_enhancer:
            if speech_enhancer == 'ffmpeg-dsp':
                effects = expert.get(f'pass{pass_num}_ffmpeg_filters')
                effects_str = effects if effects else 'loudnorm'
                cmd.extend(['--pass1-speech-enhancer', f'ffmpeg-dsp:{effects_str}'])
            else:
                cmd.extend(['--pass1-speech-enhancer', speech_enhancer])

    # Set subtitle language
    if cfg['subtitle_language'] == 'direct-to-english':
        cmd.extend(['--subs-language', 'direct-to-english'])
    else:
        cmd.extend(['--subs-language', 'native'])

    # Return the pass output directory - we'll find the SRT file after processing
    return cmd, pass_output_dir

def find_output_srt(pass_output_dir: Path, video_name: str) -> Optional[Path]:
    """Find the generated SRT file in the pass output directory.

    WhisperJAV auto-generates: {basename}.{lang}.whisperjav.srt
    e.g., video.ja.whisperjav.srt or video.en.whisperjav.srt
    """
    base_name = Path(video_name).stem
    # Look for any SRT file matching the video name
    patterns = [
        f"{base_name}.*.whisperjav.srt",  # Standard format
        f"{base_name}.srt",                # Fallback
        f"{base_name}*.srt",               # Any SRT with base name
    ]
    for pattern in patterns:
        matches = list(pass_output_dir.glob(pattern))
        if matches:
            return matches[0]
    # Last resort: any SRT in directory
    all_srts = list(pass_output_dir.glob("*.srt"))
    return all_srts[0] if all_srts else None

def run_pass(pass_num: int, video: Path, output_dir: Path, cfg: dict, expert: Optional[dict], gpu_id: str) -> Dict:
    """Run a single pass on a specific GPU."""
    cmd, pass_output_dir = build_pass_command(pass_num, video, output_dir, cfg, expert)

    env = os.environ.copy()
    env['CUDA_VISIBLE_DEVICES'] = gpu_id

    start_time = time.time()
    result = subprocess.run(cmd, capture_output=True, text=True, env=env)
    elapsed = time.time() - start_time

    # Find the output SRT file
    actual_output = find_output_srt(pass_output_dir, video.name)

    return {
        'pass': pass_num,
        'video': video.name,
        'success': result.returncode == 0 and actual_output and actual_output.exists(),
        'output': actual_output,
        'output_dir': pass_output_dir,
        'elapsed': elapsed,
        'gpu': gpu_id,
        'stderr': result.stderr[-500:] if result.stderr else ''  # Last 500 chars for debugging
    }

# Display mode info
p1_info = cfg['_pass1_quality']
if cfg['_pass1_speech_segmenter'] != 'automatic':
    p1_info += f"/{cfg['_pass1_speech_segmenter']}"
if cfg['_pass1_model'] != 'automatic':
    p1_info += f"/{cfg['_pass1_model']}"
if expert:
    if expert.get('_pass1_scene_detector') != 'automatic':
        p1_info += f" [scene:{expert['_pass1_scene_detector']}]"
    if expert.get('_pass1_speech_enhancer') != 'none':
        p1_info += f" [enh:{expert['_pass1_speech_enhancer']}]"

p2_info = cfg['_pass2_quality']
if cfg['_pass2_speech_segmenter'] != 'automatic':
    p2_info += f"/{cfg['_pass2_speech_segmenter']}"
if cfg['_pass2_model'] != 'automatic':
    p2_info += f"/{cfg['_pass2_model']}"
if expert:
    if expert.get('_pass2_scene_detector') != 'automatic':
        p2_info += f" [scene:{expert['_pass2_scene_detector']}]"
    if expert.get('_pass2_speech_enhancer') != 'none':
        p2_info += f" [enh:{expert['_pass2_speech_enhancer']}]"

print(f"Pass 1: {p1_info}")
print(f"Pass 2: {p2_info}")
print(f"Merge: {cfg['_merge_method']}\n")

# Process each video
all_results = []
merged_outputs = []

for video_idx, video in enumerate(videos, 1):
    print(f"\n[{video_idx}/{len(videos)}] Processing: {video.name}")

    results = {}

    if PARALLEL_MODE:
        # ═══════════════════════════════════════════
        # KAGGLE: Run both passes in parallel on separate GPUs
        # ═══════════════════════════════════════════
        with ThreadPoolExecutor(max_workers=2) as executor:
            futures = {
                executor.submit(run_pass, 1, video, folder_path, cfg, expert, gpu_assignment[1]): 1,
                executor.submit(run_pass, 2, video, folder_path, cfg, expert, gpu_assignment[2]): 2
            }

            for future in as_completed(futures):
                pass_num = futures[future]
                result = future.result()
                results[pass_num] = result
                status_icon = "✓" if result['success'] else "✗"
                print(f"    {status_icon} Pass {pass_num} (GPU {result['gpu']}): {result['elapsed']:.1f}s")
                if not result['success'] and result['stderr']:
                    print(f"        Error: {result['stderr'][:200]}")
    else:
        # ═══════════════════════════════════════════
        # COLAB: Run passes sequentially on same GPU
        # ═══════════════════════════════════════════
        for pass_num in [1, 2]:
            result = run_pass(pass_num, video, folder_path, cfg, expert, gpu_assignment[pass_num])
            results[pass_num] = result
            status_icon = "✓" if result['success'] else "✗"
            print(f"    {status_icon} Pass {pass_num}: {result['elapsed']:.1f}s")
            if not result['success'] and result['stderr']:
                print(f"        Error: {result['stderr'][:200]}")

    # Merge results if both passes succeeded
    if results[1]['success'] and results[2]['success']:
        merged_output = folder_path / f"{video.stem}.merged.whisperjav.srt"
        stats = merge_srt(results[1]['output'], results[2]['output'], merged_output, cfg['merge_strategy'])
        print(f"    ✓ Merged: {stats['pass1_count']} + {stats['pass2_count']} → {stats['merged_count']} subtitles")
        merged_outputs.append(merged_output)
    else:
        # Use whichever pass succeeded
        for p in [1, 2]:
            if results[p]['success']:
                # Copy to main folder with consistent naming
                final_output = folder_path / f"{video.stem}.whisperjav.srt"
                import shutil
                shutil.copy2(results[p]['output'], final_output)
                merged_outputs.append(final_output)
                print(f"    ⚠ Using Pass {p} only (other pass failed)")
                break
        else:
            print(f"    ✗ Both passes failed!")

    all_results.append(results)

# Store for Step 3
WHISPERJAV_NEW_SRTS = merged_outputs
WHISPERJAV_FOLDER_PATH = folder_path

status(f"\nCreated {len(merged_outputs)} merged subtitle file(s)")

# ═══════════════════════════════════════════
# ADD CREDITS
# ═══════════════════════════════════════════
section("ADDING CREDITS")

if cfg['opening_credit'] or cfg['closing_credit']:
    credits_count = 0
    for srt_file in merged_outputs:
        try:
            content = srt_file.read_text(encoding='utf-8')
            if cfg['opening_credit']:
                content = f"0\n00:00:00,000 --> 00:00:00,500\n{cfg['opening_credit']}\n\n" + content
            if cfg['closing_credit']:
                content += f"\n9999\n23:59:58,000 --> 23:59:59,000\n{cfg['closing_credit']}\n"
            srt_file.write_text(content, encoding='utf-8')
            credits_count += 1
        except Exception as e:
            print(f"  Warning: Could not add credits to {srt_file.name}: {e}")
    status(f"Credits added to {credits_count} file(s)")
else:
    status("No credits configured")

# ═══════════════════════════════════════════
# COMPLETE
# ═══════════════════════════════════════════
section("TWO-PASS TRANSCRIPTION COMPLETE")

mode_text = "parallel" if PARALLEL_MODE else "sequential"
if cfg['subtitle_language'] == 'llm' and cfg['api_key']:
    display(HTML(f'<div style="padding:8px 10px;background:#fef9c3;border-radius:4px;border-left:2px solid #ca8a04;font-size:10px"><b>✓ Transcription done ({mode_text})!</b> {len(merged_outputs)} file(s). Run Step 3 next to start AI Translation.</div>'))
else:
    display(HTML(f'<div style="padding:8px 10px;background:#f0fdf4;border-radius:4px;border-left:2px solid #16a34a;font-size:10px"><b>✓ Done ({mode_text})!</b> {len(merged_outputs)} subtitle(s) saved to Google Drive/{cfg["folder_name"]}/</div>'))
    if cfg['subtitle_language'] == 'llm' and not cfg['api_key']:
        print("Note: AI translation skipped (no API key provided)")

    if cfg['auto_disconnect']:
        print("\nAuto-disconnecting in 10s to save GPU credits...")
        time.sleep(10)
        try:
            from google.colab import runtime
            runtime.unassign()
        except Exception:
            pass  # Not on Colab, or disconnect failed; nothing more to do

In [None]:
#@title Step 3: AI Translation (if selected) { display-mode: "form" }
#@markdown Translate each subtitle file using AI (only runs if "English (AI translate)" selected)

import os, sys, subprocess, shlex, time
from pathlib import Path
from IPython.display import display, HTML

def status(msg, ok=True):
    icon = "✓" if ok else "✗"
    print(f"{icon} {msg}")

def section(title):
    print(f"\n{'─'*40}\n{title}\n{'─'*40}")

# Check prerequisites
if 'WHISPERJAV_CONFIG' not in dir():
    display(HTML('<div style="padding:8px;background:#fef2f2;border-radius:4px;color:#991b1b;font-size:10px"><b>Error:</b> Run Step 1 first</div>'))
    raise SystemExit()

if 'WHISPERJAV_NEW_SRTS' not in dir():
    display(HTML('<div style="padding:8px;background:#fef2f2;border-radius:4px;color:#991b1b;font-size:10px"><b>Error:</b> Run Step 2 first</div>'))
    raise SystemExit()

cfg = WHISPERJAV_CONFIG
new_srts = WHISPERJAV_NEW_SRTS
folder_path = WHISPERJAV_FOLDER_PATH

# Check if AI translation is needed
if cfg['subtitle_language'] != 'llm':
    display(HTML('<div style="padding:8px 10px;background:#f0f9ff;border-radius:4px;border-left:2px solid #3b82f6;font-size:10px"><b>ℹ Skipped:</b> AI translation not selected</div>'))
    raise SystemExit()

if not cfg['api_key']:
    display(HTML('<div style="padding:8px;background:#fef2f2;border-radius:4px;color:#991b1b;font-size:10px"><b>Error:</b> No API key provided for AI translation</div>'))
    raise SystemExit()

if not new_srts:
    display(HTML('<div style="padding:8px;background:#fef2f2;border-radius:4px;color:#991b1b;font-size:10px"><b>Error:</b> No subtitle files to translate</div>'))
    raise SystemExit()

# Set up API key
env_map = {
    "deepseek": "DEEPSEEK_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gpt": "OPENAI_API_KEY"
}
os.environ[env_map.get(cfg['translation_service'], "API_KEY")] = cfg['api_key']

# Translate each SRT file
section("AI TRANSLATION")
print(f"Provider: {cfg['translation_service']}")
print(f"Style: {cfg['translation_style']}")
print(f"Files to translate: {len(new_srts)}\n")

translated_files = []
failed_files = []

for i, srt_file in enumerate(new_srts, 1):
    print(f"[{i}/{len(new_srts)}] Translating: {srt_file.name}")

    translate_cmd = [
        'whisperjav-translate',
        '-i', str(srt_file),
        '--provider', cfg['translation_service'],
        '-t', 'english',
        '--tone', cfg['translation_style'],
        '--stream'
    ]

    full_cmd = shlex.join(translate_cmd)

    try:
        process = subprocess.Popen(
            full_cmd,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            bufsize=1,
            universal_newlines=True
        )

        for line in process.stderr:
            print(f"    {line}", end='')

        stdout_output, _ = process.communicate()

        if process.returncode == 0:
            output_path = stdout_output.strip()
            if output_path:
                translated_files.append(Path(output_path))
            status(f"Completed: {srt_file.name}")
        else:
            status(f"Failed: {srt_file.name}", False)
            failed_files.append(srt_file)

    except Exception as e:
        status(f"Error translating {srt_file.name}: {e}", False)
        failed_files.append(srt_file)

    print()

# Complete
section("COMPLETE")

if failed_files:
    display(HTML(f'<div style="padding:8px 10px;background:#fef9c3;border-radius:4px;border-left:2px solid #ca8a04;font-size:10px"><b>⚠ Partially done!</b> {len(translated_files)}/{len(new_srts)} translated. {len(failed_files)} failed.</div>'))
else:
    display(HTML(f'<div style="padding:8px 10px;background:#f0fdf4;border-radius:4px;border-left:2px solid #16a34a;font-size:10px"><b>✓ All done!</b> {len(new_srts)} Japanese + {len(translated_files)} English subtitle(s) in Google Drive/{cfg["folder_name"]}/</div>'))

# Auto-disconnect
if cfg['auto_disconnect']:
    print("\nAuto-disconnecting in 10s to save GPU credits...")
    time.sleep(10)
    try:
        from google.colab import runtime
        runtime.unassign()
    except Exception:
        pass  # e.g. not running on Colab, so there is no runtime to release
else:
    print("\nRemember to disconnect manually to save GPU credits.")
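
The translation loop in Step 3 joins the command into one string and runs it through the shell. A possible alternative, sketched here under assumptions, passes the argument list straight to `subprocess.Popen`, so no shell quoting is involved. The helper name `run_and_stream` is hypothetical, and the demo uses a child Python process as a stand-in for `whisperjav-translate`; it assumes, as the cell does, that progress goes to stderr and only a short output path is printed on stdout.

```python
import subprocess
import sys


def run_and_stream(cmd):
    """Run cmd (a list of arguments), echo stderr live, return (returncode, stdout)."""
    process = subprocess.Popen(
        cmd,                     # list form: executed directly, without a shell
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        bufsize=1,               # line-buffered in text mode
    )
    for line in process.stderr:  # print progress lines as they arrive
        print(f"    {line}", end="")
    stdout_output, _ = process.communicate()
    return process.returncode, stdout_output.strip()


# Stand-in command (not the real CLI): progress on stderr, a path on stdout.
demo = [
    sys.executable, "-c",
    "import sys; print('working...', file=sys.stderr); print('out.en.srt')",
]
code, out_path = run_and_stream(demo)
print(code, out_path)
```

Reading stderr to the end before collecting stdout is only safe while stdout stays small. A tool that writes a lot to stdout could fill the pipe and stall, in which case reading both streams concurrently would be the safer design.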