# 🎬 Japanese to Chinese Subtitle Generator (Whisper Edition)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gatorbonita/translator/blob/main/whispermode/Japanese_to_Chinese_Subtitles_Colab.ipynb)

Generate high-quality Chinese subtitles for Japanese videos using:
- **Whisper** for transcription (excellent Japanese accuracy!)
- **Google Translate** for translation
- **FREE GPU** from Google Colab

---

## 📋 Quick Start

1. **Enable GPU**: Runtime → Change runtime type → Hardware accelerator: **GPU** → Save
2. **Run all cells**: Runtime → Run all (Ctrl+F9)
3. **Upload your video** when prompted
4. **Wait for processing** (~3-5 min for 30-min video)
5. **Download your .srt file**

---

## ⚙️ Step 1: Setup Environment

This step:
- Checks whether a GPU is available
- Installs the required packages

The first run takes about 2-3 minutes.

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

# Check GPU availability
import torch
print("🔍 Checking GPU availability...")
if torch.cuda.is_available():
    print(f"✅ GPU detected: {torch.cuda.get_device_name(0)}")
    print(f"   VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
else:
    print("⚠️  No GPU detected, will use CPU (slower)")
    print("   Enable GPU: Runtime → Change runtime type → GPU")

print("\n📦 Installing dependencies...")
print("This may take 2-3 minutes on first run.\n")

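# Install dependencies. moviepy is pinned below 2.0 because Step 4 imports
# from `moviepy.editor`, which moviepy 2.x removed.
!pip install -q "moviepy<2" faster-whisper google-cloud-translate python-dotenv loguru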

print("\n✅ Setup complete!")

## 🔐 Step 2: Google Cloud Translation Setup

You need Google Cloud credentials **only for translation** (Whisper handles transcription locally).

### Option A: Upload Credentials File (Recommended)

Run the cell below and upload your `credentials.json` file when prompted.

### Option B: Skip Translation (Testing)

Set `SKIP_TRANSLATION = True` to skip translation and only test transcription.

---

**Don't have credentials?** [Quick Setup Guide](https://console.cloud.google.com):
1. Enable Cloud Translation API
2. Create Service Account → Download JSON key
3. Upload the JSON file here

In [None]:
import os
from google.colab import files

# Set to True to skip translation (for testing transcription only)
SKIP_TRANSLATION = False

if not SKIP_TRANSLATION:
    print("📤 Please upload your Google Cloud credentials JSON file...")
    uploaded = files.upload()

    if uploaded:
        cred_filename = list(uploaded.keys())[0]
        os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = cred_filename
        print(f"✅ Credentials loaded: {cred_filename}")
    else:
        print("⚠️  No credentials uploaded. Will skip translation.")
        SKIP_TRANSLATION = True
else:
    print("⚠️  Translation disabled. Will only generate Japanese transcripts.")

## ⚙️ Step 3: Configuration

Adjust settings here:

- **WHISPER_MODEL**: `tiny`, `base`, `small`, `medium`, `large-v3`
  - `medium` = Best balance (recommended)
  - `large-v3` = Best quality (slower)
  - `small` = Faster, good quality

- **TARGET_LANGUAGE**: `zh-CN` (Simplified) or `zh-TW` (Traditional)

- **DEVICE**: `auto` (uses GPU if available), `cpu`, or `cuda`
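
A typo in one of these settings only surfaces later, when the model loads or the translation request is sent. The sketch below checks the three values against the options listed above before any heavy work starts. `validate_config` and the `VALID_*` sets are illustrative names that are not part of the notebook, and the check deliberately covers only the values documented here (the underlying libraries may accept others).

```python
# Allowed values, taken from the option lists in this step.
VALID_MODELS = {"tiny", "base", "small", "medium", "large-v3"}
VALID_LANGUAGES = {"zh-CN", "zh-TW"}
VALID_DEVICES = {"auto", "cpu", "cuda"}


def validate_config(model, language, device):
    """Return a list of problems; an empty list means all three settings are valid."""
    problems = []
    if model not in VALID_MODELS:
        problems.append(f"Unknown WHISPER_MODEL {model!r}; expected one of {sorted(VALID_MODELS)}")
    if language not in VALID_LANGUAGES:
        problems.append(f"Unknown TARGET_LANGUAGE {language!r}; expected one of {sorted(VALID_LANGUAGES)}")
    if device not in VALID_DEVICES:
        problems.append(f"Unknown DEVICE {device!r}; expected one of {sorted(VALID_DEVICES)}")
    return problems


# A valid configuration yields no problems; the second call has three mistakes.
print(validate_config("medium", "zh-CN", "auto"))
for problem in validate_config("large", "zh-cn", "gpu"):
    print("-", problem)
```

The comparison is case-sensitive on purpose: `zh-cn` is reported rather than silently corrected, so the value that reaches the translation call is exactly the one shown in the configuration cell.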

In [None]:
# Configuration
WHISPER_MODEL = 'medium'  # Options: tiny, base, small, medium, large-v3
TARGET_LANGUAGE = 'zh-CN'  # zh-CN = Simplified, zh-TW = Traditional
DEVICE = 'auto'  # auto = use GPU if available

print(f"⚙️  Configuration:")
print(f"   Whisper Model: {WHISPER_MODEL}")
print(f"   Target Language: {TARGET_LANGUAGE}")
print(f"   Device: {DEVICE}")
print(f"   Translation: {'Disabled' if SKIP_TRANSLATION else 'Enabled'}")

## 🔧 Step 4: Load Functions

Loading all the necessary functions...

In [None]:
import os
import time
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import List

import torch
from moviepy.editor import VideoFileClip, AudioFileClip
from faster_whisper import WhisperModel
from google.cloud import translate_v2 as translate
from google.oauth2 import service_account


@dataclass
class TranscriptSegment:
    """Represents a subtitle segment."""
    text: str
    start_time: float
    end_time: float
    confidence: float = 1.0


# Audio Extraction
def extract_audio(video_path, output_dir='/content'):
    """Extract audio from video as 16 kHz mono 16-bit WAV."""
    print(f"\n🎵 Extracting audio from video...")
    video = VideoFileClip(video_path)

    if video.audio is None:
        raise Exception("Video has no audio track!")

    audio_path = f"{output_dir}/audio_temp.wav"
    video.audio.write_audiofile(
        audio_path,
        fps=16000,
        nbytes=2,
        codec='pcm_s16le',
        ffmpeg_params=['-ac', '1'],
        logger=None,
        verbose=False
    )

    duration = video.duration
    video.close()

    print(f"✅ Audio extracted: {duration:.1f} seconds")
    return audio_path, duration


# Whisper Transcription
def transcribe_with_whisper(audio_path, model_size='medium', device='auto'):
    """Transcribe audio with Whisper."""
    print(f"\n🎤 Transcribing with Whisper ({model_size} model)...")

    # Determine device and compute type
    if device == 'auto':
        if torch.cuda.is_available():
            device = 'cuda'
            compute_type = 'float16'
            print("   Using GPU acceleration")
        else:
            device = 'cpu'
            compute_type = 'int8'
            print("   Using CPU")
    elif device == 'cuda':
        compute_type = 'float16'
    else:
        compute_type = 'int8'

    # Load model
    print("   Loading Whisper model...")
    model = WhisperModel(model_size, device=device, compute_type=compute_type)

    # Transcribe
    print("   Transcribing... (this may take a few minutes)")
    segments_generator, info = model.transcribe(
        audio_path,
        language='ja',
        beam_size=5,
        word_timestamps=True,
        vad_filter=True,
        vad_parameters=dict(min_silence_duration_ms=500)
    )

    print(f"   Detected language: {info.language} (confidence: {info.language_probability:.2%})")

    # Convert the generator to a list immediately: faster-whisper returns a
    # generator that can only be iterated once.
    print("   Converting segments to list...")
    segments = list(segments_generator)
    print(f"   ⭐ Whisper found {len(segments)} raw segments covering full audio")

    # Process segments
    transcript_segments = []
    print("   Processing segments into subtitles...")
    for i, segment in enumerate(segments):
        if (i + 1) % 20 == 0:
            print(f"      ▶ Processing segment {i+1}/{len(segments)}...")
        if segment.words:
            # Group words into subtitle-friendly segments
            word_segments = create_segments_from_words(segment.words)
            transcript_segments.extend(word_segments)
        else:
            transcript_segments.append(
                TranscriptSegment(
                    text=segment.text.strip(),
                    start_time=segment.start,
                    end_time=segment.end,
                    confidence=1.0
                )
            )

    print(f"✅ Transcription complete: {len(transcript_segments)} segments")
    return transcript_segments


def create_segments_from_words(words, max_duration=5.0, max_chars=80):
    """Group words into subtitle segments."""
    segments = []
    current_words = []
    current_start = None
    # Punctuation that may close a subtitle (includes the comma '、')
    sentence_endings = {'。', '！', '？', '、'}

    for word in words:
        if current_start is None:
            current_start = word.start

        current_words.append(word.word)
        current_end = word.end
        duration = current_end - current_start
        text = ''.join(current_words).strip()

        should_finalize = False
        if duration >= max_duration or len(text) >= max_chars:
            should_finalize = True
        elif any(text.endswith(punct) for punct in sentence_endings):
            if len(text) > 10 or duration > 1.0:
                should_finalize = True

        if should_finalize:
            segments.append(TranscriptSegment(
                text=text,
                start_time=current_start,
                end_time=current_end,
                confidence=1.0
            ))
            current_words = []
            current_start = None

    # Flush any words left over after the loop
    if current_words:
        text = ''.join(current_words).strip()
        if text:
            segments.append(TranscriptSegment(
                text=text,
                start_time=current_start,
                end_time=current_end,
                confidence=1.0
            ))

    return segments


# Translation
def translate_segments(segments, target_language='zh-CN', batch_size=128):
    """Translate segments to Chinese."""
    if not segments:
        return []

    print(f"\n🌏 Translating to {target_language}...")

    # Initialize translator
    cred_path = os.getenv('GOOGLE_APPLICATION_CREDENTIALS')
    credentials = service_account.Credentials.from_service_account_file(cred_path)
    client = translate.Client(credentials=credentials)

    translated_segments = []

    # Process in batches
    for i in range(0, len(segments), batch_size):
        batch = segments[i:i + batch_size]
        texts = [seg.text for seg in batch]

        # Translate with retry (exponential backoff: 1s, then 2s)
        for attempt in range(3):
            try:
                # format_='text' asks for plain text so the subtitles do not
                # contain HTML entities such as &#39;
                results = client.translate(
                    texts,
                    target_language=target_language,
                    source_language='ja',
                    format_='text'
                )
                if isinstance(results, dict):
                    translated_texts = [results['translatedText']]
                else:
                    translated_texts = [r['translatedText'] for r in results]
                break
            except Exception as e:
                if attempt == 2:
                    raise Exception(f"Translation failed: {e}") from e
                time.sleep(2 ** attempt)

        # Create translated segments
        for segment, translated_text in zip(batch, translated_texts):
            translated_segments.append(TranscriptSegment(
                text=translated_text,
                start_time=segment.start_time,
                end_time=segment.end_time,
                confidence=segment.confidence
            ))

        if len(segments) > batch_size:
            print(f"   Translated {min(i + batch_size, len(segments))}/{len(segments)} segments")

    print(f"✅ Translation complete: {len(translated_segments)} segments")
    return translated_segments


# SRT Generation
def generate_srt(segments, output_path):
    """Generate SRT subtitle file."""
    print(f"\n📝 Generating SRT file...")

    # Merge short segments
    merged = merge_short_segments(segments)

    with open(output_path, 'w', encoding='utf-8') as f:
        for i, segment in enumerate(merged, start=1):
            f.write(f"{i}\n")
            start_ts = format_timestamp(segment.start_time)
            end_ts = format_timestamp(segment.end_time)
            f.write(f"{start_ts} --> {end_ts}\n")
            f.write(f"{segment.text}\n\n")

    print(f"✅ SRT file created: {output_path}")
    return output_path


def format_timestamp(seconds):
    """Convert seconds to SRT timestamp format (HH:MM:SS,mmm)."""
    # Round to whole milliseconds first so float error cannot drop a millisecond
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millisecs = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millisecs:03d}"


def merge_short_segments(segments, min_duration=1.0, max_chars=80):
    """Merge segments that are too short."""
    if not segments:
        return []

    merged = []
    current = None

    for segment in segments:
        if current is None:
            current = TranscriptSegment(
                text=segment.text,
                start_time=segment.start_time,
                end_time=segment.end_time,
                confidence=segment.confidence
            )
            continue

        duration = current.end_time - current.start_time
        combined_text = current.text + " " + segment.text

        # Merge when the current segment is too short, or when the combined
        # text still fits and the gap between the two is under one second
        should_merge = (
            duration < min_duration or
            (len(combined_text) <= max_chars and segment.start_time - current.end_time < 1.0)
        )

        if should_merge:
            current.text = combined_text
            current.end_time = segment.end_time
        else:
            merged.append(current)
            current = TranscriptSegment(
                text=segment.text,
                start_time=segment.start_time,
                end_time=segment.end_time,
                confidence=segment.confidence
            )

    if current is not None:
        merged.append(current)

    return merged


print("✅ Functions loaded!")

## 📤 Step 5: Upload Video

Upload your Japanese video file. Supported formats:
- MP4, MKV, AVI, MOV, WebM, and more

**Note**: Large files (>100MB) may take time to upload. Consider using Google Drive for very large files.

In [None]:
from google.colab import files

print("📤 Please upload your video file...")
uploaded = files.upload()

if uploaded:
    video_filename = list(uploaded.keys())[0]
    video_path = f"/content/{video_filename}"
    print(f"\n✅ Video uploaded: {video_filename}")
    print(f"   Size: {len(uploaded[video_filename]) / (1024*1024):.1f} MB")
else:
    raise Exception("No video file uploaded!")

## 🚀 Step 6: Process Video

This will:
1. Extract audio from video
2. Transcribe with Whisper (Japanese)
3. Translate to Chinese (if enabled)
4. Generate SRT subtitle file

**Estimated time** (30-min video with GPU):
- Audio extraction: ~30 seconds
- Whisper transcription: ~2-4 minutes
- Translation: ~10 seconds
- **Total: ~3-5 minutes**
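
The final stage writes a plain-text SRT file: each entry is a 1-based index, a `start --> end` timestamp line in `HH:MM:SS,mmm` form, the subtitle text, and a blank line. The sketch below builds one such entry to show the layout. `to_srt_timestamp` and `srt_entry` are illustrative names, not the functions defined in Step 4, and the sample text is made up.

```python
def to_srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm, rounded to the nearest millisecond."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_entry(index, start, end, text):
    """Return one SRT entry: index line, timestamp line, text line, blank line."""
    return f"{index}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n\n"


# Two consecutive entries, as they would appear in the .srt file.
print(srt_entry(1, 0.0, 2.5, "你好"), end="")
print(srt_entry(2, 3661.5, 3664.0, "谢谢"), end="")
```

Working in whole milliseconds before splitting into hours, minutes, and seconds avoids losing a millisecond to floating-point error (for example, `1.001 % 1` is slightly less than `0.001`).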

In [None]:
import time

start_time = time.time()

try:
    # Step 1: Extract audio
    audio_path, duration = extract_audio(video_path)
    
    # Step 2: Transcribe with Whisper
    japanese_segments = transcribe_with_whisper(
        audio_path,
        model_size=WHISPER_MODEL,
        device=DEVICE
    )
    
    # Step 3: Translate (if enabled)
    if SKIP_TRANSLATION:
        print("\n⚠️  Skipping translation (disabled)")
        final_segments = japanese_segments
        output_suffix = "_ja.srt"  # Japanese only
    else:
        final_segments = translate_segments(
            japanese_segments,
            target_language=TARGET_LANGUAGE
        )
        output_suffix = f"_{TARGET_LANGUAGE}.srt"
    
    # Step 4: Generate SRT
    video_name = Path(video_filename).stem
    output_path = f"/content/{video_name}{output_suffix}"
    generate_srt(final_segments, output_path)
    
    # Cleanup
    import os
    if os.path.exists(audio_path):
        os.remove(audio_path)
    
    # Summary
    elapsed = time.time() - start_time
    print("\n" + "="*60)
    print("🎉 SUBTITLE GENERATION COMPLETE!")
    print("="*60)
    print(f"Video duration: {duration:.1f} seconds ({duration/60:.1f} minutes)")
    print(f"Processing time: {elapsed:.1f} seconds ({elapsed/60:.1f} minutes)")
    print(f"Segments: {len(final_segments)}")
    print(f"Model: {WHISPER_MODEL}")
    print(f"Output: {output_path}")
    print("="*60)
    
    SUBTITLE_FILE = output_path
    
except Exception as e:
    print(f"\n❌ Error: {e}")
    import traceback
    traceback.print_exc()

## 📥 Step 7: Download Subtitle File

Download your generated subtitle file!

In [None]:
from google.colab import files

try:
    if 'SUBTITLE_FILE' in globals() and os.path.exists(SUBTITLE_FILE):
        print(f"📥 Downloading: {SUBTITLE_FILE}")
        files.download(SUBTITLE_FILE)
        print("\n✅ Download started! Check your browser's download folder.")
        print("\n📺 To use:")
        print("   1. Open video in VLC player")
        print("   2. Subtitle → Add Subtitle File")
        print("   3. Select the downloaded .srt file")
    else:
        print("❌ No subtitle file found. Please run Step 6 first.")
except Exception as e:
    print(f"❌ Download error: {e}")

## 👀 (Optional) Preview Subtitles

Preview the first 10 subtitle entries:

In [None]:
if 'SUBTITLE_FILE' in globals() and os.path.exists(SUBTITLE_FILE):
    print("📖 Preview of first 10 subtitles:\n")
    print("="*60)
    
    with open(SUBTITLE_FILE, 'r', encoding='utf-8') as f:
        lines = f.readlines()
        preview_lines = []
        count = 0
        
        for line in lines:
            preview_lines.append(line)
            if line.strip() == '' and len(preview_lines) > 1:
                count += 1
                if count >= 10:
                    break
        
        print(''.join(preview_lines))
    
    print("="*60)
else:
    print("❌ No subtitle file found. Please run Step 6 first.")

---

## 💡 Tips & Troubleshooting

### For Better Results:
- ✅ **Use GPU**: Runtime → Change runtime type → GPU
- ✅ **Better quality**: Use `large-v3` model (slower)
- ✅ **Faster processing**: Use `small` or `base` model
- ✅ **Clear audio**: Remove background music if possible

### Common Issues:

**"Out of memory"**
- Use smaller model: `WHISPER_MODEL = 'small'`
- Runtime → Factory reset runtime

**"No GPU detected"**
- Runtime → Change runtime type → GPU → Save
- Restart runtime if needed

**"Translation error"**
- Check credentials file is uploaded
- Verify Translation API is enabled
- Set `SKIP_TRANSLATION = True` to test transcription only
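
Before re-running the translation step, it can help to confirm that the uploaded file really is a service-account key. The sketch below is a local sanity check only: it reads the JSON and looks for a few fields that service-account key files contain, without contacting Google Cloud. `check_service_account_key` is a hypothetical helper name, and `credentials.json` is the filename assumed from Step 2.

```python
import json

# Fields a service-account key file is expected to contain.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(path):
    """Return a list of problems found in the key file; empty means it looks plausible."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        return [f"File not found: {path}"]
    except json.JSONDecodeError as e:
        return [f"Not valid JSON: {e}"]

    if not isinstance(data, dict):
        return ["Top-level JSON value is not an object"]

    problems = []
    missing = sorted(REQUIRED_KEYS - data.keys())
    if missing:
        problems.append(f"Missing fields: {missing}")
    if data.get("type") != "service_account":
        problems.append(f"'type' is {data.get('type')!r}, expected 'service_account'")
    return problems


# Report anything suspicious about the uploaded file.
for problem in check_service_account_key("credentials.json"):
    print("-", problem)
```

A clean result does not prove the key works; the Translation API can still reject it if the API is not enabled for the project or the key has been revoked.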

**"Video upload fails"**
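- For files >100MB, use Google Drive instead of Step 5, and set both variables that Step 6 reads:
  ```python
  from google.colab import drive
  drive.mount('/content/drive')
  video_path = '/content/drive/MyDrive/your-video.mp4'
  video_filename = 'your-video.mp4'  # Step 6 uses this to name the .srt file
  ```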

### Model Comparison:

| Model | VRAM | Speed | Quality |
|-------|------|-------|--------|
| tiny | 1GB | ⚡⚡⚡⚡⚡ | ⭐⭐ |
| base | 1GB | ⚡⚡⚡⚡ | ⭐⭐⭐ |
| small | 2GB | ⚡⚡⚡ | ⭐⭐⭐⭐ |
| medium | 5GB | ⚡⚡ | ⭐⭐⭐⭐⭐ |
| large-v3 | 10GB | ⚡ | ⭐⭐⭐⭐⭐ |

**Colab Free GPU**: ~15GB VRAM, can run up to `large-v3`
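
The table can also be turned into a simple lookup for choosing a model on other hardware. The sketch below returns the largest model whose VRAM figure fits the available memory. `pick_model`, `MODEL_VRAM_GB`, and the 1 GB `headroom_gb` default are assumptions for illustration, and the VRAM numbers are the approximate ones from the table, not measurements.

```python
# Approximate VRAM needs in GB, copied from the table above, largest model first.
MODEL_VRAM_GB = [
    ("large-v3", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model(available_vram_gb, headroom_gb=1.0):
    """Return the largest model that fits after reserving headroom, or None if none fit."""
    budget = available_vram_gb - headroom_gb
    for name, needed_gb in MODEL_VRAM_GB:
        if needed_gb <= budget:
            return name
    return None


# Try a few memory sizes, including the ~15 GB of the free Colab GPU.
for vram in (15, 6, 4, 2, 0.5):
    print(f"{vram} GB ->", pick_model(vram))
```

In the notebook, the available memory could come from the `total_memory` value that Step 1 already prints; when `pick_model` returns `None`, running on CPU with a small model is the remaining option.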

---

## 📚 Resources

- **GitHub Repository**: [gatorbonita/translator](https://github.com/gatorbonita/translator)
- **Whisper Edition Docs**: [whispermode/README.md](https://github.com/gatorbonita/translator/blob/main/whispermode/README.md)
- **Google Cloud Setup**: [Translation API Guide](https://cloud.google.com/translate/docs/setup)

---

<div align="center">

**🌟 Enjoying this tool? Star the repo on GitHub!**

Made with ❤️ using Whisper + Google Translate

</div>