Multi-Language Timing Sync Tool

Compare scripts in multiple languages (English, Spanish, Korean, Hindi) and calculate precise timing variance for YouTube video synchronization using ElevenLabs text-to-speech character-level timestamps.

Overview

This tool solves a critical problem: timing inconsistency in multilingual video production. When you create videos with synchronized text overlays in multiple languages, each language takes a different amount of time to speak:

  • Spanish typically takes ~5% longer than English
  • Korean takes ~10% longer than English
  • Hindi takes ~12% longer than English

This tool calculates the exact time differences and recommends how much buffer (silence) you need to add to each language version to keep overlays synchronized within ±80ms (imperceptible to viewers).

Key Features

Accurate Multilingual Timing — Uses ElevenLabs API /with-timestamps endpoint for character-level precision
UTF-8 Grapheme Aware — Properly handles Hindi (Devanagari) and Korean (Hangul) combining characters
Buffer Recommendations — Tells you exactly how much silence to add to sync videos
Tolerance-Based Status — Automatically flags whether each language is "OK" or needs "Adjust" or "Cut"
Detailed Variance Report — All 6 pairwise language comparisons + summary statistics
Simple REST API — Single POST endpoint with JSON input/output

Installation

Prerequisites

  • Node.js 18+ (for Intl.Segmenter API support)
  • ElevenLabs Account with API key and voice IDs
  • npm or yarn
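
Intl.Segmenter is what allows grapheme-aware counting without extra dependencies. The sketch below illustrates the idea only; `countGraphemes` is a hypothetical helper name and may not match what src/utils/characterCounting.ts actually exports.

```typescript
// Count user-perceived characters (grapheme clusters) instead of UTF-16 code units.
// `countGraphemes` is a hypothetical name used for illustration.
function countGraphemes(text: string, locale: string): number {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let count = 0;
  for (const _segment of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

// Hangul written as decomposed jamo (NFD) uses several code units per syllable.
const decomposedKorean = "한국어".normalize("NFD");
console.log(decomposedKorean.length);                 // 8 code units
console.log(countGraphemes(decomposedKorean, "ko"));  // 3 graphemes

// Devanagari vowel signs and viramas attach to the preceding consonant,
// so the grapheme count is lower than the code unit count.
const hindiGreeting = "नमस्ते";
console.log(hindiGreeting.length, countGraphemes(hindiGreeting, "hi"));
```

The exact cluster count for Devanagari conjuncts can differ between ICU versions, so the Hindi line is printed without an expected value.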

Setup

  1. Clone and install dependencies:
cd MultiLanguageTimingTool
npm install
  2. Configure environment variables:
cp .env.example .env

Then edit .env and add your API key, model ID, and one voice ID per language. Example .env:

ELEVENLABS_API_KEY=sk_abc123def456...
ELEVENLABS_MODEL_ID=eleven_multilingual_v2
VOICE_ID_EN=voice_EXAVITQu4MsJ60DaXUN1
VOICE_ID_ES=voice_EXAVITQu4MsJ60DaXUN2
VOICE_ID_KO=voice_EXAVITQu4MsJ60DaXUN3
VOICE_ID_HI=voice_EXAVITQu4MsJ60DaXUN4
PORT=3000
NODE_ENV=development
  3. Start the development server:
npm run dev

Server will start on http://localhost:3000
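
If the server fails on its first request rather than at startup, a quick check of the variables from the example .env can help narrow things down. The following is a minimal sketch, assuming the API key and the four voice IDs are required; `findMissingEnvVars` is a hypothetical helper and not part of the project.

```typescript
// Variable names are taken from the example .env; treating exactly these
// five as required is an assumption made for this sketch.
const REQUIRED_ENV_VARS = [
  "ELEVENLABS_API_KEY",
  "VOICE_ID_EN",
  "VOICE_ID_ES",
  "VOICE_ID_KO",
  "VOICE_ID_HI",
] as const;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

const missingEnvVars = findMissingEnvVars(process.env);
if (missingEnvVars.length > 0) {
  console.error(`Missing environment variables: ${missingEnvVars.join(", ")}`);
}
```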

Usage

API Endpoint

POST /api/v1/compare

Compares 4-language scripts and returns timing variance report.

Request Example

curl -X POST http://localhost:3000/api/v1/compare \
  -H "Content-Type: application/json" \
  -d '{
    "en": "Welcome to our channel! Today we are discussing the importance of language learning.",
    "es": "¡Bienvenido a nuestro canal! Hoy vamos a discutir la importancia del aprendizaje de idiomas.",
    "ko": "저희 채널에 오신 것을 환영합니다! 오늘은 언어 학습의 중요성에 대해 논의하겠습니다.",
    "hi": "हमारे चैनल में आपका स्वागत है! आज हम भाषा सीखने के महत्व पर चर्चा करेंगे।"
  }'

Response Example

{
  "success": true,
  "data": {
    "timestamp": "2026-04-11T14:30:00.000Z",
    "referenceLanguage": "es",
    "referenceDurationMs": 4523,
    "toleranceMs": 80,
    "languages": [
      {
        "language": "en",
        "durationMs": 4250,
        "varianceFromReferenceMs": -273,
        "recommendedBufferMs": 273,
        "withinTolerance": false,
        "status": "Adjust"
      },
      {
        "language": "es",
        "durationMs": 4523,
        "varianceFromReferenceMs": 0,
        "recommendedBufferMs": 0,
        "withinTolerance": true,
        "status": "OK"
      },
      {
        "language": "ko",
        "durationMs": 4180,
        "varianceFromReferenceMs": -343,
        "recommendedBufferMs": 343,
        "withinTolerance": false,
        "status": "Adjust"
      },
      {
        "language": "hi",
        "durationMs": 4410,
        "varianceFromReferenceMs": -113,
        "recommendedBufferMs": 113,
        "withinTolerance": false,
        "status": "Adjust"
      }
    ],
    "allPairVariances": [
      {
        "language1": "en",
        "language2": "es",
        "varianceMs": -273,
        "absVarianceMs": 273
      },
      ...
    ],
    "summary": {
      "maxVarianceMs": 343,
      "minVarianceMs": 0,
      "averageVarianceMs": 182,
      "allWithinTolerance": false
    }
  },
  "timestamp": "2026-04-11T14:30:00.000Z"
}
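
The same request can be made from TypeScript. The sketch below is a hypothetical client, not something shipped with the tool: `compareScripts` and the trimmed `CompareResponse` interface are illustrative names, and only the fields read here are typed. It relies on the global fetch available in Node.js 18+.

```typescript
type LanguageCode = "en" | "es" | "ko" | "hi";
type Scripts = Record<LanguageCode, string>;

// Partial shape of the response; the real report carries more fields.
interface CompareResponse {
  success: boolean;
  data: {
    referenceLanguage: LanguageCode;
    languages: { language: LanguageCode; recommendedBufferMs: number; status: string }[];
  };
}

// `fetchImpl` is injectable so the function can be exercised without a running server.
async function compareScripts(
  scripts: Scripts,
  baseUrl = "http://localhost:3000",
  fetchImpl: typeof fetch = fetch,
): Promise<CompareResponse> {
  const response = await fetchImpl(`${baseUrl}/api/v1/compare`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(scripts),
  });
  if (!response.ok) {
    throw new Error(`Compare request failed with HTTP ${response.status}`);
  }
  return (await response.json()) as CompareResponse;
}
```

A caller would typically loop over `data.languages` and print `recommendedBufferMs` for each entry whose status is not "OK".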

Understanding the Response

Reference Language: The language with the longest duration. All others are compared against this baseline.

Buffer Recommendations:

  • Status "OK" — Variance within ±80ms tolerance, no action needed
  • Status "Adjust" — Add silence (in milliseconds) to match reference duration
  • Status "Cut" — Language exceeds reference, needs trimming
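
The reference selection and status rules above can be sketched as follows. This is an illustration under stated assumptions, not the code in src/utils/timingCalculations.ts: `recommendBuffers` is a hypothetical name, the tolerance check is assumed to be inclusive, and the recommended buffer is assumed to equal the shortfall against the reference.

```typescript
interface BufferRecommendation {
  language: string;
  durationMs: number;
  varianceFromReferenceMs: number;
  recommendedBufferMs: number;
  withinTolerance: boolean;
  status: "OK" | "Adjust" | "Cut";
}

function recommendBuffers(
  durations: Record<string, number>,
  toleranceMs = 80,
): BufferRecommendation[] {
  // The reference is the longest duration, so variances are zero or negative.
  const referenceDurationMs = Math.max(...Object.values(durations));
  return Object.entries(durations).map(([language, durationMs]) => {
    const varianceFromReferenceMs = durationMs - referenceDurationMs;
    const withinTolerance = Math.abs(varianceFromReferenceMs) <= toleranceMs;
    // "Cut" cannot occur while the reference is the longest language;
    // it is kept here only to mirror the three documented statuses.
    const status = withinTolerance ? "OK" : varianceFromReferenceMs < 0 ? "Adjust" : "Cut";
    return {
      language,
      durationMs,
      varianceFromReferenceMs,
      recommendedBufferMs: Math.max(0, -varianceFromReferenceMs),
      withinTolerance,
      status,
    };
  });
}

const sample = recommendBuffers({ en: 4250, es: 4523, ko: 4180, hi: 4410 });
console.log(sample.map((r) => `${r.language}: ${r.status} +${r.recommendedBufferMs}ms`));
// [ 'en: Adjust +273ms', 'es: OK +0ms', 'ko: Adjust +343ms', 'hi: Adjust +113ms' ]
```

Under this reading, a 113ms shortfall exceeds the 80ms tolerance and is therefore classified as "Adjust".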

Example Workflow:

  1. Run tool with your 4-language scripts
  2. Get buffer recommendations (e.g., English needs +273ms silence)
  3. In your video editor:
    • English: Add 273ms silence at the end
    • Spanish: No change (reference language)
    • Korean: Add 343ms silence at the end
    • Hindi: Add 113ms silence at the end
  4. Now all overlays sync within ±80ms (imperceptible difference)

Development

Project Structure

src/
├── server.ts                    # Express app entry point
├── types/index.ts               # TypeScript interfaces
├── utils/
│   ├── characterCounting.ts     # UTF-8 grapheme handling
│   ├── timingCalculations.ts    # Core variance algorithm
│   ├── visualCueGenerator.ts    # CSV visual cue generation (Phase 4)
│   ├── csvFormatter.ts          # CSV serialization + UTF-8 BOM (Phase 4)
│   └── encodingValidator.ts     # Unicode encoding validation (Phase 4)
├── services/
│   ├── elevenLabsClient.ts      # ElevenLabs API wrapper
│   ├── translationService.ts    # Translation stub (Phase 4)
│   └── csvGenerator.ts          # CSV export orchestrator (Phase 4)
└── routes/
    └── compareRoutes.ts         # POST /compare + POST /export-csv endpoints

tests/
├── characterCounting.test.ts    # 22 character tests
├── timingCalculations.test.ts   # 22 variance tests
├── csvGenerator.test.ts         # CSV generation & validation tests (Phase 4)
└── visualCueGenerator.test.ts   # Visual cue generation tests (Phase 4)

Running Tests

# Run all tests
npm test

# Run tests in watch mode
npm test -- --watch

# Run specific test file
npm test characterCounting.test.ts

Building for Production

npm run build

# Output artifacts in dist/ folder
ls dist/
  server.js
  types/
  utils/
  services/
  routes/

Configuration

ElevenLabs Model Options

Recommended models (all support multilingual):

| Model | Speed | Cost | Best For |
| --- | --- | --- | --- |
| eleven_multilingual_v2 | Medium | Standard | Production (stable & reliable) |
| eleven_flash_v2_5 | Fastest | 50% cheaper | Quick testing, high volume |
| eleven_multilingual_sts_v2 | Medium | Standard | Streaming (lower latency) |

Change model by editing ELEVENLABS_MODEL_ID in .env.

Timing Tolerance

Default tolerance is ±80ms (YouTube/streaming standard).

To modify, edit TOLERANCE_MS constant in src/utils/timingCalculations.ts:

const TOLERANCE_MS = 80; // Change to desired threshold in milliseconds

Then rebuild:

npm run build

Troubleshooting

Missing API Key Error

Error: Missing ELEVENLABS_API_KEY in environment variables

Solution: Copy .env.example to .env and add your API key from https://elevenlabs.io/app/api-keys

Invalid Voice ID Error

ElevenLabs voice not found for language: en. Check voice IDs in .env.

Solution:

  1. Go to https://elevenlabs.io/app/voice-lab
  2. Copy the voice ID for each voice
  3. Paste into corresponding VOICE_ID_* field in .env
  4. Voice IDs start with voice_ (e.g., voice_EXAVITQu4MsJ60DaXUN1)

Rate Limit Error

ElevenLabs API rate limit exceeded. Retry after a moment.

Solution: Wait a moment and retry. ElevenLabs has rate limits based on your plan. Check your usage at https://elevenlabs.io/app/usage.
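
If rate limits are hit regularly, waiting and retrying can be automated with exponential backoff. The helper below is a generic sketch, not part of the tool: `HttpError` and `withRetry` are hypothetical names, and it assumes the calling code throws an error that carries the HTTP status.

```typescript
// Hypothetical error type that carries the HTTP status of a failed call.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Retries only on HTTP 429, doubling the delay after each failed attempt.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const retryable = error instanceof HttpError && error.status === 429;
      if (!retryable || attempt >= maxAttempts) {
        throw error;
      }
      const delayMs = baseDelayMs * 2 ** (attempt - 1); // 500, 1000, 2000, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Errors other than 429 are rethrown immediately, since retrying an invalid API key or a missing voice would not help.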

Character Encoding Issues

Make sure your script files are saved as UTF-8 without a BOM:

  • VS Code: Select UTF-8 in the bottom-right status bar, not UTF-8 with BOM
  • macOS/Linux: Verify with file -b --mime scriptfile.txt (should show charset=utf-8)
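
When re-saving files is not practical, a leading BOM can also be removed in code before the text is sent to the API. This is a small sketch with a hypothetical `stripBom` helper; it relies on the fact that Node.js decodes a UTF-8 BOM as the single character U+FEFF.

```typescript
const BOM_CHARACTER = "\uFEFF";

// Removes a single leading BOM if present; other text is returned unchanged.
function stripBom(text: string): string {
  return text.startsWith(BOM_CHARACTER) ? text.slice(1) : text;
}

// The bytes EF BB BF followed by "hi", as a BOM-prefixed file would contain.
const decoded = Buffer.from([0xef, 0xbb, 0xbf, 0x68, 0x69]).toString("utf8");
console.log(decoded.length);           // 3 (BOM + "hi")
console.log(stripBom(decoded).length); // 2
```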

API Documentation

Endpoints

POST /api/v1/compare

Compare 4-language scripts and get timing variance.

Request Body:

{
  "en": "English script text (max 10,000 chars)",
  "es": "Spanish script text (max 10,000 chars)",
  "ko": "Korean script text (max 10,000 chars)",
  "hi": "Hindi script text (max 10,000 chars)"
}
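
A client can pre-check a request body before sending it and avoid a round trip that would end in a 400. The sketch below is an assumption about what "missing or invalid" means (all four keys present, non-empty strings, at most 10,000 characters); `validateScripts` is a hypothetical name, and counting with `.length` (UTF-16 code units) may differ from how the server counts characters.

```typescript
const LANGUAGES = ["en", "es", "ko", "hi"] as const;
const MAX_SCRIPT_CHARS = 10_000;

// Returns a list of human-readable problems; an empty list means the body looks valid.
function validateScripts(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["Request body must be a JSON object"];
  }
  const record = body as Record<string, unknown>;
  const errors: string[] = [];
  for (const language of LANGUAGES) {
    const value = record[language];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`"${language}" must be a non-empty string`);
    } else if (value.length > MAX_SCRIPT_CHARS) {
      // Assumption: the limit is measured in UTF-16 code units.
      errors.push(`"${language}" exceeds ${MAX_SCRIPT_CHARS} characters`);
    }
  }
  return errors;
}
```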

Success Response (200):

{
  "success": true,
  "data": { /* TimeVarianceReport */ },
  "timestamp": "2026-04-11T14:30:00.000Z"
}

Error Responses:

  • 400 — Missing or invalid scripts
  • 401 — Invalid API key
  • 404 — Voice not found
  • 429 — Rate limit exceeded
  • 503 — ElevenLabs service unavailable

POST /api/v1/export-csv

Export 4-language scripts as a Canva-ready CSV with visual cues (Phase 4 feature).

This endpoint generates a CSV file with dual text columns for each language:

  • Audio — Full script text (what ElevenLabs speaks)
  • Canva_Text — Shortened visual cues (reduced cognitive load for viewers)

Visual cues are automatically generated by extracting key phrases and preserving duration markers (e.g., "Breathe deeply for 4 seconds" → "Breathe Deeply (4s)").
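
To make the transformation concrete, here is a deliberately simplified, English-only sketch. It is not the algorithm in src/utils/visualCueGenerator.ts: `toVisualCue` is a hypothetical name, the "for N seconds" pattern is an assumption about how duration markers are detected, and the 50-character cap is borrowed from the cue length mentioned in the roadmap.

```typescript
// English-only illustration: pull out "for N seconds", title-case the rest,
// and append the duration as "(Ns)".
function toVisualCue(line: string, maxLength = 50): string {
  const durationPattern = /\bfor\s+(\d+)\s+seconds?\b/i;
  const durationMatch = line.match(durationPattern);
  const phrase = line
    .replace(durationPattern, "")
    .replace(/[.!?]+\s*$/, "") // drop trailing sentence punctuation
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
  const cue = durationMatch ? `${phrase} (${durationMatch[1]}s)` : phrase;
  // Naive truncation; a real implementation would avoid cutting the marker.
  return cue.slice(0, maxLength);
}

console.log(toVisualCue("Breathe deeply for 4 seconds")); // Breathe Deeply (4s)
console.log(toVisualCue("Breathe in for 4 seconds."));    // Breathe In (4s)
```

Other languages need their own patterns, since word order and duration phrasing differ (for example, the number comes first in the Korean and Hindi samples).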

Request Body:

{
  "en": "Breathe in for 4 seconds.\nHold for 4 seconds.\nExhale for 4 seconds.",
  "es": "Respira durante 4 segundos.\nMantén durante 4 segundos.\nExhala durante 4 segundos.",
  "ko": "4초 동안 숨을 마세요.\n4초 동안 유지하세요.\n4초 동안 숨을 내쉬세요.",
  "hi": "4 सेकंड के लिए सांस लें।\n4 सेकंड तक रोकें।\n4 सेकंड के लिए सांस छोड़ें।"
}

Success Response (200):

{
  "success": true,
  "data": {
    "totalScenes": 3,
    "rows": [
      {
        "sceneNumber": 1,
        "enAudio": "Breathe in for 4 seconds.",
        "enCanvaText": "Breathe In (4s)",
        "esAudio": "Respira durante 4 segundos.",
        "esCanvaText": "Respira (4s)",
        "koAudio": "4초 동안 숨을 마세요.",
        "koCanvaText": "숨 (4s)",
        "hiAudio": "4 सेकंड के लिए सांस लें।",
        "hiCanvaText": "सांस (4s)"
      },
      ...
    ],
    "csvContent": "[UTF-8 BOM]Scene_Number,English_Audio,English_Canva_Text,...",
    "metadata": {
      "generatedAt": "2026-04-12T10:30:00.000Z",
      "languagesIncluded": ["en", "es", "ko", "hi"],
      "encoding": "UTF-8"
    }
  },
  "timestamp": "2026-04-12T10:30:00.000Z"
}

CSV Column Structure:

| Column | Content | Purpose |
| --- | --- | --- |
| Scene_Number | 1, 2, 3, ... | Sequential scene identifier |
| English_Audio | Full text | What narrator reads aloud |
| English_Canva_Text | Visual cue | Shortened text for on-screen display |
| Spanish_Audio | Full text | Spanish narration |
| Spanish_Canva_Text | Visual cue | Spanish on-screen display |
| Korean_Audio | Full text | Korean narration |
| Korean_Canva_Text | Visual cue | Korean on-screen display |
| Hindi_Audio | Full text | Hindi narration |
| Hindi_Canva_Text | Visual cue | Hindi on-screen display |

Import into Canva:

  1. Download the CSV from the API response (copy csvContent)
  2. In Canva, create a spreadsheet or table element
  3. Import CSV: Paste content into spreadsheet import dialog
  4. Map columns: English_Canva_Text, Spanish_Canva_Text, Korean_Canva_Text, Hindi_Canva_Text become your overlay text
  5. The *_Audio columns are for reference/backup (narrator's full scripts)

Unicode Handling:

Automatically handles international characters:

  • Hindi (Devanagari): हिंदी, नमस्ते, स्वागत है
  • Korean (Hangul): 한국어, 반갑습니다, 환영합니다

The CSV includes a UTF-8 BOM (Byte Order Mark) to ensure Excel, Google Sheets, and Canva correctly interpret these scripts.
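
A BOM-prefixed CSV is straightforward to produce. The sketch below shows one way to do it with standard CSV quoting; `escapeCsvField` and `toCsvWithBom` are hypothetical names and are not taken from src/utils/csvFormatter.ts.

```typescript
// Quote a field only when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Joins rows with CRLF and prepends U+FEFF, which is written as the
// bytes EF BB BF when the string is encoded as UTF-8.
function toCsvWithBom(header: string[], rows: string[][]): string {
  const lines = [header, ...rows].map((row) => row.map(escapeCsvField).join(","));
  return "\uFEFF" + lines.join("\r\n");
}

const csv = toCsvWithBom(
  ["Scene_Number", "Korean_Audio", "Hindi_Audio"],
  [["1", "4초 동안 숨을 마세요.", "4 सेकंड के लिए सांस लें।"]],
);
console.log(csv.charCodeAt(0) === 0xfeff); // true
```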

Error Responses:

  • 400 — Missing or invalid scripts, exceeds 10,000 character limit
  • 500 — CSV encoding validation failed

Example cURL:

curl -X POST http://localhost:3000/api/v1/export-csv \
  -H "Content-Type: application/json" \
  -d '{
    "en": "Breathe in for 4 seconds.\nHold for 4 seconds.",
    "es": "Respira durante 4 segundos.\nMantén durante 4 segundos.",
    "ko": "4초 동안 숨을 마세요.\n4초 동안 유지하세요.",
    "hi": "4 सेकंड के लिए सांस लें।\n4 सेकंड तक रोकें।"
  }'

GET /api/v1/health

Health check endpoint.

Response (200):

{
  "success": true,
  "data": {
    "status": "healthy",
    "timestamp": "2026-04-11T14:30:00.000Z"
  },
  "timestamp": "2026-04-11T14:30:00.000Z"
}

GET /

API information and available endpoints.

Performance & Scaling

Current Limitations

  • Single full-script comparison (no scene/segment support yet)
  • Sequential ElevenLabs API calls (parallelized internally)
  • 10,000 character max per language (ElevenLabs billing optimization)
  • No caching (call ElevenLabs fresh each time for accuracy)
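
The parenthetical note above suggests that the per-language requests run concurrently. One way to do that is Promise.all, sketched below; `measureAll` and the `synthesize` callback are hypothetical stand-ins for whatever src/services/elevenLabsClient.ts actually exposes.

```typescript
type LanguageCode = "en" | "es" | "ko" | "hi";

// `synthesize` is assumed to resolve to the spoken duration in milliseconds.
async function measureAll(
  scripts: Record<LanguageCode, string>,
  synthesize: (language: LanguageCode, text: string) => Promise<number>,
): Promise<Record<LanguageCode, number>> {
  const languages = Object.keys(scripts) as LanguageCode[];
  // All four requests start immediately; total wait is roughly the slowest one.
  const durations = await Promise.all(
    languages.map((language) => synthesize(language, scripts[language])),
  );
  return Object.fromEntries(
    languages.map((language, index) => [language, durations[index]]),
  ) as Record<LanguageCode, number>;
}
```

Promise.all rejects as soon as any single request fails, so one failing language fails the whole comparison, which seems appropriate when all four durations are needed.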

Future Optimizations

  • Batch processing for multiple script sets
  • Response caching with TTL
  • WebSocket streaming for real-time processing
  • Segment-based timing (scene-by-scene granularity)
  • Fallback estimation logic (for dev/testing without API)

Data Types

TimeVarianceReport

Complete analysis of timing variance across all languages.

{
  timestamp: string;                    // ISO 8601 timestamp
  referenceLanguage: 'en'|'es'|'ko'|'hi'; // Language with longest duration
  referenceDurationMs: number;          // Duration of reference language (ms)
  toleranceMs: number;                  // Acceptable variance threshold (±ms)
  languages: BufferRecommendation[];    // Recommendations for each language
  allPairVariances: LanguagePairVariance[]; // All 6 pairwise comparisons
  summary: {
    maxVarianceMs: number;              // Largest variance found
    minVarianceMs: number;              // Smallest variance found
    averageVarianceMs: number;          // Mean variance
    allWithinTolerance: boolean;        // All within ±80ms?
  };
}
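
The six entries in allPairVariances are every unordered pair of the four languages. A minimal sketch of how they could be derived is shown below; the `LanguagePairVariance` fields follow the response example, the sign convention (language1 minus language2) is inferred from the en/es entry there, and the function name is hypothetical.

```typescript
interface LanguagePairVariance {
  language1: string;
  language2: string;
  varianceMs: number;    // language1 duration minus language2 duration
  absVarianceMs: number;
}

function computePairVariances(durations: Record<string, number>): LanguagePairVariance[] {
  const languages = Object.keys(durations);
  const pairs: LanguagePairVariance[] = [];
  for (let i = 0; i < languages.length; i++) {
    for (let j = i + 1; j < languages.length; j++) {
      const varianceMs = durations[languages[i]] - durations[languages[j]];
      pairs.push({
        language1: languages[i],
        language2: languages[j],
        varianceMs,
        absVarianceMs: Math.abs(varianceMs),
      });
    }
  }
  return pairs;
}

const pairs = computePairVariances({ en: 4250, es: 4523, ko: 4180, hi: 4410 });
console.log(pairs.length); // 6
console.log(pairs[0]);     // { language1: 'en', language2: 'es', varianceMs: -273, absVarianceMs: 273 }
```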

Roadmap

✅ Phase 4: CSV Export with Visual Cues (Completed)

New Features:

  • POST /api/v1/export-csv endpoint for Canva-ready CSV exports
  • Automatic visual cue generation (shortened text + duration markers)
  • UTF-8 BOM support for Hindi (Devanagari) and Korean (Hangul)
  • Dual-column CSV format: Audio (full text) + Canva_Text (visual cues)
  • Comprehensive test coverage (csvGenerator, visualCueGenerator)

MVP Scope:

  • Stubbed translation service (predefined phrase mappings)
  • Scene numbering by line breaks
  • CSV validation and encoding checks
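
Scene numbering by line breaks can be pictured as splitting each script on newlines and aligning the results by position. The sketch below is illustrative only: `splitScenes` and `buildSceneRows` are hypothetical names, and rejecting scripts with differing scene counts is an assumption rather than documented behavior.

```typescript
interface SceneRow {
  sceneNumber: number;
  lines: Record<string, string>; // language code -> text for this scene
}

// One scene per non-empty line.
function splitScenes(script: string): string[] {
  return script
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

function buildSceneRows(scripts: Record<string, string>): SceneRow[] {
  const scenesByLanguage = Object.entries(scripts).map(
    ([language, script]) => [language, splitScenes(script)] as const,
  );
  const sceneCounts = new Set(scenesByLanguage.map(([, scenes]) => scenes.length));
  if (sceneCounts.size > 1) {
    throw new Error("All languages must contain the same number of scenes");
  }
  const totalScenes = scenesByLanguage[0]?.[1].length ?? 0;
  return Array.from({ length: totalScenes }, (_, index) => ({
    sceneNumber: index + 1,
    lines: Object.fromEntries(
      scenesByLanguage.map(([language, scenes]) => [language, scenes[index]]),
    ),
  }));
}
```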

🔄 Phase 5: Real Translation Integration (Upcoming)

Planned Features:

  • Replace stubbed translation with real API:
    • Google Translate API (cost-effective for production)
    • OpenAI GPT-4 (better quality for wellness domain)
  • Translation caching to reduce API costs
  • Batch translation for multiple scripts
  • Domain-specific terminology for wellness/meditation scripts

Technical Details:

  • Extend translationService.ts (currently a stub) with pluggable API adapters
  • Add .env config for translation API selection
  • Implement rate-limiting and retry logic
  • Add profanity/safety checks for wellness content

📋 Phase 6: Scene Markers & Custom Cues (Future)

Planned Features:

  • Support explicit scene markers: [SCENE 1], [SCENE 2]
  • Custom cue templates per wellness category:
    • Breathing: Preserve duration + breath direction (e.g., "Breathe In (4s)")
    • Gratitude: Emphasize action (e.g., "Feel Gratitude")
    • Affirmations: Highlight affirmation text (e.g., "I Am Calm")
  • User-configurable cue length (currently fixed at ≤50 chars)
  • Metadata export (duration markers, scene markers)

🔗 Phase 7: Canva API Direct Integration (Future)

Planned Features:

  • Direct Canva API integration (replace CSV export)
  • Auto-apply visual cues to Canva designs
  • Update overlays with timing buffers automatically
  • Support for multiple design templates

License

ISC

Support

For issues with:

  • This tool: Create an issue on GitHub
  • ElevenLabs API: Contact https://support.elevenlabs.io
  • YouTube overlay sync: Refer to your video editor's documentation (DaVinci Resolve, Premiere Pro, etc.)

Next Steps

After getting buffer recommendations:

  1. Export timing report as CSV or JSON
  2. Add silence to each language track in your video editor
    • Most editors: Audio track > Insert silence > Specify duration
  3. Verify sync by watching the video with all overlays
  4. Fine-tune if needed (usually ±10-20ms manual adjustment suffices)
  5. Render and upload to YouTube

Happy multicultural content creation! 🌍🎬


Made with ❤️ for multilingual creators
