Multi-Language Timing Sync Tool

Compare scripts in multiple languages (English, Spanish, Korean, Hindi) and calculate precise timing variance for YouTube video synchronization using ElevenLabs text-to-speech character-level timestamps.

Overview

This tool addresses a common problem in multilingual video production: inconsistent timing. When you create videos with synchronized text overlays in multiple languages, each language takes a different amount of time to speak:

Spanish typically takes ~5% longer than English
Korean takes ~10% longer than English
Hindi takes ~12% longer than English

This tool calculates the exact time differences and recommends how much buffer (silence) you need to add to each language version to keep overlays synchronized within ±80ms (imperceptible to viewers).
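The durations behind these comparisons come from ElevenLabs character-level timestamps. The following is a minimal sketch rather than the tool's actual implementation: it assumes an alignment object made of parallel arrays (characters plus start and end times in seconds), which is how the /with-timestamps response is commonly described, and the `Alignment` type and `durationMsFromAlignment` helper are hypothetical names chosen for illustration.

```typescript
// Assumed shape of the alignment data returned alongside the audio.
interface Alignment {
  characters: string[];
  character_start_times_seconds: number[];
  character_end_times_seconds: number[];
}

// Spoken duration = latest character end time, converted to whole milliseconds.
function durationMsFromAlignment(alignment: Alignment): number {
  const latestEndSeconds = alignment.character_end_times_seconds.reduce(
    (latest, end) => Math.max(latest, end),
    0,
  );
  return Math.round(latestEndSeconds * 1000);
}

// Made-up sample data, not real API output.
const sampleAlignment: Alignment = {
  characters: ["H", "i", "!"],
  character_start_times_seconds: [0.0, 0.12, 0.31],
  character_end_times_seconds: [0.12, 0.31, 0.48],
};

console.log(durationMsFromAlignment(sampleAlignment)); // 480
```

Taking the maximum end time rather than the last array element keeps the sketch correct even if the arrays are not strictly ordered.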

Key Features

✅ Accurate Multilingual Timing — Uses ElevenLabs API /with-timestamps endpoint for character-level precision
✅ UTF-8 Grapheme Aware — Properly handles Hindi (Devanagari) and Korean (Hangul) combining characters
✅ Buffer Recommendations — Tells you exactly how much silence to add to sync videos
✅ Tolerance-Based Status — Automatically flags whether each language is "OK" or needs "Adjust" or "Cut"
✅ Detailed Variance Report — All 6 pairwise language comparisons + summary statistics
✅ Simple REST API — Single POST endpoint with JSON input/output

Installation

Prerequisites

Node.js 18+ (for Intl.Segmenter API support)
ElevenLabs Account with API key and voice IDs
npm or yarn
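To confirm that your Node.js runtime provides Intl.Segmenter, and to see why grapheme-aware counting matters for Hangul and Devanagari, a short check like the one below can help. It is an illustrative sketch; `countGraphemes` is a hypothetical helper and not necessarily what src/utils/characterCounting.ts exports.

```typescript
// Count user-perceived characters (grapheme clusters) rather than UTF-16 code units.
function countGraphemes(text: string, locale: string): number {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text)).length;
}

// The syllable "한" written as three conjoining jamo: one visible character, three code units.
const decomposedHangul = "\u1112\u1161\u11AB";
console.log(decomposedHangul.length, countGraphemes(decomposedHangul, "ko")); // 3 1

// Devanagari vowel signs and viramas attach to a base consonant, so the
// grapheme count is lower than the code-unit count (the exact number can
// depend on the Unicode version bundled with your runtime).
const namaste = "नमस्ते";
console.log(namaste.length, countGraphemes(namaste, "hi"));
```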

Setup

Clone and install dependencies:

cd MultiLanguageTimingTool
npm install

Configure environment variables:

cp .env.example .env

Then edit .env and add:

Your ElevenLabs API key (from https://elevenlabs.io/app/api-keys)
Voice IDs for each language (from https://elevenlabs.io/app/voice-lab)

Example .env:

ELEVENLABS_API_KEY=sk_abc123def456...
ELEVENLABS_MODEL_ID=eleven_multilingual_v2
VOICE_ID_EN=voice_EXAVITQu4MsJ60DaXUN1
VOICE_ID_ES=voice_EXAVITQu4MsJ60DaXUN2
VOICE_ID_KO=voice_EXAVITQu4MsJ60DaXUN3
VOICE_ID_HI=voice_EXAVITQu4MsJ60DaXUN4
PORT=3000
NODE_ENV=development

Start the development server:

npm run dev

The server starts on http://localhost:3000 (or on the port set in PORT).

Usage

API Endpoint

POST /api/v1/compare

Compares scripts in four languages and returns a timing variance report.

Request Example

curl -X POST http://localhost:3000/api/v1/compare \
  -H "Content-Type: application/json" \
  -d '{
    "en": "Welcome to our channel! Today we are discussing the importance of language learning.",
    "es": "¡Bienvenido a nuestro canal! Hoy vamos a discutir la importancia del aprendizaje de idiomas.",
    "ko": "저희 채널에 오신 것을 환영합니다! 오늘은 언어 학습의 중요성에 대해 논의하겠습니다.",
    "hi": "हमारे चैनल में आपका स्वागत है! आज हम भाषा सीखने के महत्व पर चर्चा करेंगे।"
  }'

Response Example

{
  "success": true,
  "data": {
    "timestamp": "2026-04-11T14:30:00.000Z",
    "referenceLanguage": "es",
    "referenceDurationMs": 4523,
    "toleranceMs": 80,
    "languages": [
      {
        "language": "en",
        "durationMs": 4250,
        "varianceFromReferenceMs": -273,
        "recommendedBufferMs": 273,
        "withinTolerance": false,
        "status": "Adjust"
      },
      {
        "language": "es",
        "durationMs": 4523,
        "varianceFromReferenceMs": 0,
        "recommendedBufferMs": 0,
        "withinTolerance": true,
        "status": "OK"
      },
      {
        "language": "ko",
        "durationMs": 4180,
        "varianceFromReferenceMs": -343,
        "recommendedBufferMs": 343,
        "withinTolerance": false,
        "status": "Adjust"
      },
      {
        "language": "hi",
        "durationMs": 4410,
        "varianceFromReferenceMs": -113,
        "recommendedBufferMs": 113,
        "withinTolerance": false,
        "status": "Adjust"
      }
    ],
    "allPairVariances": [
      {
        "language1": "en",
        "language2": "es",
        "varianceMs": -273,
        "absVarianceMs": 273
      },
      ...
    ],
    "summary": {
      "maxVarianceMs": 343,
      "minVarianceMs": 0,
      "averageVarianceMs": 182,
      "allWithinTolerance": false
    }
  },
  "timestamp": "2026-04-11T14:30:00.000Z"
}

Understanding the Response

Reference Language: The language with the longest duration. All others are compared against this baseline.

Buffer Recommendations:

Status "OK" — Variance within ±80ms tolerance, no action needed
Status "Adjust" — Add silence (in milliseconds) to match reference duration
Status "Cut" — Language exceeds reference, needs trimming
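The rules above can be sketched as a small function. This is an approximation under stated assumptions, not the code in src/utils/timingCalculations.ts: it assumes the recommended buffer is simply the gap to the reference duration, and it keeps a "Cut" branch even though that branch cannot trigger while the reference is the longest language. The durations are made up for illustration.

```typescript
type Language = "en" | "es" | "ko" | "hi";
type SyncStatus = "OK" | "Adjust" | "Cut";

interface BufferRecommendation {
  language: Language;
  durationMs: number;
  varianceFromReferenceMs: number;
  recommendedBufferMs: number;
  withinTolerance: boolean;
  status: SyncStatus;
}

function recommendBuffers(
  durations: Record<Language, number>,
  toleranceMs = 80,
): BufferRecommendation[] {
  // The reference is the longest duration; every other language is padded up to it.
  const referenceDurationMs = Math.max(...Object.values(durations));
  return (Object.keys(durations) as Language[]).map((language) => {
    const durationMs = durations[language];
    const variance = durationMs - referenceDurationMs; // never positive here
    const withinTolerance = Math.abs(variance) <= toleranceMs;
    const status: SyncStatus = withinTolerance ? "OK" : variance < 0 ? "Adjust" : "Cut";
    return {
      language,
      durationMs,
      varianceFromReferenceMs: variance,
      recommendedBufferMs: Math.max(0, -variance),
      withinTolerance,
      status,
    };
  });
}

// Illustrative durations in milliseconds (not measured values).
const sampleDurations: Record<Language, number> = { en: 4000, es: 4200, ko: 4400, hi: 4350 };
for (const rec of recommendBuffers(sampleDurations)) {
  console.log(rec.language, rec.status, rec.recommendedBufferMs);
}
// en Adjust 400
// es Adjust 200
// ko OK 0
// hi OK 50
```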

Example Workflow:

Run tool with your 4-language scripts
Get buffer recommendations (e.g., English needs +273ms silence)
In your video editor:
- English: Add 273ms silence at the end
- Spanish: No change (reference language)
- Korean: Add 343ms silence at the end
- Hindi: Add 113ms silence at the end
Now all overlays sync within ±80ms (imperceptible difference)

Development

Project Structure

src/
├── server.ts                    # Express app entry point
├── types/index.ts               # TypeScript interfaces
├── utils/
│   ├── characterCounting.ts     # UTF-8 grapheme handling
│   ├── timingCalculations.ts    # Core variance algorithm
│   ├── visualCueGenerator.ts    # CSV visual cue generation (Phase 4)
│   ├── csvFormatter.ts          # CSV serialization + UTF-8 BOM (Phase 4)
│   └── encodingValidator.ts     # Unicode encoding validation (Phase 4)
├── services/
│   ├── elevenLabsClient.ts      # ElevenLabs API wrapper
│   ├── translationService.ts    # Translation stub (Phase 4)
│   └── csvGenerator.ts          # CSV export orchestrator (Phase 4)
└── routes/
    └── compareRoutes.ts         # POST /compare + POST /export-csv endpoints

tests/
├── characterCounting.test.ts    # 22 character tests
├── timingCalculations.test.ts   # 22 variance tests
├── csvGenerator.test.ts         # CSV generation & validation tests (Phase 4)
└── visualCueGenerator.test.ts   # Visual cue generation tests (Phase 4)

Running Tests

# Run all tests
npm test

# Run tests in watch mode
npm test -- --watch

# Run specific test file
npm test characterCounting.test.ts

Building for Production

npm run build

# Output artifacts in dist/ folder
ls dist/
  server.js
  types/
  utils/
  services/
  routes/

Configuration

ElevenLabs Model Options

Recommended models (all support multilingual):

Model	Speed	Cost	Best For
`eleven_multilingual_v2`	Medium	Standard	Production (stable & reliable)
`eleven_flash_v2_5`	Fastest	50% cheaper	Quick testing, high volume
`eleven_multilingual_sts_v2`	Medium	Standard	Speech-to-speech voice conversion; not a text-to-speech model, so confirm it works with your setup before selecting it

Change model by editing ELEVENLABS_MODEL_ID in .env.

Timing Tolerance

Default tolerance is ±80ms, the threshold this tool treats as imperceptible to viewers.

To modify, edit TOLERANCE_MS constant in src/utils/timingCalculations.ts:

const TOLERANCE_MS = 80; // Change to desired threshold in milliseconds

Then rebuild:

npm run build

Troubleshooting

Missing API Key Error

Error: Missing ELEVENLABS_API_KEY in environment variables

Solution: Copy .env.example to .env and add your API key from https://elevenlabs.io/app/api-keys

Invalid Voice ID Error

ElevenLabs voice not found for language: en. Check voice IDs in .env.

Solution:

Go to https://elevenlabs.io/app/voice-lab
Copy the voice ID for each voice
Paste into corresponding VOICE_ID_* field in .env
Voice IDs start with voice_ (e.g., voice_EXAVITQu4MsJ60DaXUN1)

Rate Limit Error

ElevenLabs API rate limit exceeded. Retry after a moment.

Solution: Wait a moment and retry. ElevenLabs has rate limits based on your plan. Check your usage at https://elevenlabs.io/app/usage.
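If you call the tool from a script, "wait and retry" can be automated with exponential backoff. The sketch below is generic and makes no claims about how the tool or the ElevenLabs client handles retries internally; `backoffDelayMs`, `retryOnRateLimit`, and the `isRateLimited` callback are hypothetical names, and the delay values are arbitrary starting points.

```typescript
// Delay before retry number `attempt` (0-based): 500 ms, 1 s, 2 s, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `task`, retrying with increasing delays while `isRateLimited(error)` is true.
async function retryOnRateLimit<T>(
  task: () => Promise<T>,
  isRateLimited: (error: unknown) => boolean,
  maxRetries = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      // Give up on non-rate-limit errors or once the retry budget is spent.
      if (attempt >= maxRetries || !isRateLimited(error)) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A caller would typically pass a function that posts to /api/v1/compare as `task` and a check for HTTP status 429 as `isRateLimited`.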

Character Encoding Issues

Make sure your script files are saved as UTF-8 without a BOM:

VS Code: Select UTF-8 in the bottom-right status bar, not UTF-8 with BOM
Linux: Verify with file -b -i scriptfile.txt; macOS: use file -b -I scriptfile.txt (both should show charset=utf-8)

API Documentation

Endpoints

POST `/api/v1/compare`

Compare 4-language scripts and get timing variance.

Request Body:

{
  "en": "English script text (max 10,000 chars)",
  "es": "Spanish script text (max 10,000 chars)",
  "ko": "Korean script text (max 10,000 chars)",
  "hi": "Hindi script text (max 10,000 chars)"
}

Success Response (200):

{
  "success": true,
  "data": { /* TimeVarianceReport */ },
  "timestamp": "2026-04-11T14:30:00.000Z"
}

Error Responses:

400 — Missing or invalid scripts
401 — Invalid API key
404 — Voice not found
429 — Rate limit exceeded
503 — ElevenLabs service unavailable
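The 400 case can be illustrated with a small validation routine. This is a sketch under assumptions: the real route may word its errors differently, and it may count characters as graphemes rather than UTF-16 code units as done here. `validateScripts` is a hypothetical helper.

```typescript
const REQUIRED_LANGUAGES = ["en", "es", "ko", "hi"] as const;
const MAX_SCRIPT_CHARS = 10000;

// Returns a list of problems; an empty list means the body is acceptable.
function validateScripts(body: unknown): string[] {
  if (typeof body !== "object" || body === null) {
    return ["body must be a JSON object"];
  }
  const scripts = body as Record<string, unknown>;
  const problems: string[] = [];
  for (const lang of REQUIRED_LANGUAGES) {
    const text = scripts[lang];
    if (typeof text !== "string" || text.trim() === "") {
      problems.push(`${lang}: missing or empty script`);
    } else if (text.length > MAX_SCRIPT_CHARS) {
      // Assumption: length is measured in UTF-16 code units.
      problems.push(`${lang}: exceeds ${MAX_SCRIPT_CHARS} characters`);
    }
  }
  return problems;
}

console.log(validateScripts({ en: "Hello", es: "Hola" }));
// [ 'ko: missing or empty script', 'hi: missing or empty script' ]
```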

POST `/api/v1/export-csv`

Export 4-language scripts as a Canva-ready CSV with visual cues (Phase 4 feature).

This endpoint generates a CSV file with dual text columns for each language:

Audio — Full script text (what ElevenLabs speaks)
Canva_Text — Shortened visual cues (reduced cognitive load for viewers)

Visual cues are automatically generated by extracting key phrases and preserving duration markers (e.g., "Breathe deeply for 4 seconds" → "Breathe Deeply (4s)").
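As a rough illustration of the idea, the sketch below turns an English line into a cue by pulling out a "for N seconds" duration and title-casing what remains. It is English-only and deliberately simplistic; the actual logic in visualCueGenerator.ts is not shown in this README and will differ, especially for Spanish, Korean, and Hindi. `makeEnglishCue` is a hypothetical name.

```typescript
function makeEnglishCue(line: string, maxLength = 50): string {
  // Capture the duration, e.g. "for 4 seconds" -> "4".
  const durationMatch = line.match(/\bfor\s+(\d+)\s+seconds?\b/i);
  const marker = durationMatch ? ` (${durationMatch[1]}s)` : "";
  const withoutDuration = durationMatch ? line.replace(durationMatch[0], " ") : line;

  // Drop punctuation, collapse whitespace, and title-case each remaining word.
  const phrase = withoutDuration
    .replace(/[.!?,;:]/g, " ")
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase())
    .join(" ");

  // Keep the whole cue, including the marker, within maxLength characters.
  return phrase.slice(0, maxLength - marker.length).trimEnd() + marker;
}

console.log(makeEnglishCue("Breathe deeply for 4 seconds")); // Breathe Deeply (4s)
console.log(makeEnglishCue("Breathe in for 4 seconds."));    // Breathe In (4s)
```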

Request Body:

{
  "en": "Breathe in for 4 seconds.\nHold for 4 seconds.\nExhale for 4 seconds.",
  "es": "Respira durante 4 segundos.\nMantén durante 4 segundos.\nExhala durante 4 segundos.",
  "ko": "4초 동안 숨을 마세요.\n4초 동안 유지하세요.\n4초 동안 숨을 내쉬세요.",
  "hi": "4 सेकंड के लिए सांस लें।\n4 सेकंड तक रोकें।\n4 सेकंड के लिए सांस छोड़ें।"
}

Success Response (200):

{
  "success": true,
  "data": {
    "totalScenes": 3,
    "rows": [
      {
        "sceneNumber": 1,
        "enAudio": "Breathe in for 4 seconds.",
        "enCanvaText": "Breathe In (4s)",
        "esAudio": "Respira durante 4 segundos.",
        "esCanvaText": "Respira (4s)",
        "koAudio": "4초 동안 숨을 마세요.",
        "koCanvaText": "숨 (4s)",
        "hiAudio": "4 सेकंड के लिए सांस लें।",
        "hiCanvaText": "सांस (4s)"
      },
      ...
    ],
    "csvContent": "[UTF-8 BOM]Scene_Number,English_Audio,English_Canva_Text,...",
    "metadata": {
      "generatedAt": "2026-04-12T10:30:00.000Z",
      "languagesIncluded": ["en", "es", "ko", "hi"],
      "encoding": "UTF-8"
    }
  },
  "timestamp": "2026-04-12T10:30:00.000Z"
}

CSV Column Structure:

Column	Content	Purpose
Scene_Number	1, 2, 3, ...	Sequential scene identifier
English_Audio	Full text	What narrator reads aloud
English_Canva_Text	Visual cue	Shortened text for on-screen display
Spanish_Audio	Full text	Spanish narration
Spanish_Canva_Text	Visual cue	Spanish on-screen display
Korean_Audio	Full text	Korean narration
Korean_Canva_Text	Visual cue	Korean on-screen display
Hindi_Audio	Full text	Hindi narration
Hindi_Canva_Text	Visual cue	Hindi on-screen display

Import into Canva:

Download the CSV from the API response (copy csvContent)
In Canva, create a spreadsheet or table element
Import CSV: Paste content into spreadsheet import dialog
Map columns: English_Canva_Text, Spanish_Canva_Text, Korean_Canva_Text, Hindi_Canva_Text become your overlay text
The *_Audio columns are for reference/backup (narrator's full scripts)

Unicode Handling:

✅ Automatically handles international characters:

Hindi (Devanagari): हिंदी, नमस्ते, स्वागत है
Korean (Hangul): 한국어, 반갑습니다, 환영합니다

The CSV includes a UTF-8 BOM (Byte Order Mark) to ensure Excel, Google Sheets, and Canva correctly interpret these scripts.
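A minimal sketch of BOM-prefixed CSV serialization is shown below. It follows common RFC 4180 quoting conventions and is not taken from csvFormatter.ts; `escapeCsvField` and `toCsv` are hypothetical helpers.

```typescript
const UTF8_BOM = "\uFEFF";

// Quote a field when it contains a comma, double quote, or line break.
function escapeCsvField(value: string | number): string {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize a header plus data rows, prefixed with the BOM and using CRLF line endings.
function toCsv(header: string[], rows: (string | number)[][]): string {
  const allRows: (string | number)[][] = [header, ...rows];
  const lines = allRows.map((row) => row.map(escapeCsvField).join(","));
  return UTF8_BOM + lines.join("\r\n") + "\r\n";
}

const csv = toCsv(
  ["Scene_Number", "Hindi_Audio", "Hindi_Canva_Text"],
  [[1, "4 सेकंड के लिए सांस लें।", "सांस (4s)"]],
);
console.log(csv.charCodeAt(0) === 0xfeff); // true
```

When the string is written to disk or sent over HTTP as UTF-8, the leading U+FEFF becomes the three BOM bytes EF BB BF.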

Error Responses:

400 — Missing or invalid scripts, exceeds 10,000 character limit
500 — CSV encoding validation failed

Example cURL:

curl -X POST http://localhost:3000/api/v1/export-csv \
  -H "Content-Type: application/json" \
  -d '{
    "en": "Breathe in for 4 seconds.\nHold for 4 seconds.",
    "es": "Respira durante 4 segundos.\nMantén durante 4 segundos.",
    "ko": "4초 동안 숨을 마세요.\n4초 동안 유지하세요.",
    "hi": "4 सेकंड के लिए सांस लें।\n4 सेकंड तक रोकें।"
  }'

GET `/api/v1/health`

Health check endpoint.

Response (200):

{
  "success": true,
  "data": {
    "status": "healthy",
    "timestamp": "2026-04-11T14:30:00.000Z"
  },
  "timestamp": "2026-04-11T14:30:00.000Z"
}

GET `/`

API information and available endpoints.

Performance & Scaling

Current Limitations

Single full-script comparison (no scene/segment support yet)
One ElevenLabs API call per language on every request (the calls are parallelized internally)
10,000 character max per language (ElevenLabs billing optimization)
No caching (call ElevenLabs fresh each time for accuracy)

Future Optimizations

Batch processing for multiple script sets
Response caching with TTL
WebSocket streaming for real-time processing
Segment-based timing (scene-by-scene granularity)
Fallback estimation logic (for dev/testing without API)

Data Types

TimeVarianceReport

Complete analysis of timing variance across all languages.

{
  timestamp: string;                    // ISO 8601 timestamp
  referenceLanguage: 'en'|'es'|'ko'|'hi'; // Language with longest duration
  referenceDurationMs: number;          // Duration of reference language (ms)
  toleranceMs: number;                  // Acceptable variance threshold (±ms)
  languages: BufferRecommendation[];    // Recommendations for each language
  allPairVariances: LanguagePairVariance[]; // All 6 pairwise comparisons
  summary: {
    maxVarianceMs: number;              // Largest variance found
    minVarianceMs: number;              // Smallest variance found
    averageVarianceMs: number;          // Mean variance
    allWithinTolerance: boolean;        // All languages within ±toleranceMs?
  };
}
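The pairwise and summary fields can be reproduced from the durations in the response example. The sketch below assumes that varianceMs is the first language's duration minus the second's, and that the summary is computed over each language's gap to the reference; both assumptions match the example figures (a first pair of -273 and a summary of 343 / 0 / 182), but the real implementation may differ. `pairVariances` and `summarize` are hypothetical names.

```typescript
type Language = "en" | "es" | "ko" | "hi";

interface LanguagePairVariance {
  language1: Language;
  language2: Language;
  varianceMs: number;
  absVarianceMs: number;
}

// Every unordered pair of languages: 4 languages give 6 pairs.
function pairVariances(durations: Record<Language, number>): LanguagePairVariance[] {
  const langs = Object.keys(durations) as Language[];
  const pairs: LanguagePairVariance[] = [];
  langs.forEach((language1, i) => {
    for (const language2 of langs.slice(i + 1)) {
      const varianceMs = durations[language1] - durations[language2];
      pairs.push({ language1, language2, varianceMs, absVarianceMs: Math.abs(varianceMs) });
    }
  });
  return pairs;
}

// Summary over each language's gap to the longest (reference) duration.
function summarize(durations: Record<Language, number>, toleranceMs = 80) {
  const values = Object.values(durations);
  const referenceMs = Math.max(...values);
  const gaps = values.map((durationMs) => referenceMs - durationMs);
  return {
    maxVarianceMs: Math.max(...gaps),
    minVarianceMs: Math.min(...gaps),
    averageVarianceMs: Math.round(gaps.reduce((sum, gap) => sum + gap, 0) / gaps.length),
    allWithinTolerance: gaps.every((gap) => gap <= toleranceMs),
  };
}

const exampleDurations: Record<Language, number> = { en: 4250, es: 4523, ko: 4180, hi: 4410 };
console.log(pairVariances(exampleDurations).length); // 6
console.log(summarize(exampleDurations));
// { maxVarianceMs: 343, minVarianceMs: 0, averageVarianceMs: 182, allWithinTolerance: false }
```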

Roadmap

✅ Phase 4: CSV Export with Visual Cues (Completed)

New Features:

POST /api/v1/export-csv endpoint for Canva-ready CSV exports
Automatic visual cue generation (shortened text + duration markers)
UTF-8 BOM support for Hindi (Devanagari) and Korean (Hangul)
Dual-column CSV format: Audio (full text) + Canva_Text (visual cues)
Comprehensive test coverage (csvGenerator, visualCueGenerator)

MVP Scope:

Stubbed translation service (predefined phrase mappings)
Scene numbering by line breaks
CSV validation and encoding checks
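Scene numbering by line breaks can be sketched as follows. The snippet assumes one scene per non-empty line and rejects inputs whose languages disagree on the number of scenes; whether the tool enforces that second rule is an assumption. `splitScenes` and `alignScenes` are hypothetical names.

```typescript
interface Scene {
  sceneNumber: number;
  audio: Record<string, string>; // language code -> full line for this scene
}

// One scene per non-empty line.
function splitScenes(script: string): string[] {
  return script
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "");
}

// Zip the per-language lines into numbered scenes (numbering starts at 1).
function alignScenes(scripts: Record<string, string>): Scene[] {
  const perLanguage = Object.entries(scripts).map(
    ([lang, text]) => [lang, splitScenes(text)] as const,
  );
  const counts = new Set(perLanguage.map(([, lines]) => lines.length));
  if (counts.size > 1) {
    throw new Error("Scripts have different numbers of scenes");
  }
  const total = perLanguage.length > 0 ? perLanguage[0][1].length : 0;
  return Array.from({ length: total }, (_, index) => ({
    sceneNumber: index + 1,
    audio: Object.fromEntries(perLanguage.map(([lang, lines]) => [lang, lines[index]])),
  }));
}

const scenes = alignScenes({
  en: "Breathe in for 4 seconds.\nHold for 4 seconds.\nExhale for 4 seconds.",
  es: "Respira durante 4 segundos.\nMantén durante 4 segundos.\nExhala durante 4 segundos.",
});
console.log(scenes.length); // 3
```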

🔄 Phase 5: Real Translation Integration (Upcoming)

Planned Features:

Replace stubbed translation with real API:
- Google Translate API (cost-effective for production)
- OpenAI GPT-4 (better quality for wellness domain)
Translation caching to reduce API costs
Batch translation for multiple scripts
Domain-specific terminology for wellness/meditation scripts

Technical Details:

Extend the existing translationService.ts stub with pluggable API adapters
Add .env config for translation API selection
Implement rate-limiting and retry logic
Add profanity/safety checks for wellness content

📋 Phase 6: Scene Markers & Custom Cues (Future)

Planned Features:

Support explicit scene markers: [SCENE 1], [SCENE 2]
Custom cue templates per wellness category:
- Breathing: Preserve duration + breath direction (e.g., "Breathe In (4s)")
- Gratitude: Emphasize action (e.g., "Feel Gratitude")
- Affirmations: Highlight affirmation text (e.g., "I Am Calm")
User-configurable cue length (currently fixed at ≤50 chars)
Metadata export (duration markers, scene markers)

🔗 Phase 7: Canva API Direct Integration (Future)

Planned Features:

Direct Canva API integration (replace CSV export)
Auto-apply visual cues to Canva designs
Update overlays with timing buffers automatically
Support for multiple design templates

License

ISC

Support

For issues with:

This tool: Create an issue on GitHub
ElevenLabs API: Contact https://support.elevenlabs.io
YouTube overlay sync: Refer to your video editor's documentation (DaVinci Resolve, Premiere Pro, etc.)

Next Steps

After getting buffer recommendations:

Save the timing report (the JSON response from POST /api/v1/compare)
Add silence to each language track in your video editor
- Most editors: Audio track > Insert silence > Specify duration
Verify sync by watching the video with all overlays
Fine-tune if needed (usually ±10-20ms manual adjustment suffices)
Render and upload to YouTube

Happy multicultural content creation! 🌍🎬

Made with ❤️ for multilingual creators
