Compare scripts in multiple languages (English, Spanish, Korean, Hindi) and calculate precise timing variance for YouTube video synchronization using ElevenLabs text-to-speech character-level timestamps.
This tool solves a critical problem in multilingual video production: timing inconsistency. When you create videos with synchronized text overlays in multiple languages, each language takes a different amount of time to speak. Typical differences relative to English (actual ratios vary with the script and voice):
- Spanish: ~5% longer
- Korean: ~10% longer
- Hindi: ~12% longer
This tool calculates the exact time differences and recommends how much buffer (silence) you need to add to each language version to keep overlays synchronized within ±80ms (imperceptible to viewers).
✅ Accurate Multilingual Timing — Uses ElevenLabs API /with-timestamps endpoint for character-level precision
✅ UTF-8 Grapheme Aware — Properly handles Hindi (Devanagari) and Korean (Hangul) combining characters
✅ Buffer Recommendations — Tells you exactly how much silence to add to sync videos
✅ Tolerance-Based Status — Automatically flags whether each language is "OK" or needs "Adjust" or "Cut"
✅ Detailed Variance Report — All 6 pairwise language comparisons + summary statistics
✅ Simple REST API — Single POST endpoint with JSON input/output
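Each language's duration comes from the character-level timing data that the /with-timestamps endpoint returns. The sketch below shows one way to derive a spoken duration from it. It assumes the response shape published in the ElevenLabs API reference (an alignment object with characters, character_start_times_seconds and character_end_times_seconds arrays) and the built-in fetch of Node.js 18+. The function names are hypothetical, and this is not the tool's elevenLabsClient.ts.

```typescript
// Hypothetical sketch; not the tool's elevenLabsClient.ts.
// Assumed shape of the `alignment` object returned by
// POST /v1/text-to-speech/{voice_id}/with-timestamps.
interface CharacterAlignment {
  characters: string[];
  character_start_times_seconds: number[];
  character_end_times_seconds: number[];
}

interface TimestampsResponse {
  audio_base64: string;
  alignment?: CharacterAlignment | null;
}

// Spoken duration = latest character end time, rounded to whole milliseconds.
function durationMsFromAlignment(alignment: CharacterAlignment): number {
  const lastEnd = alignment.character_end_times_seconds.reduce((max, t) => Math.max(max, t), 0);
  return Math.round(lastEnd * 1000);
}

// Requests speech plus timestamps for one script and returns its duration.
// In the real tool, apiKey and voiceId would come from .env.
async function fetchDurationMs(
  text: string,
  voiceId: string,
  apiKey: string,
  modelId: string = "eleven_multilingual_v2",
): Promise<number> {
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}/with-timestamps`, {
    method: "POST",
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ text, model_id: modelId }),
  });
  if (!res.ok) throw new Error(`ElevenLabs request failed with HTTP ${res.status}`);
  const data = (await res.json()) as TimestampsResponse;
  if (!data.alignment) throw new Error("Response contained no alignment data");
  return durationMsFromAlignment(data.alignment);
}
```

Measuring from the alignment rather than from the decoded audio length ties the duration to the spoken text, so any trailing silence in the generated audio is not counted.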
- Node.js 18+ (for Intl.Segmenter API support)
- An ElevenLabs account with an API key and voice IDs
- npm or yarn
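Grapheme awareness matters because JavaScript's string length counts UTF-16 code units, not the characters a reader sees. A minimal sketch using Intl.Segmenter (the API behind the Node.js 18+ requirement) follows; countGraphemes is a hypothetical name, and the tool's characterCounting.ts may work differently.

```typescript
// Hypothetical sketch of grapheme-aware counting; not the tool's characterCounting.ts.
// Counts user-perceived characters (extended grapheme clusters), not UTF-16 code units.
function countGraphemes(text: string, locale?: string): number {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text)).length;
}

// Devanagari: the vowel sign and nasal mark attach to the first consonant.
const hindi = "सांस";
console.log(hindi.length, countGraphemes(hindi, "hi")); // 4 2

// Korean typed as conjoining jamo (decomposed) still forms one syllable block (한).
const koreanJamo = "\u1112\u1161\u11AB";
console.log(koreanJamo.length, countGraphemes(koreanJamo, "ko")); // 3 1
```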
- Clone and install dependencies:

```bash
cd MultiLanguageTimingTool
npm install
```

- Configure environment variables:

```bash
cp .env.example .env
```

Then edit .env and add:
- Your ElevenLabs API key (from https://elevenlabs.io/app/api-keys)
- Voice IDs for each language (from https://elevenlabs.io/app/voice-lab)

Example .env:

```bash
ELEVENLABS_API_KEY=sk_abc123def456...
ELEVENLABS_MODEL_ID=eleven_multilingual_v2
VOICE_ID_EN=voice_EXAVITQu4MsJ60DaXUN1
VOICE_ID_ES=voice_EXAVITQu4MsJ60DaXUN2
VOICE_ID_KO=voice_EXAVITQu4MsJ60DaXUN3
VOICE_ID_HI=voice_EXAVITQu4MsJ60DaXUN4
PORT=3000
NODE_ENV=development
```

- Start the development server:

```bash
npm run dev
```

The server will start on http://localhost:3000.
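At startup, the variables above have to be read and checked. A minimal sketch of such a loader follows, assuming the variable names from the example .env. The loadConfig function, the per-voice error message and the fallback defaults are illustrative assumptions; only the missing-API-key message is taken from the error quoted later in this README.

```typescript
// Hypothetical startup config loader; not the tool's actual code.
type Language = "en" | "es" | "ko" | "hi";

interface AppConfig {
  apiKey: string;
  modelId: string;
  voiceIds: Record<Language, string>;
  port: number;
}

// Takes the environment as a plain object, so the server can pass process.env
// and tests can pass a literal.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const apiKey = env.ELEVENLABS_API_KEY;
  if (!apiKey) throw new Error("Missing ELEVENLABS_API_KEY in environment variables");

  // One voice ID per language: VOICE_ID_EN, VOICE_ID_ES, VOICE_ID_KO, VOICE_ID_HI.
  const voiceIds = {} as Record<Language, string>;
  for (const lang of ["en", "es", "ko", "hi"] as const) {
    const key = `VOICE_ID_${lang.toUpperCase()}`;
    const id = env[key];
    if (!id) throw new Error(`Missing ${key} in environment variables`);
    voiceIds[lang] = id;
  }

  return {
    apiKey,
    modelId: env.ELEVENLABS_MODEL_ID ?? "eleven_multilingual_v2", // assumed default
    voiceIds,
    port: Number(env.PORT ?? 3000), // assumed default
  };
}
```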
POST /api/v1/compare
Compares scripts in four languages and returns a timing variance report.

Example cURL:

```bash
curl -X POST http://localhost:3000/api/v1/compare \
  -H "Content-Type: application/json" \
  -d '{
    "en": "Welcome to our channel! Today we are discussing the importance of language learning.",
    "es": "¡Bienvenido a nuestro canal! Hoy vamos a discutir la importancia del aprendizaje de idiomas.",
    "ko": "저희 채널에 오신 것을 환영합니다! 오늘은 언어 학습의 중요성에 대해 논의하겠습니다.",
    "hi": "हमारे चैनल में आपका स्वागत है! आज हम भाषा सीखने के महत्व पर चर्चा करेंगे।"
  }'
```

Response (200):

```json
{
  "success": true,
  "data": {
    "timestamp": "2026-04-11T14:30:00.000Z",
    "referenceLanguage": "es",
    "referenceDurationMs": 4523,
    "toleranceMs": 80,
    "languages": [
      {
        "language": "en",
        "durationMs": 4250,
        "varianceFromReferenceMs": -273,
        "recommendedBufferMs": 273,
        "withinTolerance": false,
        "status": "Adjust"
      },
      {
        "language": "es",
        "durationMs": 4523,
        "varianceFromReferenceMs": 0,
        "recommendedBufferMs": 0,
        "withinTolerance": true,
        "status": "OK"
      },
      {
        "language": "ko",
        "durationMs": 4180,
        "varianceFromReferenceMs": -343,
        "recommendedBufferMs": 343,
        "withinTolerance": false,
        "status": "Adjust"
      },
      {
        "language": "hi",
        "durationMs": 4410,
        "varianceFromReferenceMs": -113,
        "recommendedBufferMs": 113,
        "withinTolerance": false,
        "status": "Adjust"
      }
    ],
    "allPairVariances": [
      {
        "language1": "en",
        "language2": "es",
        "varianceMs": -273,
        "absVarianceMs": 273
      },
      ...
    ],
    "summary": {
      "maxVarianceMs": 343,
      "minVarianceMs": 0,
      "averageVarianceMs": 182,
      "allWithinTolerance": false
    }
  },
  "timestamp": "2026-04-11T14:30:00.000Z"
}
```

Reference Language: The language with the longest duration. All others are compared against this baseline.
Buffer Recommendations:
- Status "OK" — Variance within ±80ms tolerance, no action needed
- Status "Adjust" — Add silence (in milliseconds) to match reference duration
- Status "Cut" — Language exceeds reference, needs trimming
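The rules above (longest duration as the reference, ±80ms tolerance, silence buffers for shorter tracks) can be sketched in a few lines. This is an illustrative reimplementation, not the code in src/utils/timingCalculations.ts: compareDurations is a hypothetical name, and the summary is assumed to be computed over each language's absolute variance from the reference, which reproduces the summary values of the example response.

```typescript
// Illustrative sketch of the rules above; not the tool's timingCalculations.ts.
type Language = "en" | "es" | "ko" | "hi";
type Status = "OK" | "Adjust" | "Cut";

interface LanguageResult {
  language: Language;
  durationMs: number;
  varianceFromReferenceMs: number;
  recommendedBufferMs: number;
  withinTolerance: boolean;
  status: Status;
}

interface PairVariance {
  language1: Language;
  language2: Language;
  varianceMs: number;
  absVarianceMs: number;
}

const TOLERANCE_MS = 80;

function compareDurations(durations: Record<Language, number>, toleranceMs: number = TOLERANCE_MS) {
  const langs = Object.keys(durations) as Language[];

  // Reference = the language with the longest spoken duration.
  const referenceLanguage = langs.reduce((a, b) => (durations[b] > durations[a] ? b : a));
  const referenceDurationMs = durations[referenceLanguage];

  const languages: LanguageResult[] = langs.map((language) => {
    const variance = durations[language] - referenceDurationMs; // never positive here
    const withinTolerance = Math.abs(variance) <= toleranceMs;
    // "Cut" would only apply to a track longer than the reference, which cannot
    // happen when the reference is the longest; it is kept to mirror the status list.
    const status: Status = withinTolerance ? "OK" : variance < 0 ? "Adjust" : "Cut";
    return {
      language,
      durationMs: durations[language],
      varianceFromReferenceMs: variance,
      recommendedBufferMs: Math.max(0, -variance), // silence needed to reach the reference
      withinTolerance,
      status,
    };
  });

  // Every unordered pair of languages: 4 languages give 6 comparisons.
  const allPairVariances: PairVariance[] = [];
  for (let i = 0; i < langs.length; i++) {
    for (let j = i + 1; j < langs.length; j++) {
      const varianceMs = durations[langs[i]] - durations[langs[j]];
      allPairVariances.push({
        language1: langs[i],
        language2: langs[j],
        varianceMs,
        absVarianceMs: Math.abs(varianceMs),
      });
    }
  }

  // Summary over each language's absolute variance from the reference.
  const abs = languages.map((l) => Math.abs(l.varianceFromReferenceMs));
  return {
    referenceLanguage,
    referenceDurationMs,
    toleranceMs,
    languages,
    allPairVariances,
    summary: {
      maxVarianceMs: Math.max(...abs),
      minVarianceMs: Math.min(...abs),
      averageVarianceMs: Math.round(abs.reduce((sum, v) => sum + v, 0) / abs.length),
      allWithinTolerance: languages.every((l) => l.withinTolerance),
    },
  };
}
```

With the durations from the example response ({ en: 4250, es: 4523, ko: 4180, hi: 4410 }), this picks es as the reference, recommends 273, 0, 343 and 113 ms of silence, and reports a maximum variance of 343 ms and an average of 182 ms.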
Example Workflow:
- Run the tool with your 4-language scripts
- Get buffer recommendations (e.g., English needs +273ms of silence)
- In your video editor:
  - English: Add 273ms silence at the end
  - Spanish: No change (reference language)
  - Korean: Add 343ms silence at the end
  - Hindi: Add 113ms silence at the end
- All overlays now sync within ±80ms (an imperceptible difference)
```
src/
├── server.ts                    # Express app entry point
├── types/index.ts               # TypeScript interfaces
├── utils/
│   ├── characterCounting.ts     # UTF-8 grapheme handling
│   ├── timingCalculations.ts    # Core variance algorithm
│   ├── visualCueGenerator.ts    # CSV visual cue generation (Phase 4)
│   ├── csvFormatter.ts          # CSV serialization + UTF-8 BOM (Phase 4)
│   └── encodingValidator.ts     # Unicode encoding validation (Phase 4)
├── services/
│   ├── elevenLabsClient.ts      # ElevenLabs API wrapper
│   ├── translationService.ts    # Translation stub (Phase 4)
│   └── csvGenerator.ts          # CSV export orchestrator (Phase 4)
└── routes/
    └── compareRoutes.ts         # POST /compare + POST /export-csv endpoints
tests/
├── characterCounting.test.ts    # 22 character tests
├── timingCalculations.test.ts   # 22 variance tests
├── csvGenerator.test.ts         # CSV generation & validation tests (Phase 4)
└── visualCueGenerator.test.ts   # Visual cue generation tests (Phase 4)
```
```bash
# Run all tests
npm test

# Run tests in watch mode
npm test -- --watch

# Run a specific test file
npm test characterCounting.test.ts
```

```bash
npm run build

# Output artifacts go to the dist/ folder
ls dist/
```

Expected contents of dist/:

```
server.js
types/
utils/
services/
routes/
```

Recommended models (both are multilingual text-to-speech models):

| Model | Speed | Cost | Best For |
|---|---|---|---|
| eleven_multilingual_v2 | Medium | Standard | Production (stable & reliable) |
| eleven_flash_v2_5 | Fastest | 50% cheaper | Quick testing, high volume |

Note: eleven_multilingual_sts_v2 is a speech-to-speech (voice conversion) model rather than a text-to-speech model, so it cannot be used with the text-to-speech /with-timestamps endpoint this tool relies on.
Change model by editing ELEVENLABS_MODEL_ID in .env.
Default tolerance is ±80ms, the threshold this tool treats as imperceptible to viewers.
To modify it, edit the TOLERANCE_MS constant in src/utils/timingCalculations.ts:

```typescript
const TOLERANCE_MS = 80; // Change to desired threshold in milliseconds
```

Then rebuild:

```bash
npm run build
```

Error: Missing ELEVENLABS_API_KEY in environment variables
Solution: Copy .env.example to .env and add your API key from https://elevenlabs.io/app/api-keys
ElevenLabs voice not found for language: en. Check voice IDs in .env.
Solution:
- Go to https://elevenlabs.io/app/voice-lab
- Copy the voice ID for each voice
- Paste it into the corresponding VOICE_ID_* field in .env
- Voice IDs start with voice_ (e.g., voice_EXAVITQu4MsJ60DaXUN1)
ElevenLabs API rate limit exceeded. Retry after a moment.
Solution: Wait a moment and retry. ElevenLabs has rate limits based on your plan. Check your usage at https://elevenlabs.io/app/usage.
Make sure your script files are saved as UTF-8 without a BOM:
- VS Code: In the status bar (bottom right), select UTF-8, not UTF-8 with BOM
- macOS/Linux: Run file scriptfile.txt and check that the output reports UTF-8 (or ASCII) text and does not contain "with BOM"
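A BOM can also be removed in code before a script is sent for synthesis. This is a minimal sketch with a hypothetical helper name; it assumes the file has already been decoded to a string, where a UTF-8 BOM (bytes EF BB BF) shows up as the single character U+FEFF.

```typescript
// Hypothetical helper: removes a leading byte order mark from already-decoded text.
// A UTF-8 BOM (EF BB BF) decodes to the single UTF-16 code unit U+FEFF.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Without stripping, U+FEFF would be part of the text sent for synthesis and counted as a character.
const raw = "\uFEFF저희 채널에 오신 것을 환영합니다!";
const clean = stripBom(raw);
console.log(raw.length - clean.length); // 1
```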
POST /api/v1/compare
Compare 4-language scripts and get timing variance.
Request Body:

```json
{
  "en": "English script text (max 10,000 chars)",
  "es": "Spanish script text (max 10,000 chars)",
  "ko": "Korean script text (max 10,000 chars)",
  "hi": "Hindi script text (max 10,000 chars)"
}
```

Success Response (200):

```json
{
  "success": true,
  "data": { /* TimeVarianceReport */ },
  "timestamp": "2026-04-11T14:30:00.000Z"
}
```

Error Responses:
- 400 — Missing or invalid scripts
- 401 — Invalid API key
- 404 — Voice not found
- 429 — Rate limit exceeded
- 503 — ElevenLabs service unavailable
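The 400 case above could come from a validation step along these lines. This is a hypothetical sketch: validateCompareBody and its messages are illustrative, and it measures the 10,000-character limit in UTF-16 code units (String.prototype.length), which may not match how the tool counts.

```typescript
// Hypothetical validation of the POST /api/v1/compare body; not the tool's route code.
type Language = "en" | "es" | "ko" | "hi";
const LANGUAGES: readonly Language[] = ["en", "es", "ko", "hi"];
const MAX_CHARS = 10_000; // measured as UTF-16 code units here (an assumption)

type ValidationResult =
  | { ok: true; scripts: Record<Language, string> }
  | { ok: false; status: 400; errors: string[] };

function validateCompareBody(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, status: 400, errors: ["Request body must be a JSON object"] };
  }
  const input = body as Record<string, unknown>;
  const scripts: Partial<Record<Language, string>> = {};
  const errors: string[] = [];

  // Every language must be present, non-empty, and within the length limit.
  for (const lang of LANGUAGES) {
    const value = input[lang];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`Missing or empty script for "${lang}"`);
    } else if (value.length > MAX_CHARS) {
      errors.push(`Script for "${lang}" exceeds ${MAX_CHARS} characters`);
    } else {
      scripts[lang] = value;
    }
  }

  if (errors.length > 0) return { ok: false, status: 400, errors };
  return { ok: true, scripts: scripts as Record<Language, string> };
}
```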
POST /api/v1/export-csv
Export 4-language scripts as a Canva-ready CSV with visual cues (Phase 4 feature).
This endpoint generates a CSV file with dual text columns for each language:
- Audio — Full script text (what ElevenLabs speaks)
- Canva_Text — Shortened visual cues (reduced cognitive load for viewers)
Visual cues are automatically generated by extracting key phrases and preserving duration markers (e.g., "Breathe deeply for 4 seconds" → "Breathe Deeply (4s)").
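The example transformation can be approximated with a simple rule: find an "N seconds" duration, drop it from the sentence, keep the first couple of words in title case, and append the duration in parentheses. The sketch below handles English text only and uses assumed heuristics (a two-word limit, stripping an optional "for" before the duration); generateVisualCue is a hypothetical name, and the real visualCueGenerator.ts may work differently.

```typescript
// Hypothetical, English-only sketch of cue generation; not the tool's visualCueGenerator.ts.
const MAX_CUE_LENGTH = 50; // the README's 50-character cue limit

function titleCase(words: string[]): string {
  return words.map((w) => w.charAt(0).toUpperCase() + w.slice(1).toLowerCase()).join(" ");
}

function generateVisualCue(sentence: string, maxWords: number = 2): string {
  // Look for a duration such as "4 seconds" or "1 second", optionally preceded by "for".
  const match = sentence.match(/\s*(?:for\s+)?(\d+)\s*seconds?\b/i);
  const marker = match ? ` (${match[1]}s)` : "";

  // Drop the duration phrase and punctuation, then keep only the first few words.
  const words = (match ? sentence.replace(match[0], " ") : sentence)
    .replace(/[.!?,;:]/g, "")
    .trim()
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .slice(0, maxWords);

  // Truncate the label (never the marker) so the cue respects the length cap.
  const label = titleCase(words).slice(0, MAX_CUE_LENGTH - marker.length).trimEnd();
  return (label + marker).trim();
}
```

For example, generateVisualCue("Breathe deeply for 4 seconds") gives "Breathe Deeply (4s)" and generateVisualCue("Breathe in for 4 seconds.") gives "Breathe In (4s)". Korean and Hindi would need different rules, since word order and duration phrases differ.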
Request Body:

```json
{
  "en": "Breathe in for 4 seconds.\nHold for 4 seconds.\nExhale for 4 seconds.",
  "es": "Respira durante 4 segundos.\nMantén durante 4 segundos.\nExhala durante 4 segundos.",
  "ko": "4초 동안 숨을 마시세요.\n4초 동안 유지하세요.\n4초 동안 숨을 내쉬세요.",
  "hi": "4 सेकंड के लिए सांस लें।\n4 सेकंड तक रोकें।\n4 सेकंड के लिए सांस छोड़ें।"
}
```

Success Response (200):

```json
{
  "success": true,
  "data": {
    "totalScenes": 3,
    "rows": [
      {
        "sceneNumber": 1,
        "enAudio": "Breathe in for 4 seconds.",
        "enCanvaText": "Breathe In (4s)",
        "esAudio": "Respira durante 4 segundos.",
        "esCanvaText": "Respira (4s)",
        "koAudio": "4초 동안 숨을 마시세요.",
        "koCanvaText": "숨 (4s)",
        "hiAudio": "4 सेकंड के लिए सांस लें।",
        "hiCanvaText": "सांस (4s)"
      },
      ...
    ],
    "csvContent": "[UTF-8 BOM]Scene_Number,English_Audio,English_Canva_Text,...",
    "metadata": {
      "generatedAt": "2026-04-12T10:30:00.000Z",
      "languagesIncluded": ["en", "es", "ko", "hi"],
      "encoding": "UTF-8"
    }
  },
  "timestamp": "2026-04-12T10:30:00.000Z"
}
```

CSV Column Structure:
| Column | Content | Purpose |
|---|---|---|
| Scene_Number | 1, 2, 3, ... | Sequential scene identifier |
| English_Audio | Full text | What narrator reads aloud |
| English_Canva_Text | Visual cue | Shortened text for on-screen display |
| Spanish_Audio | Full text | Spanish narration |
| Spanish_Canva_Text | Visual cue | Spanish on-screen display |
| Korean_Audio | Full text | Korean narration |
| Korean_Canva_Text | Visual cue | Korean on-screen display |
| Hindi_Audio | Full text | Hindi narration |
| Hindi_Canva_Text | Visual cue | Hindi on-screen display |
Import into Canva:
- Copy the CSV from the API response (the csvContent field)
- In Canva, create a spreadsheet or table element
- Import CSV: Paste the content into the spreadsheet import dialog
- Map columns: English_Canva_Text, Spanish_Canva_Text, Korean_Canva_Text and Hindi_Canva_Text become your overlay text
- The *_Audio columns are for reference/backup (the narrator's full scripts)
Unicode Handling:
✅ Automatically handles international characters:
- Hindi (Devanagari): हिंदी, नमस्ते, स्वागत है
- Korean (Hangul): 한국어, 반갑습니다, 환영합니다
The CSV includes a UTF-8 BOM (Byte Order Mark) to ensure Excel, Google Sheets, and Canva correctly interpret these scripts.
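Producing such a file comes down to two details: quoting fields that contain commas, quotes or line breaks (RFC 4180 style), and prefixing the output with the BOM character U+FEFF, which becomes the bytes EF BB BF when encoded as UTF-8. A minimal sketch follows; the function names and the CRLF line ending are assumptions, only four of the nine columns are shown, and this is not the tool's csvFormatter.ts.

```typescript
// Hypothetical CSV serializer with a UTF-8 BOM; not the tool's csvFormatter.ts.
const UTF8_BOM = "\uFEFF"; // encoded as EF BB BF in a UTF-8 file

// Quote a field if it contains a comma, double quote, or line break; double any embedded quotes.
function escapeCsvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsvWithBom(header: string[], rows: (string | number)[][]): string {
  const lines = [header, ...rows].map((row) => row.map(escapeCsvField).join(","));
  return UTF8_BOM + lines.join("\r\n") + "\r\n";
}

const csv = toCsvWithBom(
  ["Scene_Number", "English_Audio", "English_Canva_Text", "Hindi_Canva_Text"],
  [[1, "Breathe in for 4 seconds.", "Breathe In (4s)", "सांस (4s)"]],
);
// Encoded as UTF-8, the output starts with the three BOM bytes.
console.log(Array.from(new TextEncoder().encode(csv).slice(0, 3))); // [ 239, 187, 191 ]
```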
Error Responses:
- 400 — Missing or invalid scripts, exceeds 10,000 character limit
- 500 — CSV encoding validation failed
Example cURL:

```bash
curl -X POST http://localhost:3000/api/v1/export-csv \
  -H "Content-Type: application/json" \
  -d '{
    "en": "Breathe in for 4 seconds.\nHold for 4 seconds.",
    "es": "Respira durante 4 segundos.\nMantén durante 4 segundos.",
    "ko": "4초 동안 숨을 마시세요.\n4초 동안 유지하세요.",
    "hi": "4 सेकंड के लिए सांस लें।\n4 सेकंड तक रोकें।"
  }'
```

Health check endpoint.
Response (200):

```json
{
  "success": true,
  "data": {
    "status": "healthy",
    "timestamp": "2026-04-11T14:30:00.000Z"
  },
  "timestamp": "2026-04-11T14:30:00.000Z"
}
```

API information and available endpoints.
Current Limitations:
- Single full-script comparison (no scene/segment support yet)
- Sequential ElevenLabs API calls (parallelized internally)
- 10,000 character max per language (ElevenLabs billing optimization)
- No caching (call ElevenLabs fresh each time for accuracy)
Planned Features:
- Batch processing for multiple script sets
- Response caching with TTL
- WebSocket streaming for real-time processing
- Segment-based timing (scene-by-scene granularity)
- Fallback estimation logic (for dev/testing without API)
TimeVarianceReport: complete analysis of timing variance across all languages.

```typescript
// BufferRecommendation and LanguagePairVariance correspond to the `languages`
// and `allPairVariances` entries shown in the example response.
interface TimeVarianceReport {
  timestamp: string;                            // ISO 8601 timestamp
  referenceLanguage: 'en' | 'es' | 'ko' | 'hi'; // Language with longest duration
  referenceDurationMs: number;                  // Duration of reference language (ms)
  toleranceMs: number;                          // Acceptable variance threshold (±ms)
  languages: BufferRecommendation[];            // Recommendations for each language
  allPairVariances: LanguagePairVariance[];     // All 6 pairwise comparisons
  summary: {
    maxVarianceMs: number;       // Largest variance found
    minVarianceMs: number;       // Smallest variance found
    averageVarianceMs: number;   // Mean variance
    allWithinTolerance: boolean; // All languages within ±toleranceMs (default 80)?
  };
}
```

New Features:
- POST /api/v1/export-csv endpoint for Canva-ready CSV exports
- Automatic visual cue generation (shortened text + duration markers)
- UTF-8 BOM support for Hindi (Devanagari) and Korean (Hangul)
- Dual-column CSV format: Audio (full text) + Canva_Text (visual cues)
- Comprehensive test coverage (csvGenerator, visualCueGenerator)
MVP Scope:
- Stubbed translation service (predefined phrase mappings)
- Scene numbering by line breaks
- CSV validation and encoding checks
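Scene numbering by line breaks can be pictured as splitting each script on newlines and pairing the i-th line of every language into scene i. The sketch below assumes blank lines are ignored and that all four scripts must yield the same number of scenes (the request is rejected otherwise); both behaviors and the names splitScenes and buildSceneRows are hypothetical rather than taken from csvGenerator.ts.

```typescript
// Hypothetical scene splitter; not the tool's csvGenerator.ts.
type Language = "en" | "es" | "ko" | "hi";

interface SceneRow {
  sceneNumber: number;
  audio: Record<Language, string>;
}

// One scene per non-empty line (handles both \n and \r\n line endings).
function splitScenes(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

function buildSceneRows(scripts: Record<Language, string>): SceneRow[] {
  const langs = Object.keys(scripts) as Language[];
  const scenes = Object.fromEntries(langs.map((l) => [l, splitScenes(scripts[l])])) as Record<Language, string[]>;

  // Every language must have the same number of scenes so rows line up.
  const counts = langs.map((l) => scenes[l].length);
  if (new Set(counts).size !== 1) {
    const detail = langs.map((l, i) => `${l}=${counts[i]}`).join(", ");
    throw new Error(`Scene counts differ between languages: ${detail}`);
  }

  return scenes[langs[0]].map((_, i) => ({
    sceneNumber: i + 1,
    audio: Object.fromEntries(langs.map((l) => [l, scenes[l][i]])) as Record<Language, string>,
  }));
}
```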
Planned Features:
- Replace stubbed translation with real API:
- Google Translate API (cost-effective for production)
- OpenAI GPT-4 (better quality for wellness domain)
- Translation caching to reduce API costs
- Batch translation for multiple scripts
- Domain-specific terminology for wellness/meditation scripts
Technical Details:
- Extend the existing translationService.ts stub with pluggable API adapters
- Add .env config for translation API selection
- Implement rate-limiting and retry logic
- Add profanity/safety checks for wellness content
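The pluggable-adapter idea could take a shape like the following: one interface that every translation backend implements, the current phrase-mapping stub as one implementation, and a configuration value choosing between them. Everything here (TranslationAdapter, PhraseMapAdapter, selectAdapter and a TRANSLATION_PROVIDER setting) is hypothetical, and no real translation API is called.

```typescript
// Hypothetical adapter design for the planned translation service.
type Language = "en" | "es" | "ko" | "hi";

interface TranslationAdapter {
  readonly name: string;
  translate(text: string, target: Language): Promise<string>;
}

// Mirrors the MVP stub: looks phrases up in a predefined table.
class PhraseMapAdapter implements TranslationAdapter {
  readonly name = "stub";
  private readonly phrases: Record<string, Partial<Record<Language, string>>>;

  constructor(phrases: Record<string, Partial<Record<Language, string>>>) {
    this.phrases = phrases;
  }

  async translate(text: string, target: Language): Promise<string> {
    // Falls back to the source text when no mapping exists.
    return this.phrases[text]?.[target] ?? text;
  }
}

// Picks an adapter by name, e.g. from a hypothetical TRANSLATION_PROVIDER value in .env.
function selectAdapter(provider: string | undefined, adapters: TranslationAdapter[]): TranslationAdapter {
  const wanted = provider ?? "stub";
  const chosen = adapters.find((a) => a.name === wanted);
  if (!chosen) throw new Error(`Unknown translation provider: ${wanted}`);
  return chosen;
}
```

A Google Translate or OpenAI adapter would implement the same interface, so callers never depend on a specific provider.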
Planned Features:
- Support explicit scene markers: [SCENE 1], [SCENE 2]
- Custom cue templates per wellness category:
  - Breathing: Preserve duration + breath direction (e.g., "Breathe In (4s)")
  - Gratitude: Emphasize action (e.g., "Feel Gratitude")
  - Affirmations: Highlight affirmation text (e.g., "I Am Calm")
- User-configurable cue length (currently fixed at ≤50 chars)
- Metadata export (duration markers, scene markers)
Planned Features:
- Direct Canva API integration (replace CSV export)
- Auto-apply visual cues to Canva designs
- Update overlays with timing buffers automatically
- Support for multiple design templates
License: ISC
For issues with:
- This tool: Create an issue on GitHub
- ElevenLabs API: Contact https://support.elevenlabs.io
- YouTube overlay sync: Refer to your video editor's documentation (DaVinci Resolve, Premiere Pro, etc.)
After getting buffer recommendations:
- Export timing report as CSV or JSON
- Add silence to each language track in your video editor
  - Most editors: Audio track > Insert silence > Specify duration
- Verify sync by watching the video with all overlays
- Fine-tune if needed (usually ±10-20ms manual adjustment suffices)
- Render and upload to YouTube
Happy multicultural content creation! 🌍🎬
Made with ❤️ for multilingual creators